Ember is a top-down real-time 3D streaming open world RPG. You often see those words strung together to describe modern RPGs, but we set ourselves one other design goal that is less common in the genre: mobile. There are plenty of RPGs on mobile, and plenty of great-looking real-time 3D games on mobile, but very few combine both with a streaming open world. The lure of exploring a beautiful fantasy world on a mobile device, with no load screens or restrictive map borders, was something we felt was unique and well worth the development effort.
We started with the Ogre engine. We knew we would be pushing the limits of the hardware, and that meant we needed a very “light” and easily modifiable engine that would allow us to craft a custom game engine on top of it. Ogre got us up and running quickly on iOS and PC by handling initialization and providing us with file exporters for various 3D modeling tools. With Ogre handling the basics, we were free to begin building the engine and editor.
Mobile Most Wanted: Draw Calls and Fill Rate
Many mobile hardware manufacturers tout “console-quality graphics”, and they aren’t lying: the feature sets of mobile graphics chips are comparable to those of modern consoles. Complex shaders, high-resolution textures, multiple render targets and occlusion culling are all available, which is everything needed to make a visually rich scene. Where they don’t stack up to consoles is raw power. The fill rate, triangle throughput and draw call capabilities of mobile hardware are often an order of magnitude lower than even last-generation consoles. So while you could implement a deferred renderer and advanced post-processing effects with all the bells and whistles you’d expect on a console, it would not run at an acceptable framerate even at sub-HD resolutions.
Early in Ember’s preproduction, the art team set a graphical goal that included complex lighting (including a day-night cycle!), feature-rich materials, and an insanely high level of visual density: they wanted tons of “stuff” in view to make the world feel more real. We also set a goal of making the many objects in the scene fully interactive. Between the fill rate requirements of the lighting and the draw call requirements of having so many dynamic objects visible, we had our work cut out for us.
A quick note about triangle count limitations: we kept the visible triangle count in the 100,000 to 200,000 range by having the artists follow modeling guidelines. These numbers were based on information provided by hardware manufacturers, and as long as we stayed within those limits we never ran into any issues.
The Call to Draw
Whenever the game engine sends the graphics hardware a set of triangles and a material to render them with, we call it a “draw call”. Some engines refer to these as “batches”. You generally need one draw call for each different material or object on the screen. Sending a draw call to the graphics driver costs CPU time, usually because the driver has to switch internal state, copy data around in memory, and so on. With too many draw calls, all of your CPU time is spent inside the graphics driver, leaving little time for gameplay code. So how do you keep the number of draw calls low but still have a visually complex scene?
Our solution was a simple one: fill as much of the screen as possible with as few calls as possible. The world is built out of 20 meter by 20 meter square tiles that the world builders place on a grid in our editor, similar to a standard tile-based game, except that the tiles are huge and fully 3D. Each tile uses at most 4 draw calls thanks to artist-created texture atlases, and because of our top-down camera we know you can only see 4 tiles at a time. This means we completely fill the view with only 16 draw calls, leaving the remainder of our draw call budget for dynamic objects and UI, which is plenty as long as items and characters take one or two draw calls each. And because these tiles are fairly large, full 3D models, the artists can put as much or as little detail into them as they feel necessary, as long as they stick to the “4 draw calls or fewer” rule. Our tiles represent everything from two-triangle quads for “open ground” to 30,000-triangle multi-story houses.
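The 16-call figure follows from grid arithmetic. The sketch below assumes that the camera’s ground footprint can be bounded by an axis-aligned rectangle no larger than one tile on each side. That is an assumption: a top-down perspective view really sees a trapezoid, and the rectangle is a conservative bound. Under that assumption, the view can straddle at most two tiles per axis. The function names are hypothetical.

```python
import math
from typing import List, Tuple

TILE_SIZE_M = 20.0      # tile edge length in meters (from the article)
MAX_CALLS_PER_TILE = 4  # the per-tile draw call limit

def visible_tiles(min_x: float, min_y: float, max_x: float, max_y: float,
                  tile: float = TILE_SIZE_M) -> List[Tuple[int, int]]:
    """Grid coordinates of every tile overlapped by the half-open ground
    rectangle [min_x, max_x) x [min_y, max_y)."""
    xs = range(math.floor(min_x / tile), math.ceil(max_x / tile))
    ys = range(math.floor(min_y / tile), math.ceil(max_y / tile))
    return [(i, j) for j in ys for i in xs]

def worst_case_tile_calls(tiles: List[Tuple[int, int]]) -> int:
    """Upper bound on draw calls spent on tiles if every tile uses its full limit."""
    return len(tiles) * MAX_CALLS_PER_TILE

# A hypothetical 20 m x 20 m view footprint that straddles a tile corner.
view = visible_tiles(5.0, 5.0, 25.0, 25.0)
print(view)                         # [(0, 0), (1, 0), (0, 1), (1, 1)]
print(worst_case_tile_calls(view))  # 16
```

A footprint that lines up exactly with the grid touches a single tile. Any footprint that is one tile wide or narrower touches at most a 2 by 2 block, which bounds the tile cost at 4 × 4 = 16 calls.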
This pipeline ended up being simple and efficient: the artists could focus on making the tiles look as good as possible and the world builders could build the world without worrying about technical limitations. The only guideline we gave to the builders was to try to re-use tiles often to help reduce memory usage and strain on the asset streaming.
Day-Night Cycles and the Shaders That Love Them
Once we had figured out how to achieve the visual complexity we wanted, the next step was making all those triangles look as nice as possible with fancy lighting and materials. Mobile hardware has an annoying quirk when it comes to fill rate: tablets and phones have insanely high-resolution displays coupled with disappointingly low fill rates. Fill rate is the number of pixels the graphics hardware can draw per second. So a console might be able to fill billions of pixels per second, while mobile hardware may only be able to fill millions (numbers made up for this example, not actual values!).
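A quick back-of-the-envelope calculation shows why high-resolution screens hurt. The pixels a GPU must write each second are roughly width × height × overdraw × framerate. In this sketch, 2048×1536 is the Retina iPad resolution. The 30 fps target and the overdraw factor are illustrative assumptions, not figures from Ember.

```python
def required_fill_rate(width: int, height: int, fps: float, overdraw: float = 1.0) -> float:
    """Pixels per second the GPU must write: every screen pixel, times the
    average number of times each pixel is drawn (overdraw), times the framerate."""
    return width * height * overdraw * fps

hd = required_fill_rate(1920, 1080, fps=30)      # 62,208,000 pixels per second
retina = required_fill_rate(2048, 1536, fps=30)  # 94,371,840 pixels per second
print(retina / hd)  # about 1.52: roughly 52% more pixels than 1080p before any overdraw

# A hypothetical average overdraw of 3 (e.g. layered transparent effects)
# triples the requirement again.
print(required_fill_rate(2048, 1536, fps=30, overdraw=3.0))
```

Every extra full-screen pass or layer of overdraw multiplies this figure. On a low-fill-rate device, cutting per-pixel work usually pays off more than cutting geometry.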
So how do you make a low-fill-rate device render at higher-than-HD resolutions at an acceptable framerate? You go old school. We’ve found that techniques regularly employed in the early 2000s work extremely well on mobile: lightmaps for static objects, vertex lighting on dynamic objects, and hand-placed flare images to simulate bloom. It’s fun leveraging old graphics techniques again.
The first thing we had to figure out was lighting the huge tiles. Remember, our tiles were 20 meter by 20 meter slabs of geometry built by an artist and then placed in the world by a level designer, with the goal of lots of re-use. We knew we wanted to lightmap the tiles, but the extensive re-use complicated things. If you made a forested area by repeating the same “forest” tile in a 5 by 5 grid, how do you lightmap that? Our first naïve thought was to give each grid square in the entire world its own lightmap, so the 5 by 5 grid of forest tiles would need 25 lightmaps. This quickly became unwieldy because of the sheer number of lightmaps. We did some napkin math using the projected size of our world and the minimum lightmap resolution we found acceptable, and we were looking at something like 5 GB of compressed lightmap data. Since the iOS app size limit at the time was 2 GB, that idea was out. The solution we ended up with was very art-centric: the artists lightmapped each tile individually in Maya and ensured that no shadow escaped the confines of the tile. The artists did an amazing job making trees, walls, buildings, and other tall objects cast long shadows that were entirely contained within the tile. Ambient occlusion was also included in the tile lightmaps.
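The napkin math scales linearly with the number of lightmaps. A sketch of it follows, assuming a fixed-rate compressed format at 4 bits per texel, as with 4bpp PVRTC. The 1024 × 1024 resolution is a hypothetical value for illustration, not Ember’s actual setting.

```python
def lightmap_bytes(count: int, resolution: int, bits_per_texel: int = 4) -> int:
    """Storage for `count` square lightmaps of resolution x resolution texels
    in a fixed-rate compressed format (4 bits per texel by default, as with
    4bpp PVRTC). Mipmaps and file headers are ignored."""
    return count * resolution * resolution * bits_per_texel // 8

# A 5 x 5 block of the same "forest" tile, with hypothetical 1024 x 1024 lightmaps:
per_grid_square = lightmap_bytes(25, 1024)  # one lightmap per grid square
per_tile = lightmap_bytes(1, 1024)          # one lightmap shared by every copy
print(per_grid_square / 2**20, "MiB vs", per_tile / 2**10, "KiB")  # 12.5 MiB vs 512.0 KiB
```

Per-grid-square lightmaps grow with the area of the world. Per-tile lightmaps grow only with the number of unique tiles, which is why heavy tile re-use makes the per-tile approach so much cheaper.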
With the tile meshes now having high-resolution lightmaps, we had to figure out how to change them for time of day. Putting our brains back into 2002 mode, our first attempt was simply multiplying the lightmap by a “time of day” lighting value. This actually didn’t look that bad when a tile was lit only by the sun. Problems arose when the tile had placed lights in it, for instance a campfire in the middle of a clearing. The campfire was supposed to cast a wonderful orange-red glow, but at night that glow would get darker and just look… wrong. After some other failed attempts to make a single lightmap work through shader trickery, we eventually settled on generating two lightmaps per tile. One contained the lighting contribution from placed lights, and the other was a black and white lightmap containing the shadows cast by the sun. The tile shader tinted the sun lightmap based on the time of day, added the result to the “placed light” lightmap, and then combined that sum with the diffuse texture. A second lightmap texture means twice the lightmap memory, but we found that very few tiles actually needed both textures anyway. Underground tiles didn’t need the sun lightmap, and only a few outdoor tiles needed the “placed light” lightmap. In the end this system was a huge win. It allowed for convincing day-to-night transitions, and it let us modify the sun’s contribution based on weather conditions or other special effects, like combat skills. Note that all of this work was done in a single pass in a relatively simple shader, so even though the tiles covered the entire screen, they used very little fill rate.
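Per texel, the two-lightmap scheme amounts to diffuse × (sun_lightmap × sun_color(time) + placed_lightmap). This sketch assumes that “combined with the diffuse texture” means a multiply, as is usual for lightmapping. The sun-color keyframes and the function names are hypothetical. It is a CPU reference for the math, not Ember’s shader.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]

# Hypothetical sun tint keyframes: (hour of day, RGB multiplier). Must start at hour 0.
SUN_KEYS: List[Tuple[float, RGB]] = [
    (0.0, (0.05, 0.05, 0.15)),  # midnight: dim blue
    (6.0, (0.9, 0.5, 0.3)),     # sunrise: warm orange
    (12.0, (1.0, 1.0, 0.95)),   # noon: near white
    (18.0, (0.9, 0.4, 0.3)),    # sunset: red-orange
]

def sun_color(hour: float) -> RGB:
    """Linearly interpolate the sun tint, wrapping from the last key back to midnight."""
    hour %= 24.0
    wrapped = SUN_KEYS + [(SUN_KEYS[0][0] + 24.0, SUN_KEYS[0][1])]
    for (h0, c0), (h1, c1) in zip(wrapped, wrapped[1:]):
        if h0 <= hour < h1:
            t = (hour - h0) / (h1 - h0)
            r, g, b = (a + (e - a) * t for a, e in zip(c0, c1))
            return (r, g, b)
    raise ValueError("SUN_KEYS must start at hour 0")

def shade_texel(diffuse: RGB, sun_lightmap: float, placed_lightmap: RGB, hour: float) -> RGB:
    """CPU reference for the two-lightmap tile shading:
    diffuse * (sun_lightmap * sun_color(hour) + placed_lightmap).
    `sun_lightmap` is the grayscale sun-shadow value in [0, 1]."""
    sun = sun_color(hour)
    r, g, b = (d * (sun_lightmap * s + p) for d, s, p in zip(diffuse, sun, placed_lightmap))
    return (r, g, b)
```

Consider a campfire texel that lies fully in sun shadow (sun_lightmap = 0). It produces the same color at midnight and at noon, so the placed glow no longer dims at night. An open texel (sun_lightmap = 1) follows the sun tint through the day.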
Besides tiles, our scenes also contain characters and dynamic objects. Characters use a fairly complex shader that supports bump mapping, specular, emissive, GPU skinning and three per-pixel lights in a single pass. Dynamic objects, such as food and weapons, use a simple vertex-lit shader; these objects are small or simple enough that a more complex shader wasn’t necessary. One other thing to note about objects: larger dynamic objects such as tables and chairs had a shadow built into the mesh. The artists simply put a quad with a “shadow blob” texture under the object, which made it feel much more grounded in the scene.
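Vertex lighting is cheap because the lighting math runs once per vertex rather than once per pixel. The sketch below assumes a single directional light plus ambient and uses a plain Lambertian diffuse term. Ember’s actual vertex shader is not described, so the names and the lighting model here are illustrative.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(sum(c * c for c in v))
    x, y, z = (c / length for c in v)
    return (x, y, z)

def vertex_light(normal: Vec3, to_light: Vec3, light_color: Vec3, ambient: Vec3) -> Vec3:
    """Lambertian diffuse term plus ambient, evaluated once per vertex.
    `to_light` points from the surface toward the light (e.g. the sun)."""
    n, l = normalize(normal), normalize(to_light)
    n_dot_l = max(0.0, sum(a * b for a, b in zip(n, l)))  # no light from behind
    r, g, b = (amb + col * n_dot_l for amb, col in zip(ambient, light_color))
    return (r, g, b)

# Each vertex gets one color; the rasterizer then interpolates these colors
# across the triangle (Gouraud shading), so the per-pixel cost stays small.
facing = vertex_light((0, 0, 1), (0, 0, 1), (1.0, 0.9, 0.8), (0.1, 0.1, 0.1))
away = vertex_light((0, 0, 1), (0, 0, -1), (1.0, 0.9, 0.8), (0.1, 0.1, 0.1))
```

A vertex facing the light receives ambient plus the full light color. A vertex facing away receives only ambient.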
Post-processing effects are another big fill rate hog, so Ember doesn’t have any post-processing at all. We tested bloom and depth of field early in development, but we felt that the framerate hit wasn’t worth the visual effect. Instead, we simulated bloom with flare sprites positioned on top of every bright light. Our editor has a prefab system that placed the flares automatically, making this a fairly simple process. In fact, it was so easy to put down lots of flares that flare processing started showing up in early CPU profiles. They looked great, though, so we spent some engineering time optimizing the flare system so that we could have a large number visible without impacting framerate.
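Counting pixels shows why flare sprites are cheaper on fill rate than a bloom post-process. Every full-screen pass touches the whole render target, while a flare only covers its own quad. The bloom pass layout, the flare count and the sprite size below are all hypothetical illustration values.

```python
def fullscreen_pass_pixels(width: int, height: int, passes: int, scale: float = 1.0) -> int:
    """Pixels written by `passes` full-screen post-processing passes rendered
    at `scale` times the screen resolution on each axis."""
    return int(width * scale) * int(height * scale) * passes

def flare_pixels(count: int, sprite_px: int) -> int:
    """Pixels written by `count` square flare sprites of sprite_px x sprite_px
    (an upper bound: ignores off-screen clipping)."""
    return count * sprite_px * sprite_px

W, H = 2048, 1536
# Hypothetical bloom: bright-pass and two blur passes at half resolution,
# then one full-resolution composite.
bloom = fullscreen_pass_pixels(W, H, passes=3, scale=0.5) + fullscreen_pass_pixels(W, H, passes=1)
# Hypothetical scene with 40 visible flares, each 128 x 128 pixels on screen.
flares = flare_pixels(40, 128)
print(bloom, flares)  # 5505024 655360
```

The trade-off is that each flare is a separate object the CPU has to handle, which is consistent with flare processing showing up in CPU profiles rather than GPU ones.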
Streaming the Streams… and Rivers, and Mountains
We felt that it was very important for the world of Ember to be one large, unbroken landmass with no loading screens. Games that have discrete levels feel different from ones where the world seems huge and unbounded, and we wanted to evoke that sense of being a small part of a large world. Our thinking was that if we continually loaded assets on a background thread running on a dedicated CPU core, we should be able to achieve a seamless world without impacting the framerate.
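The sketch below shows a background loading thread handing finished assets back to the main thread through a queue. It is a minimal version under stated assumptions. The `read_asset` callback and all names are hypothetical. Pinning the thread to a dedicated core is platform-specific and is left out. Error handling is omitted, so an exception in `read_asset` would stop the worker.

```python
import queue
import threading
from typing import Callable, Dict, Optional, Tuple

class BackgroundLoader:
    """A worker thread reads assets so the main thread never blocks on storage.
    Finished assets are handed back through a queue that the main thread
    drains once per frame."""

    def __init__(self, read_asset: Callable[[str], bytes]):
        self._read_asset = read_asset
        self._requests: "queue.Queue[Optional[str]]" = queue.Queue()
        self._done: "queue.Queue[Tuple[str, bytes]]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def request(self, name: str) -> None:
        """Called from the main thread; returns immediately."""
        self._requests.put(name)

    def _run(self) -> None:
        while True:
            name = self._requests.get()
            if name is None:  # shutdown sentinel
                break
            self._done.put((name, self._read_asset(name)))

    def drain_finished(self) -> Dict[str, bytes]:
        """Called from the main thread each frame; never blocks."""
        finished: Dict[str, bytes] = {}
        while True:
            try:
                name, data = self._done.get_nowait()
            except queue.Empty:
                return finished
            finished[name] = data

    def shutdown(self) -> None:
        """Process any queued requests, then stop the worker."""
        self._requests.put(None)
        self._thread.join()
```

The main loop would call `drain_finished()` once per frame. Whatever it then does with the drained data, such as building collision or uploading to the GPU, still costs main-thread time.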
The streaming algorithm is fairly simple. As the player moves around the world, we load the tiles and objects immediately surrounding them and unload assets that are far away to keep memory usage down. We keep track of all loaded assets in a list and ensure that everything in that list uses less than a preset amount of total memory. If loading an asset would push the list over that limit, the oldest (least recently refreshed) assets are unloaded to make room for the new one. If an asset is in view and already loaded, it is “refreshed” and moved to the top of the list. This makes memory usage easy to control: raising the list’s memory limit makes the game smoother by caching more, while lowering the limit reduces our overall memory usage. We also unload older assets from the list when we receive a low memory warning from the OS, to help us avoid out-of-memory crashes. At its basic level this works well, but we did run into a few issues.
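The list described above behaves like a least-recently-used (LRU) cache with a byte budget. This is a minimal sketch that uses an `OrderedDict` in place of the list, with the end of the dict acting as the “top”. The asset names, the sizes, the `unload` callback and the low-memory target are hypothetical.

```python
from collections import OrderedDict
from typing import Callable

class AssetCache:
    """Memory-budgeted streaming list: most recently used assets sit at the end."""

    def __init__(self, budget_bytes: int, unload: Callable[[str], None] = lambda name: None):
        self.budget_bytes = budget_bytes
        self.used_bytes = 0
        self._assets: "OrderedDict[str, int]" = OrderedDict()  # name -> size in bytes
        self._unload = unload  # hypothetical hook that frees the real asset

    def touch(self, name: str) -> bool:
        """Refresh an asset that is in view. Returns False if it is not loaded."""
        if name in self._assets:
            self._assets.move_to_end(name)
            return True
        return False

    def add(self, name: str, size: int) -> None:
        """Register a newly loaded asset, evicting the oldest ones to make room.
        An asset larger than the whole budget evicts everything and is kept anyway."""
        if self.touch(name):
            return
        self._evict_until(self.budget_bytes - size)
        self._assets[name] = size
        self.used_bytes += size

    def on_low_memory(self, target_bytes: int) -> None:
        """Respond to an OS low-memory warning by shrinking to target_bytes or less."""
        self._evict_until(target_bytes)

    def _evict_until(self, limit: int) -> None:
        while self._assets and self.used_bytes > limit:
            name, size = self._assets.popitem(last=False)  # oldest first
            self.used_bytes -= size
            self._unload(name)

    def __contains__(self, name: str) -> bool:
        return name in self._assets
```

Raising `budget_bytes` keeps more assets cached, and lowering it trims memory. That is the same trade-off described above, reduced to a single constructor argument.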
Our initial worry was that larger assets, such as a triangle-heavy tile with 4 high-resolution textures, would take so long to load that they wouldn’t be ready before the player could see them. That would cause huge tile-sized holes in the ground, or items and characters suddenly popping into view. It turns out that our worries were unfounded: the speed of the CPU, coupled with the fairly fast transfer rate of mobile storage, meant that assets loaded very quickly. What we didn’t expect was how much processing the already over-taxed main thread, running on the primary CPU core, would have to do on these assets once they finished loading in the background. Whether it was our own game code processing a mesh for collision or the driver doing work when uploading textures or geometry, the result was that Ember often felt very “hitchy”. Whenever an asset finished loading, it caused a sudden spike on the main thread that showed up in gameplay as a noticeable hitch. The causes of these hitches were many and varied, and they often appeared because of changes in the code or even the art. It was a constant battle to keep the game smooth: it would run great for months, then suddenly become a hitchy mess for unknown reasons. We’d have to profile, find out which asset was causing the main thread to do so much work, and then either move that work to the background thread or simply optimize the code to reduce the CPU hit.
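Tracking down the offending asset amounts to timing the main-thread work done for each asset as it arrives. This sketch assumes a hypothetical `finalize` callback standing in for collision setup, GPU uploads and similar work. The 5 ms threshold is an arbitrary illustration value; a 30 fps frame has only about 33 ms in total. The clock is injectable so that the report can be tested deterministically.

```python
import time
from typing import Callable, Dict, List, Tuple

def finalize_with_hitch_report(
    finished: Dict[str, bytes],
    finalize: Callable[[str, bytes], None],
    budget_ms: float = 5.0,
    clock: Callable[[], float] = time.perf_counter,
) -> List[Tuple[str, float]]:
    """Run main-thread finalization for each asset the background thread
    finished, and return (name, milliseconds) for every asset whose
    finalization alone exceeded `budget_ms`, worst first: the prime hitch suspects."""
    offenders = []
    for name, data in finished.items():
        start = clock()
        finalize(name, data)
        elapsed_ms = (clock() - start) * 1000.0
        if elapsed_ms > budget_ms:
            offenders.append((name, elapsed_ms))
    return sorted(offenders, key=lambda item: item[1], reverse=True)
```

Logging this report during play sessions would point at the specific asset behind a new hitch. From there, the fix is either to move that asset’s processing off the main thread or to make it cheaper.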
The Open World is Yours
We here at N-Fusion Interactive felt that, thanks to the explosion of powerful mobile hardware, now was the right time to finally make the open world RPG we had been dreaming of since our studio’s founding almost 20 years ago. It wasn’t easy, and it took a very long time, but we accomplished our goals, even on today’s modest mobile hardware.