The high cost of drawing thousands of different objects, no matter how simple, is among the greatest problems of PC renderers today. The cost of individual render calls is compounded by the cost of render state changes between different objects. One of the worst offenders in this regard is the texture change. In a complex game scene, there might be thousands of objects on the screen, using hundreds of different textures – one or several for each distinct type of object.
Texture atlases are large textures made up of many separate textures. Each object's texture uses only a portion of the atlas texture. The ideal atlases would be hand-created by artists, but this approach is very inflexible: it makes adding or removing texture assets far more expensive in artist time than is reasonable.
This article describes the workings of a real-world system for automatically generating texture atlases, from the atlas creation tool to the engine integration issues. It has been successfully used in Haemimont Games' Glory of the Roman Empire (working title), a strategy/simulation title for the PC scheduled to ship in the first half of 2006.
Theory and Benefits of Texture Atlases
For the basic theory behind texture atlases, the reader is referred to an article called “Improved Batching via Texture Atlases”, available from NVIDIA's developer site. In a nutshell, all the textures needed by the game are combined into several large atlas textures. A coordinate remapping table is built and loaded in the engine, and it is used to scale and offset texture coordinates for each object to select the appropriate subregion of the atlas.
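The scale-and-offset remapping can be illustrated with a minimal sketch. It assumes each remapping table entry is derived from the pixel rectangle a texture occupies in its atlas; the names `Remap`, `remap_for_rect` and `apply_remap` are hypothetical, and half-texel adjustments are ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Remap:
    """One remapping table entry (hypothetical layout): scale and offset in UV space."""
    scale_u: float
    scale_v: float
    offset_u: float
    offset_v: float


def remap_for_rect(x, y, w, h, atlas_w, atlas_h):
    """Build the entry for a texture stored at pixel rect (x, y, w, h) in the atlas."""
    return Remap(w / atlas_w, h / atlas_h, x / atlas_w, y / atlas_h)


def apply_remap(remap, u, v):
    """Map an original (u, v) in the unit square to atlas coordinates."""
    return (u * remap.scale_u + remap.offset_u,
            v * remap.scale_v + remap.offset_v)


# A 256x256 texture placed at pixel (512, 256) in a 1024x1024 atlas.
entry = remap_for_rect(512, 256, 256, 256, 1024, 1024)
corner_min = apply_remap(entry, 0.0, 0.0)
corner_max = apply_remap(entry, 1.0, 1.0)
```

The unit square of the original texture ends up covering exactly the rectangle reserved for it in the atlas.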
|A texture atlas with 100 unit, vegetation and decoration textures|
Texture atlases greatly reduce the importance of different object textures as a factor in batching and sorting the scene. Typical packing ratios range from 16 to 256 textures per atlas; this means that, in the best case, you can have 16 to 256 times fewer texture state changes per frame.
If your renderer implements any kind of static batching of world geometry into pre-computed vertex buffers (grass and small vegetation being the most common example), texture atlases allow you to batch together objects of different types, under the (realistic) constraint that all object textures happen to be allocated in the same atlas.
Having fewer physical textures is also greatly beneficial for sorting – if you sort by texture atlas, not by original texture, you will get much larger spans of objects using identical atlases in your render queue, which allows you to get your sorting order closer to the optimal front-to-back sorting.
Finally, the runtime cost of atlases is negligible in most scenarios: a 2D multiply-add operation per texture coordinate set in the vertex shader (which is rarely the bottleneck in real-world applications) and a float4 vertex shader constant register. In addition, there might be a small percentage of wasted texture space due to the packing – for example, if the textures don't completely fill the last atlas – which can be minimized by careful tweaking of the texture resolutions.
Atlas Creation Tool
The texture atlas tool in our art pipeline uses NVIDIA's atlas creation tool and texture compression tool (NVDXT) to perform all of the actual packing and compression of textures. NVDXT is the gold standard of texture compression, and the NVIDIA atlas creation tool, while not perfect (e.g. it doesn't handle non-square textures correctly), is quite usable. Replacing one or both of them with different tools using a similar interface would be easy.
Our tool takes as its input a directory tree of unpacked art assets and a configuration file (see Listing 1). It produces an atlas set: a number of compressed DDS textures and a remapping table binding original texture filenames to atlas filenames plus remapping coordinates.
The tool uses wildcards (regular expressions, actually) and NVDXT parameters as a general mechanism to sort images and specify how each image is to be processed. Each input texture filename is matched against several groups of regular expressions, which have associated NVDXT parameter sets. Within the group, the first matching regular expression for the filename is taken and its corresponding parameters are added to the NVDXT command line for this file; no more matches are attempted for this group, and the process continues with the next group of expressions.
; use Kaiser filtering for all files
CommonNvdxtOptions = -Kaiser -RescaleKaiser
OutputSet = HighTextureDetail
; special-case large buildings
; special-case small units
; default size for any other textures
|Listing 1 - A sample configuration file for the atlas creation tool|
One group is used to specify texture sizes by the filename prefix (e.g. /Buildings/* is resized to 512x512, but as an exception /Buildings/Coliseum/* is resized to 1024x1024), and another to specify pixel formats (e.g. *.diffuse.tga gets compressed to DXT5, and *.lightmap.tga gets compressed to L8, where the “diffuse” and “lightmap” suffixes, along with several others, are set by the art package exporter). An additional group of regular expression masks is used to assign a “family ID” to each texture, which is a way to force grouping of textures into the same atlas for specific purposes – e.g. all particle textures to be rendered separately via the particle system renderer, or all textures of vegetation objects to be used in building static vegetation batches.
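The first-match-per-group rule can be sketched as follows. This is an illustration under assumptions, not the tool's actual code: the group contents, the default size and the exact parameter strings are placeholders, and any real NVDXT options should be checked against the tool's own help output.

```python
import re

# Each group is an ordered list of (regular expression, parameters).
# Within a group the first matching expression wins. All values here are
# hypothetical examples modelled on the cases described in the text.
SIZE_GROUP = [
    (r"^/Buildings/Coliseum/", "-prescale 1024 1024"),  # exception comes first
    (r"^/Buildings/", "-prescale 512 512"),
    (r".*", "-prescale 256 256"),                       # assumed default size
]
FORMAT_GROUP = [
    (r"\.diffuse\.tga$", "-dxt5"),
    (r"\.lightmap\.tga$", "-L8"),
]
GROUPS = [SIZE_GROUP, FORMAT_GROUP]


def nvdxt_params(filename, groups=GROUPS, common=""):
    """Collect one parameter set per group for the given texture filename."""
    parts = [common] if common else []
    for group in groups:
        for pattern, params in group:
            if re.search(pattern, filename):
                parts.append(params)
                break  # no more matches are attempted for this group
    return " ".join(parts)


coliseum = nvdxt_params("/Buildings/Coliseum/wall.diffuse.tga")
farm = nvdxt_params("/Buildings/Farm/roof.lightmap.tga")
unit = nvdxt_params("/Units/soldier.diffuse.tga", common="-Kaiser")
```

Because order decides the winner, exceptions such as the Coliseum rule have to be listed before the more general rule they override.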
After the desired compression settings for each input texture (size, pixel format and others, expressed as NVDXT command-line parameters) are determined by the described procedure, a texture compression cache is looked up using an MD5 hash of the file contents and the compression settings as a key. The cache is preserved across builds. It has two purposes: first, reusing the same file from a previous build avoids the expensive compression procedure.
Second, finding the same texture elsewhere in the current build, compressed exactly with the same settings, results in writing only an alias to its previous occurrence in the atlas remapping table, sparing precious texture memory in the relatively frequent case of artists using the same texture for several objects. Finally, if the cache lookup fails, NVDXT is invoked and its result is stored in the cache for future use.
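A minimal sketch of such a cache is shown below. It assumes the key is an MD5 digest over the file contents followed by the settings string, and it keeps the cache in a dictionary instead of on disk; `CompressionCache`, `build_textures` and the `compress` callback are hypothetical names standing in for the real tool and the NVDXT invocation.

```python
import hashlib


def cache_key(file_bytes, settings):
    """MD5 over the file contents plus the compression settings."""
    digest = hashlib.md5()
    digest.update(file_bytes)
    digest.update(b"\0")  # separator between contents and settings
    digest.update(settings.encode("utf-8"))
    return digest.hexdigest()


class CompressionCache:
    def __init__(self, compress):
        self.compress = compress  # callable(file_bytes, settings) -> bytes
        self.store = {}           # a real tool would persist this across builds

    def get(self, file_bytes, settings):
        key = cache_key(file_bytes, settings)
        if key not in self.store:  # cache miss: run the expensive compression
            self.store[key] = self.compress(file_bytes, settings)
        return key, self.store[key]


def build_textures(inputs, cache):
    """inputs: list of (name, file_bytes, settings).

    Returns the names that need their own atlas slot, plus a dict of aliases
    mapping duplicate names to the first texture with the same key.
    """
    first_seen = {}
    unique, aliases = [], {}
    for name, data, settings in inputs:
        key, _ = cache.get(data, settings)
        if key in first_seen:
            aliases[name] = first_seen[key]
        else:
            first_seen[key] = name
            unique.append(name)
    return unique, aliases
```

The same key serves both purposes: a hit in the persistent store skips compression, and a repeated key within one build produces an alias instead of a second copy in the atlas.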
After all input textures are in the desired compressed and resized form, they are split into groups by pixel format and family. Each group of files with the same pixel format and family is fed to NVIDIA's AtlasCreationTool. Its output is parsed and integrated into a single remapping table for all pixel formats and families.
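The grouping step might look like the sketch below, which assumes each compressed texture is described by a hypothetical `(filename, pixel_format, family)` tuple. Each resulting group would then be handed to the atlas packer separately.

```python
from collections import defaultdict


def group_for_atlases(textures):
    """Split textures into packing groups keyed by (pixel format, family)."""
    groups = defaultdict(list)
    for filename, pixel_format, family in textures:
        groups[(pixel_format, family)].append(filename)
    return dict(groups)


groups = group_for_atlases([
    ("oak.diffuse.dds", "DXT5", "vegetation"),
    ("smoke.diffuse.dds", "DXT5", "particles"),
    ("pine.diffuse.dds", "DXT5", "vegetation"),
    ("farm.lightmap.dds", "L8", "default"),
])
```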
During the build process, the atlas creation tool is run three times over the same folder tree with original art assets, but with different configuration files, to produce a different atlas set for each of the three texture quality levels used by the game, depending on the hardware capabilities and the settings chosen by the user. The configuration files differ by destination texture sizes and pixel formats. The exact texture sizes are produced with the help of a huge Excel table listing every object in the game, the world-space size of its mesh, and texture dimensions and pixel formats for low/medium/high detail. The table estimates the total required texture memory for each detail level; our target was 64 MB cards for low detail textures, 128 MB for medium and 256 MB for high.
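The kind of arithmetic such a table performs can be sketched as follows. The block sizes are the standard ones (DXT1 stores a 4x4 block in 8 bytes, DXT5 in 16 bytes, L8 uses one byte per pixel); the function names and the assumption of a full mipmap chain are illustrative, not taken from the actual spreadsheet.

```python
import math

BLOCK_BYTES = {"DXT1": 8, "DXT5": 16}  # bytes per 4x4 block
PIXEL_BYTES = {"L8": 1}                # bytes per pixel


def level_bytes(w, h, fmt):
    """Size of a single mipmap level."""
    if fmt in BLOCK_BYTES:
        return math.ceil(w / 4) * math.ceil(h / 4) * BLOCK_BYTES[fmt]
    return w * h * PIXEL_BYTES[fmt]


def texture_bytes(w, h, fmt, mipmaps=True):
    """Size of a texture, optionally including its full mipmap chain."""
    total = level_bytes(w, h, fmt)
    while mipmaps and (w > 1 or h > 1):
        w, h = max(1, w // 2), max(1, h // 2)
        total += level_bytes(w, h, fmt)
    return total


def budget_mib(rows):
    """Total for a list of (width, height, format) rows, in MiB."""
    return sum(texture_bytes(w, h, fmt) for w, h, fmt in rows) / 2**20
```

A full mipmap chain adds roughly one third to the size of the top level, so a 1024x1024 DXT5 texture comes to about 1.33 MiB.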
Thanks to the compression cache, the typical atlas creation portion of the build procedure takes on the order of 2-3 minutes, although rebuilding this cache from scratch can take considerably more.
Integrating the texture atlases in the renderer is a relatively isolated change, affecting the code in three places: the texture manager, the shaders and the render queue.
First, the texture manager used for loading textures by filename receives the additional responsibility of loading the atlas set with all its atlases and the remapping table. The internal representation of a loaded texture, which in the absence of atlases is just a device-specific texture pointer, becomes a structure with a texture pointer to the atlas holding the actual texture plus two pairs of texture remapping coordinates: one for scale and one for offset. For textures which are not in atlases, this is reduced to an identity remapping, zero offset and unit scaling, to allow them to be used with the same shaders. Whenever a “logical” texture has to be loaded into a texture sampler, the atlas-enabled texture manager actually loads the “physical” texture (the atlas) into the sampler, and these remapping coordinates into a reserved vertex shader constant register associated with the sampler.
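A rough sketch of this representation follows. Everything in it is an assumption made for illustration: the device is replaced by a small recording stub, atlas names stand in for device texture pointers, and `REMAP_REGISTER_BASE` is a made-up register layout.

```python
from dataclasses import dataclass

REMAP_REGISTER_BASE = 16  # hypothetical: one reserved constant register per sampler


@dataclass(frozen=True)
class LoadedTexture:
    atlas: str                  # stands in for the device texture pointer
    scale: tuple = (1.0, 1.0)   # identity remapping by default
    offset: tuple = (0.0, 0.0)


class FakeDevice:
    """Records state changes so the sketch can run without a graphics API."""
    def __init__(self):
        self.samplers = {}
        self.vs_constants = {}

    def set_texture(self, sampler, texture):
        self.samplers[sampler] = texture

    def set_vs_constant(self, register, value):
        self.vs_constants[register] = value


class TextureManager:
    def __init__(self, remap_table):
        self.remap_table = remap_table  # filename -> (atlas, scale, offset)

    def load(self, filename):
        entry = self.remap_table.get(filename)
        if entry is None:
            # Not in any atlas: the file is its own "atlas", identity remapping.
            return LoadedTexture(atlas=filename)
        atlas, scale, offset = entry
        return LoadedTexture(atlas, scale, offset)

    def bind(self, device, sampler, texture):
        """Bind the physical texture and upload scale/offset as one float4."""
        device.set_texture(sampler, texture.atlas)
        device.set_vs_constant(REMAP_REGISTER_BASE + sampler,
                               (*texture.scale, *texture.offset))
```

Callers keep working with logical textures only; the remapping constant travels with every bind, so the same shaders serve atlas and non-atlas textures.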
The second modification is in the vertex shaders. For each set of texture coordinates a simple remapping function should be called to scale and offset them with the remapping constants for the appropriate sampler.
With these two changes the introduction of atlases remains completely transparent to the rest of the system. The basic operations with textures remain the same as without atlases: load texture by filename, and assign loaded texture to sampler. The actual benefits of texture atlases are reaped in the third integration point, where this transparency must be broken.
The render queue – the part of the renderer responsible for gathering all render objects for the current frame and sorting them, attempting to achieve both a minimal number of render state changes and a good front-to-back order – needs to query the texture manager for some form of “physical” atlas ID of each “logical” texture, and sort by these IDs. In the comparison function used for sorting, the atlas ID comparison should come after object material (as shader changes are more expensive than texture changes) and before distance to camera (so that groups of objects with the same shader and atlas are drawn front to back).
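The ordering described here amounts to a lexicographic sort key. The sketch below assumes a hypothetical `RenderItem` record with integer material and atlas IDs; a real render queue would typically pack these into a single integer key, but the comparison order is the same.

```python
from dataclasses import dataclass


@dataclass
class RenderItem:
    name: str
    material_id: int   # shader/material: most expensive change, compared first
    atlas_id: int      # "physical" atlas ID obtained from the texture manager
    distance: float    # distance to camera, for front-to-back order


def sort_key(item):
    """Material first, then atlas, then distance to camera."""
    return (item.material_id, item.atlas_id, item.distance)


queue = [
    RenderItem("A", material_id=1, atlas_id=2, distance=10.0),
    RenderItem("B", material_id=0, atlas_id=1, distance=30.0),
    RenderItem("C", material_id=0, atlas_id=0, distance=20.0),
    RenderItem("D", material_id=0, atlas_id=1, distance=5.0),
]
ordered = [item.name for item in sorted(queue, key=sort_key)]
```

Here `D` and `B` share a material and an atlas, so they end up adjacent and are drawn nearest first.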
In practice, the atlas-enabled texture manager has two additional features. One of them is support for the different texture detail levels described in the previous section, which becomes simply a matter of loading a new atlas set. The other feature is support for textures not packed in the atlases, which is useful during development. On each load request, the texture manager first checks to see if the requested filename exists on disk; if it is found, it is loaded directly and the atlas set is ignored. This enables artists to quickly preview new versions of specific textures, without either going through the relatively slow atlas build procedure or working with a slower build without atlases.
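The loose-file override could be sketched as below. The function name, the return shape and the injectable `exists` check are assumptions made so that the example runs without real files.

```python
import os


def resolve_texture(filename, remap_table, exists=os.path.exists):
    """Prefer a loose file on disk; fall back to the atlas remapping table."""
    if exists(filename):
        return ("file", filename)               # loaded directly, atlas set ignored
    if filename in remap_table:
        return ("atlas", remap_table[filename])
    raise FileNotFoundError(filename)


table = {"oak.tga": "atlas3.dds"}
on_disk = {"oak.tga"}  # pretend an artist dropped a new version next to the build
preview = resolve_texture("oak.tga", table, exists=on_disk.__contains__)
packed = resolve_texture("oak.tga", table, exists=lambda path: False)
```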
Downsides of Texture Atlases
Texture atlases are not a panacea for all of the numerous problems around textures and their management, and they come with their own set of problems. For example, texture tiling is impossible, and all texture coordinates should fall within the unit square (0,0) – (1,1); this requires some retraining for artists who are used to the “free” wrapping of textures, but ultimately is not a problem for texturing meshes.
Another potential issue mentioned in the original NVIDIA whitepaper on atlases, the color bleeding between adjacent textures in the atlas at high mipmap levels, was never observed by us in practice, but could probably happen with another set of art assets. It could be reduced by a more complex procedure of ordering textures within the atlas, using the average color of each texture to group together similar textures.
The greatest problem with the described static texture atlases is that they don't easily combine with any kind of asset streaming, or on-demand loading with fine granularity. For our particular project we had the requirement of fitting all the assets in memory all the time, because it is relatively common to have virtually all the art on-screen, making atlases a natural fit. If you need streaming, however, atlases should be built with that in mind, grouping together textures that will likely be loaded together, e.g. a farm house with all its possible yard animals, or a number of similar vegetation objects. The optimal batches could be calculated by a tool examining the structure of the levels on each build. The benefits of atlases in this case should be carefully weighed against the overhead of potentially loading unneeded texels.