The idea behind this dependency-based strategy is quite simple. A dependency represents a link between a source asset and a processed output file, indicating that the latter contains data that is provided by (or affected by) the former. So, we say that the output file depends on the source asset, and that the asset is a prerequisite of the output file. This is a many-to-many relationship; one output file may depend on many source assets (consider, for example, a model file containing a mesh, textures, and animation data), and one source asset may generate many output files.
Dependencies are not limited to being simple links between pairs of files, either; if some files are built using intermediate files, or depend on other output files, then a dependency chain emerges, where each of the dependencies of a file may in turn have dependencies of its own. Figure 1 shows a simple dependency chain for a character model. If the entire asset pipeline were viewed, elements of this chain (for example, the run animation) might be used in other characters as well, and therefore have additional dependent resources.
Walking along the dependency chain for an output file, therefore, provides a list of all of the source (and intermediate) files that affect it, and hence may cause it to be rebuilt if they change. However, while this is a useful view conceptually, in practical terms it is usually more useful to look at dependency chains the other way around: for a given source asset, walking along the chain for its dependents will give a list of output files that must be updated if it is changed.
Figure 1: A simple asset dependency chain.
Dependency chains do not generally exist in isolation, either; chains frequently meet and overlap (for example, if one intermediate file or asset is used by many processes). This is actually another very useful property because in doing so, they provide all the information needed to minimize the amount of effort required to perform a single set of updates.
Consider the case where a number of source assets have all been changed. If each change is processed independently, and the individual dependents of the source asset updated, then some output and intermediate files may be updated several times. This is particularly problematic in the case where there are several "layers" of intermediate files depending on one another; in these cases, it is hard to remove the unnecessary updates because only the last update of any given asset is guaranteed to have a complete set of up-to-date intermediate files!
Figure 2 shows an example of this type of more complex dependency chain. The knight and paladin models share the same run animation, but have different base meshes. However, they both use the same texture page and therefore, the intermediate texture page file is shared between them.
Figure 2: Dependencies on shared assets.
The dependency chains contain the solution to this, as they store all of the necessary information about the relationships between the files to ensure that every file (both intermediate and output) is updated once only, but in the correct order to ensure that old data is never used. This is done by walking through all of the dependency chains simultaneously, and building a queue of the files that must be processed.
One very straightforward way to do this is to exploit the fact that the dependency chains themselves encode the order that operations must be performed in. To build the queue of operations is a simple iterative process, using a list of potentially modified files as the basis.
The first step of the procedure is to take every source file that has changed, and recursively walk down to all its dependents, adding each to the list (if it is not already present). After this step, the complete set of files that must be updated is stored on the list and the processing order can be determined.
The order is determined by repeatedly walking through the list and checking each file to see if it is ready to be processed. A file is ready when none of the files it immediately depends on (that is, those prerequisites that are directly linked to it) is still on the list. If any of those files is still on the list, then the file cannot be processed yet, and is skipped. However, if none is present, then the file is moved from the list to the end of the queue. This process is then repeated until there are no files left on the list. With this done, the queue contains an ordered list of the files for processing, so that every file is only updated once, and all of its prerequisite files are updated before it.
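The two steps above can be sketched in Python as follows. This is a minimal illustration rather than a production implementation: the build_queue function, the prerequisites mapping (each file mapped to the files it directly depends on), and the file names, which loosely follow the shared texture page example, are all assumptions made for the example.

```python
from collections import defaultdict


def build_queue(prerequisites, changed):
    """Return the files to rebuild, ordered so prerequisites come first.

    prerequisites: dict mapping a file to the files it directly depends on.
    changed: list of source files that have been modified.
    """
    # Invert the mapping so the direct dependents of any file can be found.
    dependents = defaultdict(list)
    for target, prereqs in prerequisites.items():
        for prereq in prereqs:
            dependents[prereq].append(target)

    # Step 1: walk down from every changed source, adding each dependent
    # to the list if it is not already present.
    pending = []

    def visit(name):
        for dependent in dependents[name]:
            if dependent not in pending:
                pending.append(dependent)
                visit(dependent)

    for source in changed:
        visit(source)

    # Step 2: repeatedly scan the list, moving a file to the queue once
    # none of its direct prerequisites is still on the list.
    queue = []
    while pending:
        progressed = False
        for name in list(pending):
            if not any(p in pending for p in prerequisites.get(name, ())):
                pending.remove(name)
                queue.append(name)
                progressed = True
        if not progressed:
            raise ValueError("circular dependency among: %r" % pending)
    return queue


# Hypothetical file names, for illustration only.
prerequisites = {
    "texpage.tex": ["armor.tga", "shield.tga"],
    "knight.mdl": ["knight_mesh.src", "run.anim", "texpage.tex"],
    "paladin.mdl": ["paladin_mesh.src", "run.anim", "texpage.tex"],
}
queue = build_queue(prerequisites, ["run.anim", "armor.tga"])
print(queue)
```

Although both models appear on the list before the texture page in this run, the texture page is queued first, and each file appears in the queue only once even though two of its prerequisites changed.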
While in the majority of cases the files will be processed in a linear manner, and therefore this queue is all that is required for the operation to begin, it is also possible to produce output in a form suitable for processing many assets in parallel, for example, using a distributed network of machines, or a multi-CPU system. To do this, the same procedure is used, but with a marker added to the items on the list. When an item with no prerequisites remaining on the list is found, instead of being moved to the queue immediately it is marked and left in place (so it still counts as being on the list for the rest of that pass). Then, when the end of the list is reached, all of the marked files are moved into the queue as a "batch." Each of these batches consists of files that are ready for processing but are also guaranteed to be independent of each other, so they can all be handled simultaneously if needed.
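The batching variant might look like the following Python sketch. It assumes the list of files needing an update has already been gathered; build_batches and the file names are hypothetical, chosen only for illustration.

```python
def build_batches(pending, prerequisites):
    """Group files into batches that can each be processed in parallel.

    pending: list of files that must be updated.
    prerequisites: dict mapping a file to the files it directly depends on.
    """
    pending = list(pending)
    batches = []
    while pending:
        # Mark every ready file but leave it in place for the whole pass,
        # so a file whose prerequisite was marked in this same pass is
        # not marked along with it.
        marked = [
            name for name in pending
            if not any(p in pending for p in prerequisites.get(name, ()))
        ]
        if not marked:
            raise ValueError("circular dependency among: %r" % pending)
        # At the end of the pass, move all marked files out as one batch.
        pending = [name for name in pending if name not in marked]
        batches.append(marked)
    return batches


# Hypothetical file names, for illustration only.
prerequisites = {
    "texpage.tex": ["armor.tga", "shield.tga"],
    "knight.mdl": ["knight_mesh.src", "run.anim", "texpage.tex"],
    "paladin.mdl": ["paladin_mesh.src", "run.anim", "texpage.tex"],
}
batches = build_batches(["knight.mdl", "paladin.mdl", "texpage.tex"], prerequisites)
print(batches)
```

Here the texture page forms the first batch on its own, and the two models, which do not depend on each other, form the second.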
While in many cases analyzing the dependency information once and then processing the resulting queue of output files is enough, in cases where there are large numbers of changes being made to the source assets, it may be desirable to update the processing queue as changes are made. This can be done very simply by taking the current outstanding queue entries and adding the dependents of the newly-modified assets to them, creating a new list of files that need updating. Then, the dependency analysis procedure can be repeated using this input list to generate a new queue for continued processing, thereby ensuring that any changes caused by the new updates are correctly inserted into the processing order.
This technique can be very useful if the asset processing system allows multiple tasks to run concurrently, as it means that a single processing operation does not block the entire system until it completes; unrelated operations may still be executed in parallel with it.
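As a rough sketch of this re-queuing step, the Python below merges the outstanding queue entries with the dependents of newly modified assets and then re-orders the result. For brevity the ordering is delegated to the standard library's graphlib.TopologicalSorter (Python 3.9 or later), which is a substitute for the list-scanning procedure described earlier, not the same code. The requeue function and the file names are assumptions made for the example.

```python
from graphlib import TopologicalSorter


def requeue(outstanding, newly_changed, prerequisites):
    """Merge new changes into an outstanding queue and re-order it."""
    pending = set(outstanding)

    # Add every direct and indirect dependent of the newly changed assets.
    visited = set()
    stack = list(newly_changed)
    while stack:
        name = stack.pop()
        if name in visited:
            continue
        visited.add(name)
        for target, prereqs in prerequisites.items():
            if name in prereqs:
                pending.add(target)
                stack.append(target)

    # Order the merged set; only prerequisites that are themselves
    # pending constrain the order.
    sorter = TopologicalSorter()
    for name in pending:
        sorter.add(name, *[p for p in prerequisites.get(name, ()) if p in pending])
    return list(sorter.static_order())


# Hypothetical file names, for illustration only.
prerequisites = {
    "texpage.tex": ["armor.tga", "shield.tga"],
    "knight.mdl": ["knight_mesh.src", "run.anim", "texpage.tex"],
    "paladin.mdl": ["paladin_mesh.src", "run.anim", "texpage.tex"],
}
# paladin.mdl is still waiting to be built when shield.tga is modified.
new_queue = requeue(["paladin.mdl"], ["shield.tga"], prerequisites)
print(new_queue)
```

In this run the knight model, which had already been processed, is put back on the queue, and the texture page is placed ahead of both models.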
Determining Asset Dependencies
One of the key problems faced when implementing a system of this nature is how to actually construct the dependency information for the assets in the first place. The mechanisms for doing this will depend to a large degree on the processing tools and files being used, but there are some general areas that most techniques fall into:
Explicitly Stored Dependency Information
In some systems, such as the make tool which will be described in more detail later, the dependency information is stored as part of the script that describes all the desired processing operations. In general, this file is human generated, although dependencies can be specified for groups or types of files as well as individual assets, reducing the amount of maintenance required. This approach has the advantage that it is very easy to see and edit the dependency information, especially if it is necessary to add some special case entries for certain assets.
However, there are several fairly significant disadvantages of this system. Dependencies must be consistent across fairly large groups of files, otherwise a lot of manual editing is required. It is also impossible to encode dependency information that is based on the contents of the assets. So, for example, making a model file dependent on the textures it uses is impossible unless a human (or another tool) updates the dependency information by hand.
Dependency Information Stored in Assets
Another approach is to store the dependencies of asset files in the file itself. This way, the dependency information can be built by the exporter or tool that creates the file, based on the information it has about the contents. This makes this approach very suitable for handling assets such as models which may be formed from several separate files. It is also generally quite straightforward to implement, although a unified format for storing this information (either as part of the asset file, or in a separate metadata file) is required.
The main disadvantage of this approach is that it is only suitable in circumstances where the dependency chain for an asset can be easily predicted ahead of the processing itself, and is not likely to change often. This is because the information is generally needed to form dependencies for files other than the one it is actually stored in. For example, storing a list of textures used in a model file does not actually define prerequisites for the model file itself, as it is a source asset and has none. Instead, this information is used to construct the prerequisites for the processed file(s) created from this model.
Dependency Information Generated by the Processing Code
The final approach is to generate the dependency information "on the fly" by using the processing code (or a subset of it) to read each asset and build the dependency tree. This approach has the major advantage that it can easily handle very complex interdependencies between assets based on their contents, and it is relatively easy to maintain, once the initial framework is in place. Also, by building the dependency information this way, changes in the structure can be easily implemented, without having to edit external files, re-export, or reprocess assets to update their stored data.
However, the process of building the dependency information can be quite slow, and must be repeated whenever an asset changes. It also means that the dependency information is not easily visible for debugging purposes, or editable in the event that a special case change is required.
Of course, there is no requirement that only one of these approaches is taken; it is not uncommon to use a combination, picking the most appropriate technique for different types of assets or processing requirements. Dependency information from a number of different sources can be easily integrated into a single dependency tree for processing, and it is even relatively straightforward to remove all of the dependency information for a given asset or assets and re-insert it if changes to the asset that affect its dependencies occur during processing.
Determining When Assets Have Changed
The procedure for actually determining when an asset has been modified depends largely on the structure of the asset management system in use. If a version control system of some description is employed, then it is simply a case of either comparing the version numbers of each asset in the database with the last processed copy, or just retrieving the list of modifications in every changelist since the last update was performed.
On a flat file system, it is slightly more difficult to detect changes, although there are some methods that work relatively well. The most commonly used system is simply to compare the "last modified" date on each file, and check if it is newer than the last version that was processed (or newer than the processed output file, in some systems). This is not particularly robust, though, as it can be easily confused by actions such as "rolling back" files (by copying a previous version over the top), or if a machine's internal clock is wrong! It does have the major advantage of being very fast, and requiring little or no external information about the files.
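A timestamp test of this kind takes only a few lines. The Python sketch below uses the "newer than the processed output file" variant; the function name is_stale is an assumption made for illustration.

```python
import os


def is_stale(source_path, output_path):
    """Return True if the output is missing or older than the source."""
    if not os.path.exists(output_path):
        return True
    # Compare "last modified" times; this inherits the weaknesses noted
    # above (rolled-back files, incorrect machine clocks).
    return os.path.getmtime(source_path) > os.path.getmtime(output_path)
```

A call such as is_stale("grass.tga", "grass.tex") (hypothetical file names) only reads file metadata, which is why this check is so fast.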
Another, more stable method is to take a checksum of the files each time they are processed, and compare that against the stored copy. If a strong checksum or hashing system (the MD5 algorithm is a popular choice for this) is used, then the possibility of an accidental collision, where two different files generate the same checksum value, is infinitesimally small. Therefore, the check is a very robust way to determine if a file has changed. However, using this system requires that the entire source asset be read and the checksum calculated every time it needs to be checked, a fairly slow procedure.
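A checksum comparison could be sketched as follows, using MD5 from Python's standard hashlib module. The function names and the stored dictionary (standing in for wherever the pipeline keeps checksums from the previous run) are assumptions for the example.

```python
import hashlib


def file_checksum(path, chunk_size=65536):
    """Return the MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, stored):
    """Compare a file against its stored checksum, then update the store.

    stored: dict mapping file paths to the checksum recorded when the
    file was last processed (a hypothetical stand-in for real storage).
    """
    current = file_checksum(path)
    changed = stored.get(path) != current
    stored[path] = current
    return changed
```

Note that every call reads the whole file, which is the cost described above; a file with no stored checksum is reported as changed.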
If the file formats of the files being used are all under the control of the pipeline developer, or separate metadata storage is available, then one way to avoid this problem is to store the checksum in the file itself, thereby requiring only a handful of bytes to be read and compared to check for updates. However, it is comparatively rare that it is possible to do this for all types of asset files.
Another common compromise is to use both techniques, employing a simple timestamp based test for day-to-day updates, but performing a full checksum comparison on an overnight or weekend basis. This way, any assets that become "stale" as a result of an invalid modification date will be caught and fixed the next time a complete update is performed.
A less widely employed approach, but one that is useful in some circumstances, is to delegate the task of checking asset versions to the specific tools that perform the processing (sometimes after first checking the timestamp or checksum as an "early out" test). This allows the tool to perform much more fine-grained checking on the file, and determine which sections, if any, need updating. For example, in the case of a game where levels are stored as a single large map file, it may be desirable for the map building tool to determine which sections have been modified and only update dependent files related to those, rather than the entire map.
The Make Tool
Probably the single most commonly used dependency-based build tool is make, a utility originally developed for Unix systems but later ported to just about every modern operating system. Make is available in every Unix distribution, and there are many Windows ports available, including direct ports of the Unix versions, and variants that are supplied with most compilers. Make's original purpose was to assist with compiling source code, but it is built in a very generic manner, allowing the invocation of virtually any command line tool as part of the process of converting input files into output files.
As such a generic tool, make is not able to use dependency information from the asset files themselves, and instead relies on an external file, known as a descriptor file, that specifies both the dependencies between inputs and outputs, and the processing steps that should be performed on them. Make uses file timestamps to determine if a file is up-to-date, by comparing the last modification times of the source and destination files for each operation.
Descriptor File Syntax
A make descriptor file is a text file (typically called "makefile") made up of a series of rules, each of which defines the information needed to build a specific output file. The syntax is very simple: the name of the output file is supplied first, followed by a colon, and then a list of the prerequisites for that file. For example:
textures.bin : texture1.tga texture2.tga
specifies that the textures.bin output file depends on the two .TGA files listed. Therefore, if any of those files have a timestamp newer than that of textures.bin (or it simply does not exist), it will be rebuilt. The commands to build the file are specified immediately after the rule, preceded with a tab character to distinguish them.
textures.bin : texture1.tga texture2.tga
	packtextures textures.bin texture1.tga texture2.tga
In this case, the command is simply executed "as is," and specifies directly the files to be operated on. However, make supports macros that can be used to allow rules to operate more easily on lists of files - for example:
TEXTURES = texture1.tga texture2.tga
textures.bin : $(TEXTURES)
	packtextures textures.bin $(TEXTURES)
In this example, the list of input textures is defined as a macro, which is then subsequently referenced where it is needed rather than supplying the items explicitly. Macros are defined by supplying a macro name, followed by either "=" or ":=" and then the contents. If a variable is defined with "=", then it is a recursively expanded variable. Any reference to other variables will be kept intact in the macro and expanded every time it is used. If, on the other hand, ":=" is used, then it is a simply expanded variable. In this case, references to other variables will be expanded at the time the variable is defined, and the results stored instead. For example:
CHARACTERTEX = body.tga face.tga
LEVELTEX = grass.tga bluesky.tga
THISLEVELTEX = $(CHARACTERTEX) $(LEVELTEX)
LEVEL1TEX := $(CHARACTERTEX) $(LEVELTEX)
LEVELTEX = earth.tga redsky.tga
At this point, LEVEL1TEX will contain "body.tga face.tga grass.tga bluesky.tga," as it was expanded before the definition of LEVELTEX changed. However, if THISLEVELTEX is used instead, then it will be expanded using the current values of CHARACTERTEX and LEVELTEX, yielding "body.tga face.tga earth.tga redsky.tga" instead.
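The difference between the two kinds of variable can be modeled with a few lines of Python. This is a toy model of the behavior described above, not a description of how make is implemented; the expand function and the variables dictionary are assumptions for the example.

```python
import re


def expand(text, variables):
    """Replace each $(NAME) reference with its fully expanded value."""
    return re.sub(
        r"\$\((\w+)\)",
        lambda match: expand(variables.get(match.group(1), ""), variables),
        text,
    )


variables = {}
variables["CHARACTERTEX"] = "body.tga face.tga"
variables["LEVELTEX"] = "grass.tga bluesky.tga"
# "=" (recursively expanded): store the text with its references intact.
variables["THISLEVELTEX"] = "$(CHARACTERTEX) $(LEVELTEX)"
# ":=" (simply expanded): expand now and store the result.
variables["LEVEL1TEX"] = expand("$(CHARACTERTEX) $(LEVELTEX)", variables)
variables["LEVELTEX"] = "earth.tga redsky.tga"

print(expand("$(LEVEL1TEX)", variables))     # body.tga face.tga grass.tga bluesky.tga
print(expand("$(THISLEVELTEX)", variables))  # body.tga face.tga earth.tga redsky.tga
```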
As seen in the previous examples, to reference a macro, simply surround the macro name in brackets and prefix it with a $ sign (in other words, "$(NAME)"). There are also some built-in macros (as well as more defined from the host machine's environment, such as the path to installed compiler tools), and a class of macros known as "automatic variables." These are automatically set up every time a command is executed, and contain information such as the target filename and the list of modified dependencies. For a full list of these, see the make documentation.
Make also contains various functions that can be referenced in a similar way to variables, and that similarly insert their results into the rule. These can be used to perform many useful tasks such as string manipulation and wildcard expansion (again, a full list can be found in the make documentation).
The rules in the descriptor file describe how to build the files referenced, but they will not cause anything to happen unless make has a reason to build the file. This will only occur if it is either explicitly asked to (by the user typing "make textures.bin," for example), or the file appears as a prerequisite in another rule that it needs to build (which, in turn, must have either been explicitly specified or invoked from a third rule).
In order to provide a convenient way to specify "top-level" rules that build a number of files, make supports phony targets. A phony target is a file that does not actually exist (and will never be created), but is always considered to be out-of-date. This can be used to write a rule solely for the purpose of triggering other rules, for example:
.PHONY : alltextures
alltextures : textures.bin textures2.bin
The ".PHONY" declaration defines that the target alltextures should be considered phony; in fact, even without this the rule would still operate normally. However, if for some reason a file called alltextures happened to exist on the disk, and it was newer than its prerequisites (textures.bin and textures2.bin), then the rule would be considered to be up-to-date and skipped. Marking it as phony simply ensures that this can never happen.
In this case, the phony rule tells make that when it is asked to build alltextures, it should build the textures.bin and textures2.bin targets (because they are specified as dependencies). This rule can then be invoked by issuing the command "make alltextures" from the command line, or as a dependency of another rule, for example, a rule that makes all of the resources for the game. By convention, such a top-level rule is often a phony target named "all" that is placed first in the file; when make is run without naming a target, it builds the first ordinary target in the file.
As make was originally designed for processing and compiling source code, the early versions of the tool required every input file to be explicitly specified somewhere in the input rules. This is generally fine for programs, as the number of source files is usually relatively small, and additions are infrequent. However, this is not generally the case with game assets, and therefore maintaining a file that must contain the name of every single asset in the game soon becomes very unwieldy.
Fortunately, later versions of make introduced a feature known as pattern rules. Pattern rules are a form of implicit rule (that is, a rule that operates on an entire class of targets, rather than an explicitly specified list) that allow a rule to be defined that is executed on every target whose name matches a specified string pattern. This way, rules that operate on specific types of assets can be easily built. Pattern rules follow exactly the same syntax as normal rules, except that the % character is used to indicate "one or more arbitrary characters" in the names specified. For example:
%.tex : %.tga
	converttexture $@ $<
This rule enables any target with a .tex extension to be built from a corresponding .tga file. The $@ and $< entries are automatic variables that correspond to the name of the target file and the name of the first prerequisite (here, the source file) for the rule, respectively. For example, if the target file grass.tex is requested, then this command would expand to "converttexture grass.tex grass.tga."
It should be noted that, like all other make rules, this does not actually perform any actions unless another rule references a file matching it.
Therefore, what is needed as the logical companion to pattern rules is some mechanism for specifying groups of files as the prerequisites of a target without actually listing them. This can be very easily achieved through the use of wildcards, for example:
alltextures : *.tex
This rule causes all of the files with a .tex extension in the current directory to be built (using whatever rules are available to do so, such as the pattern rule given above) when the alltextures target is referenced. However, because the wildcard only matches files that already exist, this rule will only update existing files; if a corresponding .tex file for an asset does not exist, it will not be built. Note also that while wildcards will be automatically expanded if they appear in a target or dependency list, in a variable declaration they must be explicitly expanded by wrapping them in the built-in wildcard function, "$(wildcard *.tex)," for example.
This is where the string manipulation features of make come in handy. Since what is actually required is not a list of the output files that exist, but a list of the output files that should exist, we can build that list by taking the list of input files and changing the extensions; we know that in this case, every .tga file should generate a corresponding .tex file. This can be done with the following rule:
TEXTURELIST := $(patsubst %.tga,%.tex,$(wildcard *.tga))
alltextures : $(TEXTURELIST)
This rule uses the patsubst function, which performs a pattern substitution. The first argument is the pattern to match (with, as before, % indicating any sequence of one or more characters), the second is the replacement pattern, and the third argument is the input data, in this case, taken from the list of .tga files generated by the wildcard function.
This pattern substitution has the effect of creating a list of the target files, by stripping the .tga extension and replacing it with .tex. The creation of this list is placed in a variable definition to improve performance. Since the variable is defined as being simply expanded, the wildcard and pattern substitution operators are only evaluated once, and then the resulting list is stored for re-use.
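To make the effect of the substitution concrete, the following Python sketch mimics it for the simple case used here. It is only an approximation of make's patsubst, assuming a pattern that contains exactly one % character; the function and the file names are written for this example.

```python
def patsubst(pattern, replacement, words):
    """Rewrite each word that matches the pattern; pass others through.

    Assumes the pattern contains exactly one "%" (a simplification).
    """
    prefix, _, suffix = pattern.partition("%")
    result = []
    for word in words:
        long_enough = len(word) >= len(prefix) + len(suffix)
        if long_enough and word.startswith(prefix) and word.endswith(suffix):
            # The stem is the part of the word that "%" matched.
            stem = word[len(prefix):len(word) - len(suffix)]
            result.append(replacement.replace("%", stem, 1))
        else:
            result.append(word)
    return result


# Stand-in for the result of $(wildcard *.tga); in a real script this
# list could come from glob.glob("*.tga").
source_textures = ["grass.tga", "bluesky.tga"]
texture_list = patsubst("%.tga", "%.tex", source_textures)
print(texture_list)  # ['grass.tex', 'bluesky.tex']
```

The resulting list names the output files that should exist, whether or not they have been built yet.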
With this, it is possible to build make files that take source assets of different types from various locations, and build them as required without having to provide explicit rules for every single asset. However, there are often cases where it is desirable to be able to do just that, for example, when there is one texture that looks poor under the default compression settings, or if a special-case is needed to handle the player's character model differently from other NPCs.
Fortunately, make provides a very convenient mechanism for doing this. When searching for a rule to build a specific target, make will always use a rule that explicitly names that target if one is available, only examining implicit and pattern rules if none is found. Thus, even though a rule exists that specifies how to build .tga files into .tex files, if another rule is written with a target of player.tex, it will be used to build that file rather than the more general rule. If two explicit rules both supply commands for the same target, make reports the conflict (GNU make, for example, prints a message and uses the last set of commands given).
Advantages and Limitations of Make
Make is a very powerful tool, and the description given here only covers a relatively small fraction of the available functionality. Make is very widely used, and has been tested on many large scale projects. There is even (albeit somewhat primitive) functionality included for running multiple tasks in parallel to improve performance on multi-CPU machines. The descriptor file syntax is somewhat arcane at first sight, but it is quite readable once learned, and can be easily edited by both humans and other applications. In particular, it can be very useful to use external tools to generate portions of these files, as a mechanism for encoding dependency information from asset files.
Make works very well on Unix systems, where just about any conceivable task can be achieved through shell scripts or other command line tools. However, on Windows systems, fewer basic facilities are available to command line programs. In practice, though, this is a relatively minor hurdle. Most of the important tools for asset processing can be command line driven (or must be written in-house), and the other "glue" utilities can be fairly simply replaced or rewritten.
The main disadvantages of using make are that it only checks for file modification through the file timestamps, and that adding support for more complex dependencies (such as those based on asset contents) can be complex, requiring many custom tools to build additional dependency information in a format make can understand. Also, make has no native support for integrating with asset management systems; it works strictly from a local file system. Therefore, for most purposes some form of external program