Modern game engines read a lot of data during game start up in nearly all sub-modules. The general data format XML is about to establish itself as the de facto norm in this area. One of its strength is ease of readability. Many standard tools and exporters for this format exist that make the game developer's life much easier. When using this data format in large files, however, one finds that the associated costs are higher memory consumption, longer reading times and memory fragmentation. These are especially painful in console game development.
In this article we show that these disadvantages can be avoided, not by using the standard DOM (document object model), but by using the SAX (simple application interface for XML). Both are different approaches to parsing XML data. The latter one is less known among game developers and seems to be a lot more difficult to use at first. However, we will show that, if applied properly, the amount of work that needs to be done is about the same for both approaches; only the control flow is different.
XML [RFC 3076] is a generalization of the HTML format where elements and attributes can be individually defined by the user. Today, this data format has a diverse range of applications where heterogeneous data needs to be specified. As the data format is ASCII or Unicode, it is especially interesting for game developers. It can be generated automatically by a tool, but is still editable by hand for quick and simple changes.
A wide variety of tools and standards are meanwhile developed to help the handling of XML files. Several standards are created to specify the syntax and even the semantics of special XML formats. One of the older format specifications is DTD. It is a special extension to the XML format that specifies a special XML syntax. A more advanced approach is the one that SCHEMA [derVlist2002] has taken. SCHEMA is a syntax and semantics specification language for XML files. SCHEMA specifications are themselves written in XML. Its ability to specify semantics covers things like default values or constraints. There some XML Editors like XMLSpy [XMLSpy] out there that can process Schema specifications to assist the user in editing files. Most database programs provide the possibility to export their data in XML format. Another interesting XML technology is XSLT [W3CR]. XSLT is a grammar transformation language with which it is possible to transform data from one XML format into another. This is, of course, very interesting from a game development perspective as the developer often has to cope with format changes in the game data as the design of the game matures.
Seeing all those possibilities XML offers, we concentrate in the rest of the article on how to get the XML data into the game in the most simple and efficient manner. As a simple toy example, consider the following XML specification which represents a stripped down version of a car:
It will be used as an example in the following explanations of parsing options. For simplification, we do not use any text constructs like 10.0 in the presented example. Extending the presented approaches to accommodate for these constructs is straightforward.
For reading XML files, public domain parsers exist; They fall into two categories. The most common among game developers is the DOM (document object model). With the DOM, the parser reads the complete XML file and generates a hierarchical representation of it in memory that can be then analyzed by the application. The SAX (simple API for XML) reads in the XML file as a stream and generates events alongside that have to be processed by the user application. Both models will be analyzed and compared in more detail in the following sections.DOM model
The document object model is the most. This XML parser generates a tree structure of the document in main memory that can be processed by the application. Once processing is finished the tree is removed from memory. The above described model would have the following representation as a DOM tree in memory:
The parser provides methods to navigate within this structure for the application to extract the necessary data once the document has been read from the file into memory. The control flow of the system for that is as follows:
For the implementation of the DOM tree reader we use TinyXML [SFTXml]. It needs only a few lines to be initialized. For our case the code to initialize the reader, and to start the processing, looks like this:
TiXmlHandle carHeader = hDoc.FirstChildElement("CarDefinition");
With the “LoadFile” command, the complete tree is generated in memory. All further calls are commands to extract the data from the tree. This extraction is done by the objects that need the data. All XML elements can be asked for further information. This is done for the extraction of the car definition block here. The car declaration is as follows:
void Parse(TiXmlHandle handle);
The Parse method is responsible for extracting the data from the tree to fill in the member variables. The implementation is straightforward:
void Car::Parse(TiXmlHandle handle)
section = handle.FirstChildElement( "Acceleration" ).Element();
section->QueryFloatAttribute( "a" , &m_acceleration);
section = handle.FirstChildElement( "Mass" ).Element();
section->QueryFloatAttribute( "m" , &m_mass);
m_wheel.Parse(handle.FirstChildElement( "Wheel" ));
One can see the use of the construct Handle and pointer in this code. The handle is a NULL tolerant pointer and makes navigation in the DOM tree structure much easier. Whenever a structure that is queried for - like in the above example the acceleration element - does not exist, the query command will return a NULL pointer. Any successive operations on this pointer will result in an exception. The handle is tolerant in this case and further handles can be queried from the handle for the element that does not exist. Only in the final stage where the data is actually accessed does the NULL pointer become explicit. This way it is possible to restrict error checks for missing data to the places where concrete elements are removing the need to scatter checks all over the code. Error checks have been omitted in this small sample for readability but should, of course, be included in actual production code. The handle can wrap most types of pointers in the TinyXML implementation.
The car itself now hands over the pointer of the child element with the wheel information and passes control over to the wheel to extract its data from the DOM tree:
void Parse(TiXmlHandle handle);
void Wheel::Parse(TiXmlHandle handle)
section = handle.FirstChildElement("Radius").Element();
After this is done the complete information about the car is in memory, and the DOM tree can be destructed as it is no longer used.
We will go into details about advantages and disadvantages of this approach after we have discussed the alternative in the following section.
As mentioned earlier the SAX model stands for “Simple API for XML”. The intention of this model is to provide a lightweight API that is easily adjustable to specific situations; it also comes with a low demand of resources. The idea of a SAX parser is not to generate a DOM tree of the document in memory but rather use certain callback functions when the XML file is streamed and certain elements are encountered. The application has to react to these callbacks and store the data into the correct positions. When the reaction is done, control is returned to the parser.
The main difference between DOM tree and SAX model is that with SAX, the control is governed by the parser in this case that feeds the data into the system. In the DOM model, however, the objects could extract the data themselves, so the control flow is very different. The control flow for our above example looks like:
The SAX parser used for the demonstration here is eXpat [SFX]. This parser is available via SourceForge. eXpat requires three callback handlers to be registered. One handler is for the start of the element; This handler provides all the information about the attributes. The second handler does not provide any specific information but marks the end of an element. The third type of callback provides the text within an element. For simplicity we do not use this feature here, but its implementation is straightforward. eXpat does not read from a file directly, but rather from memory. It is not necessary to load the complete file into memory; it can be read in sections that are processed by the parser. During processing the mentioned C-style callbacks are called, and data is fed to the application. All callbacks can provide a user data entry that is an arbitrary pointer which can be specified up front.
This approach seems a bit irritating and less suited for game development at first sight, which is also probably the reason it is so rarely used. On one hand there is the internal game structure possibly consisting of a large hierarchy of objects that want to grab data from a file. On the other hand there are the C-style callbacks from the SAX parser that want to deliver data in a manner that is not controlled by the object hierarchy. Switching from an internal home brew data format to a DOM tree parser seems to be much easier on first glance than to switch to an SAX parser.
If one takes a closer look, however the difference is not that large. The idea to solve the problem and make the parsing process look the same again is split into two parts. The first idea is to use a visitor that processes the callbacks and feeds the data to the destination objects. This visitor is responsible for associating the information provided by the callbacks with certain objects in the object hierarchy of the game. The second idea refers to the processing of the data within an object. In the DOM tree model, the object was doing a few calls to the tree to extract the data; This is done in the parse function. In the proposed solution, the data input function is called several times by the mentioned visitor. Every call contains a description of the data and the value itself. In an “if” or a “switch” cascade, the description of the data is identified and the data is written to the correct destination. This process is further explained in the following paragraphs.
The first instance in the process that receives data is the visitor. As all the callbacks can be provided with an arbitrary pointer, we set up a pointer to the visitor to process the data. The callbacks themselves are C functions that call their corresponding method in the visitor. The main task of the visitor is to dispatch the information that comes from the parser callbacks to their destination objects. To accomplish this task, it contains a stack of recipients. Data is fed to the recipient on top of the stack. Whenever a new element is started, a new recipient is pushed onto the stack, and when an element has ended, the recipient is popped. This stack hierarchy within the visitor corresponds to the call stack hierarchy when the diverse parse functions are called to extract the data from the DOM tree. Whenever data is fed from the dispatching visitor to a recipient in the form of a start element, the recipient has the opportunity to return a pointer to a new recipient for upcoming data. In this case, the pointer to this internal object is pushed onto the stack. In the other case we simply push the same pointer to the already existing object onto the stack. When an element is ended, a recipient is popped from the stack and any further data is transferred to the original recipient. Visually the process looks like this:
In this example, a call is dispatched to a destination object that requests a new recipient to be pushed onto the stack. When the callback is processed, all subsequent callbacks will get to the new recipient until the end-of-element is encountered for the object that initialized the push command. Then a pop command is executed, and all callbacks are again routed to the old object.
The visitor that dispatches all the calls is the central idea of this approach. The declaration of a dispatcher class looks like this:
void ParseFile(const char* name, Receiver* startObject);
static void XMLCALL start(void *data, const char *el,
const char **attr);
static void XMLCALL end(void *data, const char *el);
The receiver is an abstract class of every class that can get information and be dispatched to. The main routine parses a file and gets a first receiver handed over. This object is the first to receive information. The two static methods are the two C methods that are registered with eXpat to mark the beginning and the end of an element. The declaration of the abstract receiver class looks like this:
virtual Receiver* ProcessData(const char *el, const char **attr)=0;
The ProcessData method gets the name of the element and an array of attribute name and value pairs. It returns the new receiver or a NULL pointer if subsequent ProcessData calls should go to that same object.
The implementation of the ParserDispatcher looks like this:
void XMLCALL ParseDispatcher::start(void *data, const char *el,
const char **attr)
ParseDispatcher* processor = ((ParseDispatcher*)data);
Receiver* candidate = processor->m_stackOfReceivers.top();
Receiver* followUp = candidate->ProcessData(el,attr);
void XMLCALL ParseDispatcher::end(void *data, const char *el)
void ParseDispatcher::ParseFile(const char* name, Receiver* startObject)
XML_Parser parser = XML_ParserCreate(NULL);
FILE* input = fopen(name, "r");
XML_SetElementHandler(parser, start, end);
size_t len = fread(buf, 1, sizeof(buf), input);
done = len < sizeof(buf);
XML_Parse(parser, buf, len, done);
The parse method generates the parser and registers itself via the arbitrary pointer in “XML_SetUserData”, that it can be referenced later on in the static methods. This way one can have several parsers running at the same time. The file is opened and the handlers are registered. The first receiving object is pushed onto the stack. All the while, loop data is read in chunks to limit storage requirements.
In the static start method, the stack is extracted via user data pointer and the call is dispatched. Depending on whether we get a return value or not, we push the new object or the same object onto the stack. If we would like to process text that appears between an element start and end, we would dispatch it to the same top object on the stack without modifying the stack itself. The end command pops the topmost pointer from the stack again, so that it returns to the old object.
This, of course, is again a minimalist implementation without any error processing or ease of use issues.
The second part is the declaration and implementation of wheel and car. Both have to be derived from the receiver class in order to work properly. The declarations of these classes look like this:
class Wheel : public Receiver
virtual Receiver* ProcessData(const char *el, const char **attr);
class Car : public Receiver
virtual Receiver* ProcessData(const char *el, const char **attr);
The declarations look similar to the ones in the DOM parsing model, except for the fact that they are now derived from Receiver and have the “ProcessData” method instead of “Parse”. Even if it looks similar, one always has to keep in mind that the control flow is different. The “ProcessData” method of one object can be called several times and one “ProcessData” of one object never calls the “ProcessData” of another object; but just returns a pointer to the new receiver.
The implementation of the car with the SAX parser now looks like this:
Receiver* Car::ProcessData(const char *el, const char **attr)
It is a string compare cascade to get the information for the different properties. In the case that the element name is “wheel”, we return a pointer to the wheel because the wheel is the next object that has to process the data. For all the other cases (“Acceleration”, “Mass”) we extract the necessary information and store it in the corresponding member variables. We also return a NULL pointer as we still want to give information to the same object if we did not encounter a wheel. For the wheel, the implementation is also straightforward:
Receiver* Wheel::ProcessData(const char *el, const char **attr)
One can see that those implementations are not significantly more complex than those of the DOM tree version. However, both have their advantages and disadvantages. The advantages of the SAX model are better than those of the DOM tree. In the following sections these advantages and disadvantages are discussed.Comparison
In the previous sections we have shown how to use a DOM tree parser and the generally less known SAX parser to read XML data into a game engine. Here we want to compare the two approaches to analyze which method is better suited to which situation.
The implementation that uses the SAX parser is slightly more complex as it needs the dispatcher that travels around the objects as a visitor to gather the information. On the other hand, it is not as complex as one might think after reading the specification of the SAX parser. The dispatcher has only to be implemented once for the whole project. The cascade of string compares in the SAX parser can the program less readable than the DOM version. Eventually, it would make sense to use some macros for the reading process to increase code readability.
One major problem that remains with the SAX parser is that the user who implements the “ProcessData” method has no control over the sequence in which different elements are processed. The processing sequence depends only on the sequence within the XML file. In the DOM tree parser, objects may pick data from arbitrary positions in the DOM tree in arbitrary order. In the context of gaming, this is especially important if the data of one element is necessary to reserve some memory that is used by the upcoming data. One has to make sure that the order within the XML file is correct when an SAX parser is used. This can be a problem when migrating from a DOM tree parser to an SAX parser.
The big advantages of the SAX approach become obvious when it comes to performance. The SAX model is superior when it comes to memory consumption and execution speed. The increase in execution speed stems from the fact that only one pass for getting the data into the engine is used. The effort that is needed to decompile the loaded DOM tree is omitted here as the data is fed directly into the program. The cascade of string compares looks inefficient at first sight. Here, one has to keep in mind that the methods used to get elements from a DOM tree use a string compare internally. Performance-wise the main advantage of the SAX parser is when it comes to memory consumption. The memory needed for the SAX parser is mostly the buffer that is reserved to read the XML file into. The DOM tree parser needs memory to store the complete tree. This tree is not needed any more after the application has grabbed the data from the parse tree and fed into the program. But those trees significantly add to the peak memory consumption during loading. Another serious problem that the SAX parser avoids is memory fragmentation. The allocation of the different elements of the DOM tree may result in several small allocations for the different elements. First the parser reads the DOM tree and does its allocations. Afterwards the objects that make use of the tree do further allocations. Finally, the memory for the DOM tree is freed, leaving big holes in the memory block. The author experienced this phenomenon with a dedicated data format that generated a tree structure in memory from parsing files. To sum up the advantages of the two approaches:
Advantages of SAX
Advantages of DOM
The advantages of one technique are the disadvantages of the other one and vice versa.
As a final result one can say that applying SAX parsers is useful when performance is an issue or is about to become an issue. Switching from DOM to SAX can require a rework of the code to make sure that the data is fed in the correct order to the respective objects. As performance is always a major issue during development of games, using an SAX parser seems to be mandatory.Conclusion
In this article we have explained the two basic approaches that exist to parse XML files. Besides the more applied DOM tree model, we have explained how to apply the SAX parser. The application of the SAX parser seems to be more complicated at first sight. In order to overcome the difficulty in adopting the unfamiliar control flow of the SAX parser, we have introduced a concept, based on a visitor pattern, that makes the reading of data into the application easy.
After closer examination, the SAX parser offers a lot of performance advantages that make it especially useful to use with games that have to read a lot of data. With console development one has to face the problems of memory consumption and fragmentation, for which this approach seems ideally suited.
Even if XML data is not used in the final master of the game using the SAX parser during development can significantly reduce turnaround times. It may also become necessary if one uses a development kit with limited memory.References
[RFC 3076] http://rfc.net/rfc3076.html
[derVlist2002] Eric van der Vlist: XML Schema; 2002 O'Reilly Associates