The creation and implementation of interactive audio elements is frequently frustrating and stressful. The programmers sometimes wind up making decisions that the audio artists should be making, and worse, sometimes the audio artists wind up [gulp] programming!
Referring to this situation, game composer Michael Land said, "Back in 1995, I was talking with The Fat Man at GDC about the challenges of creating and implementing good game audio, and the metaphor that really summed it up for me was 'good fences make good neighbors'." What Land meant was that when the programmers get to write code and don't have to make artistic decisions, and when sound artists can focus their attention on creating great music and sound design without having to write code, the result is a harmonious union of technological and artistic efforts. This is how truly great game audio is produced.
Interactive XMF is an emerging standard that will help put creative and technical control in the hands of the right people. Developed by the MIDI Manufacturers Association (MMA), XMF (eXtensible Music Format) is a new, low-overhead, meta file format for bundling collections of data resources in one or more formats into a single file. The Interactive Audio Special Interest Group (IA-SIG) has taken on the task of defining an Interactive XMF file format, and is also overseeing the development of a platform-independent software entity called a Soundtrack Manager that utilizes this format to facilitate the development and implementation of highly interactive audio.
The Evolution of Game Audio
In the beginning there was "beep". Some programmer wrote that one. It was well implemented, too. It did the job. It helped make that dot hitting those paddles seem more realistic.
Then came two "boops", a "beep", and a "pffft": what game composer David Govett used to call "three string players with bad tone and a really good one-armed drummer." And that led to actual music composition, sometimes even by composers, but also frequently done by someone who could both compose and program, a.k.a. the "comprogrammer" or "programposer". But in all cases, a programmer was needed to code the music. Game music in those days was usually delivered as note lists or on manuscript paper. This left the composer completely at the mercy of the programmer (a scary notion). But the upside for the composer was that the skills he needed to create this music were solely conventional composing skills and the ability to work within very specific parameters. A composer needed no special software or recording gear. The boatload of labor involved in getting the sound into the game was all done by the programmer.
These days game composers and sound designers (which I'll call "audio artists") create linear audio using (more or less) the same methods that are used for film, television, and music CDs. This has been the case since the early '90s. But lately the quantity and fidelity of these linear audio snippets has grown to the point that audio artists no longer feel restrained by game technology. WAV files and their like have, for the most part, replaced the need to deliver audio work via manuscript paper, various flavors of MIDI-related formats, and console-specific formats. As a result, audio artists are breathing a collective sigh of relief because they can now be fairly confident that their work will play back within games as they intended it to sound. Quite a bit of programming effort and attention has been devoted to achieving this audio consistency and realism.
But consistency and realism were relatively easy to achieve - a linear path of progression could be defined and followed. Start with "beep" created by a programmer and end with something that sounds like it came off a new CD. The end product, CD-quality audio, was a known quantity. Everyone knew where they were going, and everyone went there. The same cannot be said about interactive audio implementation. There is no existing example in any other field that can be cited as a goal.
But the development of tools to implement non-linear audio has received little attention, and what work that has been done in this area has primarily been proprietary. So in one sense, the evolution of the art of composing music for games has barely begun. Game composer George Sanger, a.k.a. The Fat Man, put it this way: "With all respect to the creators of great innovative interactive scores, I feel confident in saying that our industry shows generally a sense of feeling its way through the dark passages and unlit corridors and dim tunnels and other such analogies of these, our early years. To get any sense of what great sound can be made, one would have to cross the line into 'legitimate music' and read the writings of John Cage. There is no equivalent in this industry's body of literature to Schoenberg's analyses of melodic writing, of repetition and variation, of surprise and satisfaction, that takes into even slight account what happens to music when a twelve-year-old boy is constantly shuffling the pages of the score."
In every game that's ever been made, snippets of linear audio have been triggered by game-related events, and sometimes vice-versa. On the surface, coming up with a method for implementing interactive audio seems fairly simple and straightforward. But it's proven to be a tough nut to crack. A handful of game development houses have spent large amounts of time and money creating very nice proprietary tools to let their in-house audio teams implement interactive audio in games. The LucasArts iMUSE system is one example. But only three or four people ever used iMUSE, and at this point it is no longer that company's main audio implementation tool.
George "The Fat Man" Sanger
Sanger says, "This situation is reflected to greater and lesser degrees at a handful of studios, some large, some small, always more or less following the same path, always more or less winding up in the same tar pit. The path goes like this: The development team realizes that there is a problem with audio in that, fundamentally, there is no system or tool to implement [interactive audio]. They look around, maybe, to see if a commercial tool exists, and it doesn't. They build their own [tools] without leveraging any of the work that's been done a dozen times before, without experiencing any of the benefits of the other million dollars and ten years that have been invested in this issue. Once their tool is made, a few composers experience the benefits of the tool (after what certainly would be a hellish debugging period - the reader should pause to reflect on just how bad this might be), make a few games, and leave the company. Regardless of his background, the next composer needs to be trained or his work cannot benefit the games that this company is making. And because the company's administrators insist that the tool be proprietary, the game producers live in a constant state of frustration that nobody outside the company can be trained in its use and there is always a shortage of qualified sound designers and interactive composers."
At the Game Developers Conference this year, some solutions to this problem were finally presented. At least four new audio integration tools were shown, by Microsoft Xbox, Sony, Creative Labs, and Sensaura. Unfortunately, only one of them is cross-platform, and there is still no standard file format that lets these tools exchange information or leverage audio artist and audio programmer experience.
The Evolution of Interactive XMF
In the beginning there was RMF. That was the proprietary file format used by the Beatnik audio engine, primarily for web audio applications. It was packaged as the Beatnik Player, a plug-in for web browsers. Because the folks at Beatnik wanted an open-standard, next-generation replacement for the RMF file format in order to get into other markets (such as the mobile device market, which is standards-based), they proposed an XMF working group to the MIDI Manufacturers Association (MMA). Beatnik had two reasons for choosing the MMA:
- All of the applications that they envisioned at the time used standard MIDI files
- The MMA had experience defining such a standard, as it had developed the open-standard DLS formats for portable Wavetable instrument definitions.
The MMA working group, which consisted of representatives from Beatnik, IBM, Sun, Line 6, Yamaha, and many others, expanded on the basic concept of RMF, incorporated existing open technologies and invented some new ones, and called the result XMF. They created a more flexible file structure, made the Metadata system more robust and flexible, and created a standard mechanism for block operations like resource encryption and data compression. Chris Grigg of Beatnik, who is the father of XMF, explains, "When you put all that together you basically have a container technology that can be the basis for any standardized or even proprietary file format. It's like a file format construction kit."
At the time it was developed, there were two immediate needs that XMF addressed. The first was the need to replace RMF with a format that combined MIDI scores and custom instruments, so that audio sounded exactly as the composer intended. The second need that XMF initially addressed was that of providing an open standard format for web applications and mobile devices. Developers and the Internet/open-source community wanted this so they could write their own implementations.
The MMA published the XMF specification in October of 2001. Since then, several companies have adopted the technology for their own proprietary file formats. For example, Creative Labs used it in their interactive audio tool, ISACT. These implementations are really using just the container technology part of XMF. The other part of XMF, standardized file formats, has taken longer - standards efforts typically do. But now the pot is starting to boil. And certainly one of the most interesting applications bubbling to the surface now is the development of IXMF by the IA-SIG.
The notion of a standard file format for audio integrator tools started as a burr in the britches of a couple of legendary game composers, Michael Land and George Sanger. A few years back at Project Bar-B-Q (the annual think-tank event for game audio), Land and Sanger discussed the grim situation created by proprietary integrator tools. "That discussion led us to see that there was a single element missing that might allow the destructive pattern of the past to change. The missing link was identified as a standard file format that would contain sounds, compositions, and rules of interactivity," says Sanger.
In subsequent years, Project Bar-B-Q work groups focused on integrator tool issues. But it wasn't until 2001 that a standard file format was formally addressed. That was the year that the XMF specifications were finalized. On the first workgroup day of Project Bar-B-Q that year, Chris Grigg presented XMF to the attendees, and afterwards discussed it with Sanger. For Sanger, the planets aligned that day. It became apparent to him that XMF could be used as the basis for the interactive audio standard file format for which he and Land had been pining all those many years. Fortunately, Grigg had planned that for XMF all along, and he designed the file format accordingly. As a result, the initial development moved at lightning speed.
"After spending a day and a half in some group or other [at Project Bar-B-Q], I had a dream about how XMF could be used for interactive applications," said Grigg. "So I went off and formed what's called a 'rogue group' and was joined by some fairly amazing people, like Larry the O, The Fat Man, Rob Rampley, Bob Starr and Steve Horowitz. In less than a day, we banged out a rough concept and mocked up an editor. On the strength of that work, an IA-SIG working group was formed, and we've been going ever since."
The members of the IXMF IA-SIG working group (IXWG) consist of game developers, tool developers and audio artists. The members are Chris Grigg, George Sanger, Martin Wilde, Linda Law, Michael Land, Peter McConnell, Brad Fuller, Kurt Heiden, Ron Kuper, Clint Bajakian, Guy Whitmore, Peter Clare, Brian Schmidt, Andrew Ezekiel Rostaing, Steve Horowitz, and Alistair Hirst.
"Work on the IXMF specification is not complete, but it's getting close," says Grigg. "The IA-SIG working group is reviewing a detailed design that we completed over the winter and we should have something for developers and artists to look at this Fall. The spec mainly focuses on the file data format, but more importantly, implicit in that is a model for an advanced, data-driven, run-time soundtrack manager."
How IXMF Files Are Used
It's probably easier to understand the IXMF file format after first gaining an understanding of the system in which it is used. So here we go.
There are three aspects to game audio that must somehow work together in order for everything to function in a game as the designers and audio artists intended: the platform, the game (or audition application or editor), and audio content. With IXMF, the audio content is bundled with all of the information that describes how that content is to be used in the game.
Since IXMF is a cross-platform solution, some platform-independent middleware is needed. This middleware is called the Soundtrack Manager. The Soundtrack Manager manages the performance of the soundtrack and all of the audio content resources that combine to create it. The Soundtrack Manager can be specific to a single game, a group of games, a development house, and so on. It supports the same advanced interactive audio feature set on any platform, while also allowing access to platform-specific features.
The Soundtrack Manager receives high-level requests for interactive audio services from the game and handles them by coordinating the operation of multiple, platform-specific, low-level media players. It supplies these players with sound media stored in the IXMF media files, and controls the players via a small set of simple audio commands that are passed to system-specific Playback APIs via an Adapter Layer. It can also send information back to the game via callbacks or shared variables.
For each platform that will host the game, an Adapter Layer for that platform must be written to communicate between the Soundtrack Manager and the platform's native APIs. So the Adapter Layer code is platform specific, while the Soundtrack Manager code and audio content are platform independent.
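The division of labor between the platform-independent Soundtrack Manager and the platform-specific Adapter Layer can be sketched in code. The following is a hypothetical Python illustration only; the class and method names are invented for this article and are not defined by the IXMF specification.

```python
# Hypothetical sketch of the Soundtrack Manager / Adapter Layer split.
# All names here are invented for illustration; IXMF defines no such API.

class AdapterLayer:
    """Platform-specific translation of generic audio commands."""
    def play(self, media_id): raise NotImplementedError
    def stop(self, media_id): raise NotImplementedError
    def set_volume(self, media_id, gain_db): raise NotImplementedError

class ConsoleAdapter(AdapterLayer):
    """One concrete adapter per platform; would call the native playback API."""
    def __init__(self):
        self.log = []  # stand-in for calls into the native playback API
    def play(self, media_id):
        self.log.append(("play", media_id))
    def stop(self, media_id):
        self.log.append(("stop", media_id))
    def set_volume(self, media_id, gain_db):
        self.log.append(("volume", media_id, gain_db))

class SoundtrackManager:
    """Platform-independent: talks only to the AdapterLayer interface."""
    def __init__(self, adapter):
        self.adapter = adapter
    def start_cue_media(self, media_id, gain_db=0.0):
        self.adapter.play(media_id)
        self.adapter.set_volume(media_id, gain_db)

manager = SoundtrackManager(ConsoleAdapter())
manager.start_cue_media("battle_theme_intro", gain_db=-3.0)
```

The point of the pattern is that porting a game would mean writing one new adapter subclass per platform, while the Soundtrack Manager code and the IXMF content remain untouched.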
At this point, some terms that are used in conjunction with IXMF should be defined. These are "media chunk", "cue request", and "cue". A media chunk is any piece of playable media data. It can be an entire audio file, a defined contiguous region of an audio file, a Standard MIDI File, or a defined contiguous region within a Standard MIDI file. The continuous soundtrack is built by stringing media chunks together, and sometimes by layering them. A cue request is an event that the game signals to the Soundtrack Manager, and to which the Soundtrack Manager responds with a corresponding action designed by the audio artist at authoring time. That action is called a cue. A cue can contain any combination of services or operations that the Soundtrack Manager can perform. In most cases a cue will contain a playable soundtrack element but it may also be used to perform other Soundtrack Manager functions that don't result in something audible, such as setting a variable, loading media, or executing a callback to the game.
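The cue-request/cue relationship amounts to a lookup table authored by the audio artist at authoring time. Here is a hypothetical Python model of how a Soundtrack Manager might dispatch a cue request; the data layout and function names are invented for illustration and show that a cue can mix audible actions with silent ones like setting a variable or firing a callback.

```python
# Hypothetical model of cue requests and cues (names invented, not from the spec).

cues = {
    # cue request name -> list of actions authored by the audio artist
    "player_enters_cave": [
        ("play_chunk", "cave_ambience_loop"),
        ("set_variable", "tension", 0.4),
    ],
    "boss_defeated": [
        ("play_chunk", "victory_sting"),
        ("callback", "notify_game_music_done"),
    ],
}

def handle_cue_request(request, state, played, callbacks):
    """Respond to a cue request with the artist-authored actions of its cue."""
    for action in cues.get(request, []):
        kind = action[0]
        if kind == "play_chunk":
            played.append(action[1])      # hand the media chunk to a player
        elif kind == "set_variable":
            state[action[1]] = action[2]  # not audible: just updates state
        elif kind == "callback":
            callbacks.append(action[1])   # information flows back to the game
```

Note that the game only ever signals the request name; everything that happens next was decided by the audio artist, not the game programmer.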
The Soundtrack Manager controls the audio playback by providing, at a minimum, the following functionality in response to cue requests:
- Responding to game sound requests by playing appropriate sound media, sometimes influenced by game state
- Constructing continuous soundtrack elements from discrete media chunks, whether via static play lists or dynamic rules
- Dynamically ordering or selecting which media chunks get played, sometimes influenced by game state, sometimes to reduce repetition
- Mixing and/or muting parallel tracks within media chunks
- Providing continuous, dynamic control of DSP parameters such as volume, pan, and 3D spatial position, sometimes influenced by game state
- Controlling how media is handled, including how it is stored and how it is played back
- Handling callbacks.
While a game is running, the flow will go something like the following. An event will occur or a condition will arise in the game, and the game will recognize that it needs to send a cue request to the Soundtrack Manager. The Soundtrack Manager will access the appropriate playable sound media, along with its interactivity data, and play the media according to its artist-specified playback parameters. It does this by passing instructions to the Adapter Layer, which will in turn pass instructions through to the playback API. The interactivity data associated with the media that just played may also include instructions for the Soundtrack Manager to pass data back to the game, which the obedient Soundtrack Manager dutifully performs.
The IXMF File Format
A single IXMF file contains all of the information needed for a game soundtrack, or a level, or a character, or any other scope. This includes all media and all information necessary to play that media as the audio artist intended. An IXMF file has its own internal folder tree, starting at a Root folder. This Root folder contains metadata fields, a Cues folder, a MediaChunks folder, a MediaFiles folder, a Transitions folder, a PositionRules folder, and a Callbacks folder. The metadata fields at this root level may contain artist notes and other general information about the soundtrack, and default values for variables.
The Cues folder contains all of the cue description resources for the soundtrack and may contain files that provide information for setup and teardown of the soundtrack. Each cue file has a tagged or indexed list of one or more links to media chunk files called a "chunk pool", and metadata necessary for using those media chunk files. The cue metadata, like the root node metadata, may contain general information and default values for variables. These variable settings take precedence over their root node values and apply to all media chunks in the cue's chunk pool. The MediaChunks folder contains all of the media chunk resources for the soundtrack. A media chunk file may contain some of the same metadata fields as a cue. Its variable settings take precedence over those of the cue that refers to it in its chunk pool and apply only to itself. Examples of standard metadata fields are MediaFileID, MediaType, DefaultMediaHandling, DefaultSyncGroup, DefaultTempoMapID, DefaultTransition, DefaultMixGroup, GainTriminDB, and RAM_Usage. In all cases, there will be standard metadata fields, but there is always the option to add custom fields so that audio teams can extend the functionality of IXMF in any direction they choose.
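The three-level precedence rule described above (media chunk settings override cue settings, which override root-level defaults) is in effect a layered lookup. The sketch below is a hypothetical Python illustration of that rule, reusing two field names mentioned in the article; the function itself is invented for clarity.

```python
# Hypothetical illustration of IXMF metadata precedence: media chunk values
# override cue values, which override root defaults. The resolve() helper is
# invented here; only the field names come from the article.

root_metadata  = {"DefaultTransition": "cross-fade", "GainTriminDB": 0.0}
cue_metadata   = {"GainTriminDB": -3.0}
chunk_metadata = {"DefaultTransition": "butt-edit"}

def resolve(field, chunk, cue, root):
    """Return the value of a metadata field, honoring chunk > cue > root."""
    for layer in (chunk, cue, root):
        if field in layer:
            return layer[field]
    return None

transition = resolve("DefaultTransition", chunk_metadata, cue_metadata, root_metadata)
gain = resolve("GainTriminDB", chunk_metadata, cue_metadata, root_metadata)
```

So for this chunk, the chunk's own transition wins, while the gain trim falls through to the value set on the cue.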
The MediaFiles folder contains all of the playable media files for the soundtrack, or pointers to playable media files that exist outside the IXMF file. Playable media files can be any type of audio asset file such as WAV, Standard MIDI, AIFF, SDII, and so on.
The Transitions folder contains all of the transition definition resources, the PositionRules folder contains all of the position rule definition resources, and the Callbacks folder contains all of the callback definition resources that will be used by the soundtrack. Some of these files will be predefined; some will be scripts created by the sound artist using a simple scripting language. Much of what the audio artist will need in the way of transitions, position rules, and callback definitions will be predefined. Examples of predefined transitions are cross-fades and butt edits. Predefined position rules will include "start at chunk beginning" and "start at next bar", and predefined callbacks will include "cue end" and "chunk end". Scripting will allow more exotic and case-specific media control.
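To make a predefined position rule like "start at next bar" concrete, here is a hypothetical sketch of the timing arithmetic such a rule implies. The function name and the beats-based representation are assumptions made for this illustration, not part of the IXMF specification.

```python
# Hypothetical sketch of a "start at next bar" position rule (invented names).
# Given the current playback position in beats, find the next bar boundary.

def next_bar_start(position_beats, beats_per_bar=4):
    """Return the position (in beats) of the next bar line at or after now."""
    bars_elapsed = position_beats // beats_per_bar
    boundary = bars_elapsed * beats_per_bar
    if boundary < position_beats:
        boundary += beats_per_bar  # mid-bar: wait for the upcoming bar line
    return boundary

# A transition requested mid-bar in 4/4 waits until the bar line:
start = next_bar_start(9.5)  # next bar line after beat 9.5 is beat 12.0
```

A rule like "start at chunk beginning" would be the trivial case of the same idea; scripted rules would replace this arithmetic with artist-written logic.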
IXMF, using no custom metadata fields and no custom scripts, is intended to address eighty percent of an audio artist's needs, eighty percent of the time. Scripting capabilities and artist-defined metadata fields allow the functionality to be extended in any manner the audio team desires. The "X" in IXMF stands for "eXtensible".
The IXMF Impact
Enough of the nuts and bolts and beats and bits. At this point you are probably asking yourself, "How will IXMF affect my work as an audio artist / audio programmer? What's the bottom line for me?"
First, let's look at audio content creation. Composer George Sanger says, "In the 20 years I've been doing my job, I have never encountered a tool that allows me to map the linear audio I've created into the interactive context I've envisioned. In effect, I have been a blacksmith with no anvil. At worst, I have been able to create fairly non-interactive bits of audio, or I have given written or spoken instructions to programmers, which have only served to frustrate us all. In some cases, because of the lack of a tool, I have had to lean my work towards programming, or have had to ask a programmer to lean more towards artistry, which inevitably leads to more work, less efficiency, worse art, and frustration all around. Certainly a tool is needed. Without IXMF, it is possible for a tool to come into being to allow me to do my job for a specific platform or audio engine, and I get the sense that this brings food to the starving masses, but no water. Guitars and amps but no cables. Keyboards but no MIDI."
With widespread use of IXMF, an audio artist would be able to create audio content and save it in a format that could be read by any game audio playback engine. This would allow a composer to use the tools he likes, and hop between them to use his favorite features in each without losing or damaging any file information along the way. Workflow would no longer need to be reinvented for every game, every developer, every platform. An audio artist's experience would be leveraged because the skills he develops and the terminology he uses would be consistent from one work situation to the next. The rework and file management associated with porting a game to another platform would be eliminated. The existence and acceptance of a standard file format would also encourage the development of new and better tools because the customer base would be large enough to support this development. And all of this could combine to create an audio development environment in which interactive sound designs could sound better, more interesting, and be created more rapidly.
"Imagine what happens when, instead of just three or four people at each company having access to these tools, millions of people gain that access," Sanger explains. "College students and home tinkerers and accomplished musicians, engineers and artists will begin not only using the tools, but also creating their own tools and their own ways of writing to and reading from this file format. As seen with MIDI, when the bank of available tools grows, it will become easy to write efficiently, easily, and freely for games, web sites, amusement parks, 'can't-play-a-wrong-note' toys, interactive movies, songs you can change as they play, and other hitherto unheard-of situations. The only predictable thing is that its uses will be unpredictable."
From the other side of the fence, how can IXMF benefit a game audio programmer? Audio engine programmer Martin Wilde says, "From my perspective, a game audio engine should afford the audio artist the ability to create a seamless, interactive game score with as little programming intervention as possible."
Traditionally, the generation of a good game audio engine has required a great deal of programming effort. And that effort generally results in an audio engine that runs on only one platform and is focused on implementing what Wilde refers to as low-level operators, which are "bits of code or functions that load and play digital audio or MIDI, but have no higher musical purpose or understanding." Also, the audio artist must usually turn his creations over to the programmer for integration into the game, which to some extent puts artistic decisions, schedules, and priorities in the programmer's hands.
Wilde believes IXMF will fix both those problems. As he explains, "First, we have a very comprehensive description and specification of the high-level audio behaviors audio artists wish to have at their disposal. This is very important and significant! The members of the IXWG, representing literally decades of game audio-making experience, have collaborated on the description and specification of the methods, intelligence and building blocks to make interactive game audio soundtracks. This includes a description of a scripting language that allows audio artists to directly control the integration and presentation of their content with minimal impact on the programming team. Second, I as an audio programmer will have an outline from which to build a high-level game audio engine. I won't have to guess what kinds of things might be useful and how they should work in a musical context. I will have it all in front of me. All I'll have to do is code it once, and I'm done. Well, almost. The IXMF spec also includes an Adapter Layer. This is the code that takes the high-level commands issued from the Soundtrack Manager and translates them into the low-level commands for the specific platform on which the game is running. All I'll have to do is write (or get the platform manufacturers to write) the various adapter layers to the low-level, platform-specific operators I talked about before. Then audio content can truly be authored once, and published many times across platforms."
Once the initial tasks of writing adapter layers and the Soundtrack Manager are completed, the role of the audio engine programmer will evolve. Instead of rewriting engines for every new platform and dealing with artistic audio implementation issues, the programmer will be able to focus on what Wilde calls "clarification and refinement."
As for communication across this fence, IXMF can be the means of creating a much clearer division of labor by providing a means for conveying audio content and interactivity information from the audio artist to the audio programmer in a well thought-out, well-defined, standardized format. Sanger, describing it from the perspective of an audio designer, put it this way, "Here's the sweet part. I think that the interaction between sound designer and programmer will become less like the push-and-pull that happens between a control freak having his dream house built and the drunken, arrogant, monomaniacal, but incredibly talented architect whom he has hired. It will become more like the interaction between a creatively gifted control freak sending a letter and the drunken, arrogant, monomaniacal, but incredibly talented mailman."
There are several aspects of IXMF which could greatly benefit all developers. Using IXMF may decrease a game's time to market. This might not happen the first time a developer uses it, due to the learning curve associated with the new format. But in subsequent games, once the learning curve flattens out, time to market will be reduced since the workflow won't change from game to game and platform to platform, and also because of its inherent simplicity and efficiency. And let's not forget that the amount of time and money needed to move content across platforms will be tremendously reduced.
If IXMF is widely adopted, another benefit to development studios will be reduced training time for new audio team members, and a larger talent pool from which companies can select new members for that team. That will result from the consistent format and workflow from one development house to the next.
Game audio quality also stands to improve. This is not to say that it will make a linear-audio snippet sound better - just that the implementation of the audio will be vastly improved.
Wilde believes that widespread use of IXMF is inevitable. "One thing that hasn't been talked about much is the tremendous explosion of content we will see packaged in IXMF files. IXMF will revolutionize and standardize the delivery format of all manner of music and multimedia content, and IXMF players will be the de facto standard everywhere. Content providers won't have to think twice about how to package their creations. Everyone will use IXMF, and then, watch out!" Sanger agrees, saying, "Developers who depend on maintaining their own technical edge over their competitors will be left behind a huge, rapidly advancing community. Developers would be well advised to implement IXMF rather than a proprietary tool, just as they would be well advised not to attempt to create their own Internet."
The Fencing Lesson
What should you do now to prepare for using IXMF? Chris Grigg suggests doing nothing.
"The Soundtrack Manager model we're articulating reflects what basically all advanced interactive audio systems already do," Grigg says. "Just keep getting better at the craft."
Martin Wilde adds, "Get familiar with MIDI + DLS, as this is one of the first formats included in XMF files. Become a member of the IA-SIG and get involved in the interactive audio scene as a whole. There's room for everybody: manufacturers, game developers, audio artists and programmers. Other than that, sit back, and prepare to change the way you think about game audio."