Times have changed in the game development industry. All of the stupid money has fled to Internet commerce applications, and companies in the game world are left having to make their way as viable businesses. There is no market tolerance left for "junk," or as it used to be referred to by a producer at Sega some years ago, "library titles." The "B" and "C" titles that filled out your publishing profile aren't viable any more - in fact, they can be downright deadly. Every game made from now on has to have a legitimate shot at becoming a hit; otherwise, it's just not worth making. This philosophy has inevitably trickled down to audio departments, where the focus now is squarely on quality.
A Rant on Interactivity
What exactly does it mean to produce interactive audio? For some time, interactive audio has suffered from an identity crisis. The term has come to mean less and less. In my own jaded way, I always imagined the adjective "interactive" modifying a noun such as "media" or "audio." When a person says that they are "doing interactive" in reference to audio, it usually means the individual has stumbled into a job working on a CD-ROM or web site and is desperately trying to figure out how to make 8-bit, 22kHz Red Book audio not sound as if it's being played back across two tin cans connected by string. To this person, "interactive audio" simply means sound for nontraditional media: CD-ROMs, console games, kiosks, and web sites. To me, the term means something entirely different.
If you're describing audio as "interactive," you're implying more than just linear playback. Interactive audio should be constructed in such a way that the user can affect its performance in real time during playback. I'm talking about reactive, responsive audio, coming from audio drivers that are "aware" of what's happening and can respond by changing the music appropriately. Spooling a two-minute pop song in an unchanging, endless loop during a real-time strategy game is not interactive audio. Perhaps the term "audio for interactive media" would be appropriate instead.
Imagine instead audio that is an interwoven part of a 3D, free-roaming world. As you explore the world, the sound smoothly accompanies you, rising as you encounter danger, falling away as you explore in peace. It sounds triumphant when you succeed, and distant and mournful when you fail. And all of this happens with the precision and emotional impact of a great film score. In a user-driven world such as a game, you have no linear timeline by which to synchronize the changes in the music, as you do in a movie. The audio entirely depends upon the unpredictable input of the user. How can you make this work?
The answer lies in the nature of an interactive 3D world and is made possible by new tools and technologies. The 3D game world is open-ended, a database of terrain, objects, animations, behaviors, and their various relationships. Therefore, the music must also become a database of musical ideas, sounds, instruments, and relationships, imbued with awareness of the other objects in the world and programmed with responsive behaviors of its own.
The Rules of Interactive Sound Design
Over the years, I've worked on about 100 titles, 60 or so in a substantive way. I can distill much of what I have learned from this into a short set of rules.
1. There will always be limitations. Hardware limitations, space limitations, design limitations… you name it, and it will be restricted at one time or another. The only resource that's never limited is your ability to come up with creative solutions to these problems.
2. Every drop of energy that goes into being discouraged by the limitations of a particular project is energy taken away from making a great sound design.
3. Know your role on the team. Projects need to be driven by a singular, cohesive vision usually espoused by a producer, lead designer, or director. Unless you're working on an "audio only" product, audio is a supporting member of the cast; it doesn't lead the design. Audio is no less important to the overall success of the project, but it follows and supports the design ideas and constraints defined by the project's singular vision. The sound designer should become comfortable in this role so as to avoid great heartache and suffering. However, this doesn't mean that there is no opportunity for creativity. (See Rule 4.)
4. This is the "two things" rule. Most of the time, you'll be taking direction from someone who knows less about audio than you do. By saying this, I don't mean to denigrate the skill of the project director; I'm just stating a simple fact. The sound designer is the expert when it comes to the details of audio. Yet the direction for the sound design must come from the person who is responsible for the project's overall vision. Otherwise, the sound will not hang together with the product. My highly unscientific experience has shown that a project director is unlikely to have more than two identifiable design needs for any given part of the sound design. If you, as the audio designer, satisfy these two things, you're usually free to complete the bulk of the task with your full creative input. It's best to know what these two things are before any significant amount of work is done.
5. Run-time resources will always be shared among different disciplines.
6. As soon as the artists or programmers figure out how to use something effectively, it will no longer be available for audio (for example, the CD-ROM drive on any game platform).
7. Making audio interactive is a team effort. The application must be altered by designers and programmers to support interactive audio. Team buy-in is essential because interactive audio, although very valuable to a project, is more work for everybody.
8. The likelihood of audio becoming interactive for any given product is inversely proportional to the amount of programming that's required of individuals who are not specifically assigned to the audio team.
9. It's far better to determine how the sound design will interact with the world before you begin creating assets. Retrofitting interactivity into audio designs, especially music, is difficult at best, and severely compromised, if not impossible, at worst.
10. Leverage existing technology wherever possible. If you plan to create new audio technology, use off-the-shelf tools whenever you can. For example, I can't conceive of a scenario where it would make sense to write your own MIDI sequencer. Programs such as Opcode's Vision and Logic Audio are great tools. I can't even begin to speculate on how many person-hours went into making them. It would be crazy to invest development dollars in a "roll your own" sequencer. Rather, we need to create additional tools that map out the territory that is unique to our endeavor. Such tools should begin with the output of commercial tools such as a MIDI sequencer and add functionality as needed.
Creating an Adaptive Audio Engine
Armed with the desire for truly interactive audio, we at Crystal Dynamics set out to create our own sound driver for the Sony PlayStation and, perhaps later, for Windows 95. From my notions of interactivity and set of rules for interactive audio, we derived a number of design goals for our driver.
EMPHASIZE MIDI OVER DIGITAL AUDIO. For most of our products, MIDI and custom DLS-like instruments are a better way to go than Red Book or streaming digital audio. Rules 1 and 5 have some implications for Red Book audio. Red Book audio sounds great, but fortunately for Crystal Dynamics, our programmers know how to get the most out of the CD-ROM drive for game play. Therefore, it's not always available for audio. Furthermore, creating interactive sound designs using streamed, branching digital audio segments is limited in many ways, primarily by disk space, seek times, bus bandwidth, and buffer sizes. Red Book, or any kind of linear, streamed digital audio, requires a lot of storage space in any situation. But it becomes even more problematic in an adaptive setting where each variation of a given section of music has to be remixed and stored uncompressed. Finally, most consoles (including the PlayStation) save money by using inexpensive (read: slow and not very reliable) CD-ROM drives. Thus, the constant seeking and playing of digital audio tracks is likely to tax the CD-ROM drive to the point of failure. Red Book audio should therefore be reserved for noninteractive sections of the game, such as title screens.
On the other hand, MIDI is small, compact, and easily modifiable on the fly. The PlayStation has its own dedicated sound RAM, and all of the data needed for a level can be loaded in less than a second. Once loaded, the data is out of the way and the CD-ROM returns to its other duties. Furthermore, the PlayStation contains a pretty good sampler. Respectable music and sound effects were created for the Super Nintendo as well, but that platform suffered from limited sound RAM. Fortunately, the PlayStation has almost ten times as much sound RAM, much better sound interpolation (an on-the-fly sample rate conversion technique used to stretch a sample up or down the keyboard from its native pitch), and superior DSP (used for reverb and the like). In my opinion as a confirmed curmudgeon, anyone who says that they can't make high-quality MIDI music on the PlayStation under these conditions is just whining about the amount of work involved in such an endeavor.
KEEP SOUND DRIVERS EFFICIENT. When we considered replacing the existing sound driver with our own technology, we decided that our code would need to be faster, smaller, easier to implement, and more capable than the code we would be replacing. Otherwise, the project was not worth undertaking. Rule 1 dictates that sound drivers must be small and fast, since both system RAM (where the driver resides) and CPU time are scarce commodities in a fully rendered 3D world. Making sound drivers easy to use is important; programmer and game designer time are limited commodities, so making basic implementation easier leaves more time for these folks to work on making the world ready for your interactive audio.
There should also be a simple, consistent means for your game to communicate relevant information about itself to the sound driver. Adding interactive sound capabilities requires programmers and designers to spend more time communicating information to the sound driver about the state of the world and the characters within it. At Crystal Dynamics, we tried to ease this burden by communicating the state of the world to the sound driver in the form of a set of simple, numerically registered variables. Most often, we use values from 0 to 127 so that they can be set from standard 7-bit MIDI controllers. Thus, the number of enemies alive on the screen might be represented as one 7-bit variable. Your distance from the level's exit might be stored in another. We have tried to use these same variables throughout the game so that they only need to be coded once.
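A minimal sketch of such a register bank follows, written in Python for readability rather than as anything resembling actual PlayStation driver code. The class name, the method names, and the `distance_to_exit` register are hypothetical. Clamping out-of-range values to 127 is also an assumption; a real driver might scale them instead.

```python
class GameStateRegisters:
    """Named 7-bit registers shared between the game and the sound driver."""

    MAX_VALUE = 127  # largest value a 7-bit MIDI controller can carry

    def __init__(self):
        self._values = {}

    def set(self, name, value):
        # Assumption: clamp so that out-of-range game values still fit in 7 bits.
        self._values[name] = max(0, min(self.MAX_VALUE, int(value)))

    def get(self, name):
        # Registers that were never written read as 0.
        return self._values.get(name, 0)


registers = GameStateRegisters()
registers.set("numberofenemies", 5)     # written by the game
registers.set("distance_to_exit", 300)  # hypothetical register; clamped to 127
```

The game only ever writes to the registers and the sound driver only ever reads them, which keeps the amount of audio-specific code in the game itself small.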
CODE OR DIE. It's important to put the logic programming in the hands of the sound designer, not the game programmer. Rule 8 clearly shows the logic behind this. It's hard enough to explain which aspects of the world you need to track. It's almost impossible (and I think unreasonable) to expect the game programmers to write code to mute and unmute specific MIDI channels when various conditions arise. To solve this problem, we created (with some help from Jim Wright at IBM's Watson research lab) a programming language that allows us to author logical commands within a stock MIDI sequencing environment and store them within a standard MIDI file. The language contains a set of fairly simple Boolean functions (if, then, else, endif), navigational commands (goto, label, loop, loop end), a set of data manipulation commands (get, set, sweep), and parameter controls (channel volume, pan, transpose, and so on). Next, we created an auditioning tool that allowed us to simulate the run-time game environment, kick out logic-enhanced sequences, manipulate the game state variables, send commands, and see what happens.
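To give a feel for how such control commands behave, here is a toy interpreter in Python covering only a small subset (if, else, endif, set, and channel volume). It is a sketch under stated assumptions, not the language described above: the tuple encoding, the equality-only `if` test, and the function name `run_control_track` are all hypothetical, and nothing here reflects how the commands are actually stored in the MIDI file.

```python
def run_control_track(commands, registers, channel_volume):
    """Execute a list of command tuples once, from top to bottom.

    Hypothetical encoding: ("if", register, value), ("else",), ("endif",),
    ("set", register, value), ("volume", channel, level).
    """
    # Each stack entry records whether the current branch should execute.
    active = [True]
    for command in commands:
        op = command[0]
        if op == "if":
            _, register, value = command
            taken = registers.get(register, 0) == value
            active.append(active[-1] and taken)
        elif op == "else":
            if_branch_ran = active.pop()
            # The else branch runs only when the enclosing block is active
            # and the matching if branch did not run.
            active.append(active[-1] and not if_branch_ran)
        elif op == "endif":
            active.pop()
        elif active[-1]:
            if op == "set":
                _, register, value = command
                registers[register] = value
            elif op == "volume":
                _, channel, level = command
                channel_volume[channel] = level
            else:
                raise ValueError(f"unknown command: {op}")


control_track = [
    ("if", "numberofenemies", 0),
    ("volume", 10, 0),    # no enemies: silence channel 10 (arbitrary choice)
    ("else",),
    ("volume", 10, 100),  # enemies present: bring the channel back
    ("endif",),
]
registers = {"numberofenemies": 3}
channel_volume = {}
run_control_track(control_track, registers, channel_volume)
```

The point of the design is that the sound designer edits `control_track`, while the game programmer only has to keep `registers` up to date.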
Case Study: The GEX Project
Most of my work this year has centered around the interactive sound design for our upcoming PC and PlayStation title, Gex: Enter the Gecko. (The PlayStation version will be the first title to use our new sound driver.) This game is the second installment of the Gex franchise and has undergone a complete technological overhaul by some of the best programmers and designers in the business. The game will include at least eight worlds thematically based upon TV and movie parodies, as well as secret, bonus, and boss levels - fertile ground for a sound designer indeed.
I'll talk briefly about what we're doing currently in the game's audio. For clarity's sake, I've focused on the treatment of a variable called numberofenemies and a small number of related variables. These few examples are by no means all that we plan to do in the game, but they illustrate my main points.
One aspect of the new Gex title is the game's fast and efficient engine. It allows us to have more enemies on screen moving at a faster rate. Since this is one of the features that really makes the product stand out, we on the audio team designed the interactive audio to work tightly with the new engine functionality. We set up a 7-bit variable called numberofenemies that reflects the number of enemies on the screen and is updated by the game continually. This variable is read and used by the sound driver to adjust the game audio.
Here's a breakdown of the Gex audio, by game level:
THE PRIMAL ZONE. The control track of the MIDI sequence has logic built into it. When the numberofenemies register is 0 and kills is 1, the sequence is paused, and playback of all tracks is branched to a short stinger of the Gex theme song orchestrated with the instrument palette of the Primal Zone level. When the stinger is over, the main sequence is resumed.
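The stinger logic can be pictured roughly as follows. This Python sketch uses a hypothetical `SequencePlayer` stand-in for the driver's playback state. Clearing `kills` after the stinger fires is an assumption made so that the example triggers only once; the text above does not say who resets that register.

```python
class SequencePlayer:
    """Stand-in for the driver's playback state (hypothetical)."""

    def __init__(self):
        self.now_playing = "main"
        self.events = []

    def branch_to_stinger(self):
        self.events.append("pause main")
        self.events.append("play stinger")
        self.now_playing = "stinger"

    def stinger_finished(self):
        self.events.append("resume main")
        self.now_playing = "main"


def check_for_stinger(registers, player):
    """Run from the control track; returns True if the stinger was triggered."""
    if player.now_playing != "main":
        return False
    if registers.get("numberofenemies", 0) == 0 and registers.get("kills", 0) == 1:
        player.branch_to_stinger()
        registers["kills"] = 0  # assumption: clear the flag so it fires once
        return True
    return False


registers = {"numberofenemies": 0, "kills": 1}
player = SequencePlayer()
triggered = check_for_stinger(registers, player)
player.stinger_finished()  # called when the stinger sequence reaches its end
```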
We keep a location register that allows the sound driver to see if Gex has entered a new area of the map that has different music. The control track checks every four beats to see if the value of location has changed. If it has, a short transition drum fill plays, and then the MIDI sequence that is matched with the value of location is started on the beat.
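Polling only every four beats is what keeps the change musical: the game can write a new location at any moment, but the music reacts on a bar boundary. A rough Python sketch follows; the function name, the `state` dictionary, and the sequence names are made up for illustration, and only the `location` register comes from the description above.

```python
# Hypothetical mapping from location values to MIDI sequences.
SEQUENCES = {0: "jungle_theme", 1: "cave_theme", 2: "volcano_theme"}


def on_beat(beat_count, registers, state):
    """Return the playback actions to perform on this beat."""
    actions = []
    if beat_count % 4 != 0:
        return actions  # only poll the register every four beats
    new_location = registers["location"]
    if new_location != state["location"]:
        state["location"] = new_location
        actions.append("play transition fill")
        actions.append("start " + SEQUENCES[new_location])
    return actions


registers = {"location": 0}
state = {"location": 0}
registers["location"] = 2            # the game moves Gex mid-bar
mid_bar = on_beat(5, registers, state)
next_bar = on_beat(8, registers, state)
```

Here `mid_bar` is empty because beat 5 is not a polling point, while `next_bar` contains the fill followed by the new sequence.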
SCREAM TV. Scream TV is the "horror" level of Enter the Gecko. For this level, I used eerie chamber music laid over some ambient loops of slow ragged breathing and strange heartbeats. The same numberofenemies register is used in this level, but instead of muting and unmuting, the value of the register affects the harmonic center of the piece. For this type of transformation, we use a technique called "pitch mapping." A pitch map in our system is a mechanism to remap the pitches of individual notes as they come from the MIDI sequence on their way to the physical voice. Our pitch maps are built so that the transformation can be kept within the prevailing harmony of the piece. In other words, for a scary piece such as this, the pitch table will confine all of the note remappings to a diminished scale. The numberofenemies value is used continuously to set the number of scale degrees upwards to transpose the harmonic instruments. This creates a very smooth, subtle, intensifying effect as more and more enemies are on the screen and danger is increasing.
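A pitch map of this kind might look like the Python sketch below. It is an illustration only: the whole-half diminished scale rooted on middle C, the rule of snapping incoming notes down to the nearest scale tone, and the function name `pitch_map` are all assumptions rather than details of the driver. The register value is used directly as the number of scale degrees, as described above; any scaling of large values is left out.

```python
# Whole-half diminished scale, in semitones above the root.
DIMINISHED_SCALE = (0, 2, 3, 5, 6, 8, 9, 11)


def pitch_map(note, degrees_up, root=60, scale=DIMINISHED_SCALE):
    """Snap a MIDI note to the scale, then move it up by scale degrees."""
    octave, semitone = divmod(note - root, 12)
    # Snap down to the nearest scale tone at or below the incoming pitch.
    index = max(i for i, step in enumerate(scale) if step <= semitone)
    octave_shift, index = divmod(index + degrees_up, len(scale))
    mapped = root + 12 * (octave + octave_shift) + scale[index]
    # Fold down by octaves so the result stays a valid MIDI note in the scale.
    while mapped > 127:
        mapped -= 12
    return mapped


numberofenemies = 3
chord = [60, 63, 66]  # a diminished triad from the sequence
remapped = [pitch_map(note, numberofenemies) for note in chord]
```

Because every output note is drawn from the same scale, the chord moves upward as enemies appear but never leaves the prevailing harmony.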
CIRCUIT CENTRAL. The Circuit Central level looks like the inside of a CPU. The opening music is weird, ambient analog electronica with a slow, trip-hop beat. After some exploring, it becomes clear to the player that there are jumps and expanses that cannot be passed in Gex's normal state. Within this time, the player should also discover a number of chargers set out in various locations. When Gex steps into a charger, he starts to glow in a green light with orbiting electrons. This state lasts for 15 seconds and gives Gex the ability to use embedded "chips" in the floor to do super jumps and turn on "data bridges" that span the formerly uncrossable chasms.
The state of being charged up is stored as a 1 in a chargedup register, and location keeps track of the seven different areas within the main level. When the control track in the main sequence sees a 1 in chargedup, it checks location and matches the area to one of seven different 15-second-long high-energy pieces of audio.
While most often the game affects the sound design, this relationship can also work the other way. In Circuit Central, we set a register at every beat of the music. Since each measure has four beats, I set the beats variable to 1 on the downbeat, followed by 2, 3, and 4 on the remaining three beats, and then cycle back to 1. This timing information will synchronize the lighting effects in this world.
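The beat counter itself is simple arithmetic. In the Python sketch below, the function names and the brightness values are invented for illustration; the article does not specify how the lighting code consumes the register.

```python
def beat_register_value(beat_index, beats_per_measure=4):
    """Map a running beat count (0, 1, 2, ...) to the 1-based beat in the bar."""
    return beat_index % beats_per_measure + 1


def light_brightness(beats):
    # Hypothetical consumer on the game side: flash brightest on the downbeat.
    return 1.0 if beats == 1 else 0.4


first_six_beats = [beat_register_value(i) for i in range(6)]
brightness = [light_brightness(b) for b in first_six_beats]
```

Here the music is the writer and the game is the reader, the reverse of the numberofenemies case.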
Only the Beginning
Adaptive audio is a very new field. Many challenges lie ahead, but I firmly believe that it represents the future of sound in interactive media. To move forward, we need a major paradigm shift in how we think about music, great tools, new technology, and a healthy dose of realism. I hope that my anecdotes, rants, and factoids have shed some light on and sparked more interest in creating interactive audio. So long for now from audio central at Crystal D.