Next-generation consoles push more than polygons and deliver more than seamless online integration; they also deliver an unrivaled audio experience, which can raise the quality of the overall gameplay experience in ways both subtle and obvious.
We're here to crack open the boxes that play these games and take a look under the hood. A lot of speculation has been flying around about just what the heavyweight current generation consoles are capable of. Let us raise the curtain. I managed to hook up with Gene Semel, audio director of Sony Computer Entertainment America, and Brian Schmidt, head of the Xbox audio team, and they gave us the lowdown.
This is followed by discussions with the audio staff on two major titles for the systems: Uncharted: Drake's Fortune on PlayStation 3, and Halo 3 on the Xbox 360.
What exactly does the internal team at Microsoft do for Xbox, apart from supporting its developers?
Brian Schmidt: There are multiple teams at work here. You probably have the most face-to-face contact with XNA's XDC ("Xbox Developer Connection") group. Their job is to make sure that people get the best out of Xbox 360 through support, help, samples and so on. They are the group that puts on Gamefest, mans the developer support aliases, scours the newsgroups, writes whitepapers, etc., and provides front-line proactive and reactive support for game developers.
A second group is XNA PGP (Professional Game Platform). That is my group. We write the code, libraries and tools that ship in the Xbox XDK and DirectX SDK. So we are the authors of XAudio, XAudio 2, XACT, XMP and are largely responsible for the overall architecture of the system. We're constantly working behind the scenes, adding features and improving efficiency.
We also look at the long-term picture of what tools and technologies we need to be working on for the future. The tools and technologies we provide are used by every Xbox 360 game that ships, allowing game developers to take full advantage of the system in an easy and efficient way. This has a huge impact on the final games that make it into customers' hands.
Of course there are account managers, Game Qualification (Certification), peripheral developers, silicon developers... Far too many to list, actually. But we're all dedicated to making sure that game developers can get the most out of our platforms in the most efficient way.
We in PGP work extremely closely with all these groups and often the line between us is a blurry one. I should say that pretty much everyone on the PGP audio team is an active musician, too.
How much were you involved with the Xbox 360's audio hardware design?
BS: I was pretty much responsible for the overall audio system architecture, though of course many others were involved as well. That includes XMA, XAudio, XACT and how the pieces fit together. XMA was an interesting collaboration across a few groups at Microsoft. One of our silicon designers was working on Xbox silicon and thought we could put WMA decoding in hardware.
After some discussions it became clear that, with a few modifications, we could provide many more voices and take care of some specific gaming scenarios that WMA didn't address, like seamless looping, by tweaking things here and there a bit. So we worked with the WMA team on those tweaks and XMA hardware was born.
How much has the industry changed in terms of manpower and budget compared to the previous generation?
BS: Teams, manpower and budget have followed the sheer increase in the size of projects. That's one of the reasons for our investments in high-level technologies such as XACT, which help streamline the process and workflow for creating next-generation audio content.
How easy would it be for someone to get hold of an Xbox test or dev kit and train for audio integration using XACT? Are development licenses difficult to get, or can one borrow a 360 for educational purposes?
BS: We ship the exact same XACT for Windows as we do for Xbox 360, so you can easily get XACT for DirectX and run that. Everything you learn will be applicable to Xbox 360. If you really want to run on an Xbox 360, the easiest way is to download XNA Game Studio 2.0. That will let you run your own code on Xbox 360, provided you join the XNA Creators Club. That's the easiest and cheapest way to have direct hands on experience on an Xbox 360.
If you are a more serious audio developer, you can join the Xbox 360 Registered Content Creator Program. This is a program for industry professionals who are already working on an Xbox 360 title that lets you download the actual Xbox 360 XDK and also lets you purchase/license an Xbox 360 Developer Kit.
Were there any requests from developers for changes to the original Xbox audio architecture that made it into the design for the 360?
BS: Plenty. XAudio 2 is a great example. Developers wanted a more flexible way of dealing with audio than they could previously, so we created a new XAudio 2 audio API for Xbox 360. They also said "we want cross-platform code with Windows," so we brought XAudio 2 to Windows as well. We also have XACT running cross-platform on Windows and Xbox 360.
We also have a lot of experience with the challenges of game audio and tried to do some other innovation as well. No one came out and said "give us XMA," but we knew that in any system, memory is always an issue for audio, so we created XMA. Likewise, our move to a software audio system makes it much easier to use DSP effects than on previous-generation consoles, which is a key feature for next-gen gaming.
Have you ever assisted development with internal Xbox games directly with changes to SDK/APIs? Is this kept proprietary, or made available with updates?
BS: When we first launched, there were plenty of changes to the XDK that were the direct result of specific games' requests or issues. We'd roll those into the next XDK and then everyone could take advantage of the new features in the next release. Early on, we released every month, so the turnaround was pretty quick. As the platform has matured, we find ourselves doing that less, though.
How do developers use media space for games?
BS: Games on Xbox 360 run off of a standard DVD-9. We've found that this is more than enough space for the vast majority of games. A large number of games don't fill up the DVD-9. Mass Effect is a great proof point. It's a deep, involved RPG, with long gameplay, breathtaking graphics, and a rich script with plenty of voiceover. It fits on a single DVD-9.
The same is true for Halo 3. It was the video game event of 2007, and fits on a DVD-9. The Xbox 360 development kit includes some extremely advanced compression technologies such as VC-1 video compression, XMA and WMA, to name a few. These allow you to store high-definition assets in a fraction of the storage space required by older technologies.
What kind of effects does the Xbox use natively, and can they be tested anywhere in the basic SDK?
BS: We ship with a really nice reverb we licensed from a company called Princeton Digital. They are a small company that licenses audio algorithms to pro audio hardware companies such as Eventide. It's a variant of the 2016 reverb called the 2016/360. We provide sample code for some simple DSP effects like parametric EQ, delay, compression and simple filtering. You can check those out in the samples for the XDK and SDK. There are also some companies making DSP middleware for Xbox 360. For example, Halo 3 used some effects from Waves.
Is there a way to test XACT in realtime without having to build a game engine to do it?
BS: There is an application that ships both for Windows and as part of the Xbox Development Kit that lets a sound designer fire up XACT and play with it.
What are the most interesting features that have been taken advantage of for the Xbox 360 for audio? More channels? Dynamic compression?
BS: XMA certainly -- every game uses it, and it's the primary audio format for the Xbox 360. It lets you store between eight and 10 times as much audio in memory. That makes a HUGE difference in what a sound designer can deliver. I also found Halo 3's use of the Waves technologies very cool, and we're excited to have partnered with them.
Actually, one of the "most interesting features" that has been used is just the fact that, aside from XMA, we've moved to an easily programmable software audio architecture. I've seen some games do some amazing things because they could just write some C code, either for DSP effects, 3D or entire audio engines. It really has unleashed a lot of creativity in my opinion.
What game has impressed you the most for the 360 with its use of audio?
BS: BioShock, Halo 3, Call of Duty 4... there are just too many really great-sounding games to call out individuals... The amount of great audio work coming out of the studios these days is really amazing.
What is the average amount of audio memory that you are seeing used in 360 games?
BS: It really varies. I've seen as little as a couple of MB (RAM footprint) to more than 64. Of course, using XMA gives you about 10:1, so that 64MB is the equivalent of 640MB of PCM (or a full audio CD). On disc it also varies... I've seen games that use one-third of the disc for sound/music/dialog (about 2GB). Of course, I'd imagine that Guitar Hero III or Rock Band use more than the average game.
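To make the arithmetic in that answer concrete, here is a minimal Python sketch. The 10:1 ratio is the approximate figure quoted above; the function names and the CD-quality PCM parameters (44.1 kHz, 16-bit, stereo) are assumptions added for illustration and are not part of any Xbox 360 API.

```python
MB = 1024 * 1024  # bytes per megabyte, as used in this sketch


def pcm_equivalent_mb(xma_mb, ratio=10.0):
    """Size of the same audio as uncompressed PCM, given an assumed compression ratio."""
    return xma_mb * ratio


def pcm_minutes(pcm_mb, sample_rate=44_100, channels=2, bytes_per_sample=2):
    """Playing time of a PCM buffer, assuming CD-quality parameters by default."""
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return pcm_mb * MB / bytes_per_second / 60


equivalent = pcm_equivalent_mb(64)   # 64MB of XMA at roughly 10:1
minutes = pcm_minutes(equivalent)    # a little over an hour of stereo audio
```

Under these assumptions, 64MB of XMA corresponds to 640MB of PCM, or a bit more than 63 minutes of CD-quality stereo, which is consistent with the "full audio CD" comparison.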
The Xbox 360 is definitely a console to contend with. But in comes the challenger. In the blue corner, the successor to the most popular home console of all time, the PlayStation 2, comes the PlayStation 3! We chatted with Gene Semel, audio director of SCEA (Sony Computer Entertainment America) San Diego, about its capabilities.
There is a tools team and a content development team. How do they work together and with third-party developers? (A first party is a hardware manufacturer that also develops games, such as Nintendo; a third party is a company that only produces games.)
Gene Semel: The internal Tools and Technology groups work on proprietary tools and tech specific to the various projects. Additionally, they do help with best practices for all the internal developers; however, that is usually in the context of project-specific discussions. As with any major publisher, we have forums that the first-party developers communicate on. The third-party tools, such as Scream, which are included with the PS3 SDKs, are supported and updated by the Sony PlayStation Europe division.
How much has the industry changed in terms of manpower and budget compared to the previous generation?
GS: Here at Sony PlayStation, audio resources have generally expanded in relation to the content requirements, similar to art, though not necessarily in equal proportion. Having said that, I have seen developers that have the same audio resources as they had allocated on previous-generation titles. What seems to happen on many next-gen titles, for all developers, is that the art and level content piles up very quickly and developers have to manage that rush of content with more resources in some way.
Level art and animations alone can be a huge amount of work, and can be a killer if they come on too fast for an audio team to keep up with while maintaining a next-gen quality bar. The best way to handle this is for developers to have a close communication protocol and pipeline with their audio team, so that the audio team is in the loop for the entire development cycle, working in parallel with the other disciplines, even on content that isn't yet in-game. This potentially allows for iteration outside of the build systems and gives the audio team the ability to develop the content that will inevitably pile up quickly and can sometimes appear all at once.
Another good idea is to align (or pair up) the designers and artists with audio team members very early so that the audio team can use their virtual binoculars to see how content and gameplay will come together in the end. Ultimately, the biggest change from the previous gen is the amount of effort required in communication to navigate the development process with more people, more art, more everything.
How easy would it be for someone to get hold of a PS3 test or dev kit and train for audio integration using SCREAM? Are development licenses difficult to get, or can one borrow a PS3 for educational purposes?
GS: [...] licenses for external contractors aren't easy to get. It is possible, however, for a developer to loan hardware to a contractor through signed agreements. The developers would then be the conduit for the contractor providing the data and [...]
Were there any requests from developers for changes to the PS2 audio architecture that made it into the design for the PS3?
GS: The PS2 architecture wasn't next-gen and was limited, so the design of the PS3's Cell allows designers endless possibilities, although they need programming resources to realize their ideas, goals and dreams. The PS2 did not support Dolby Digital or DTS, which is new for the PS3.
Have you ever assisted development with internal PS3 games directly with changes to SDK / APIs? Is this kept proprietary, or made available with updates?
GS: Every title has different requirements, which typically make use of proprietary tools and/or functionality. Sony PlayStation has to support both internal and third-party developers, so new features or tools that are built for one game may not apply across the board, making change seem slower sometimes.
What kind of effects does the PS3 use natively, and can they be tested anywhere in the basic SDK?
GS: The SPUs on the Cell processor allow all types of effects to run natively, such as chorus, distortion, general filtering, reverb, etc. The Cell processors are great for DSP, and there are many different effects that come with MultiStream in the PS3 SDK.
See also this quote: "A great example of the power of the Cell processor is that MultiStream can process 50 * 2 second convolution reverbs on one SPU in realtime. MultiStream can also decode approximately 400 MP3's on a single SPU in realtime." -- Jason Page, SCEE (Sony Computer Entertainment Europe).
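For a rough sense of scale behind that quote, the Python sketch below shows what a convolution reverb computes and estimates the cost of doing it the naive way. The 48 kHz sample rate and both function names are assumptions for illustration; the quote does not say how MultiStream implements its reverbs, and real-time convolution is typically done with FFT-based methods rather than the direct loop shown here.

```python
def convolve(dry, impulse):
    """Direct-form convolution: every input sample excites a scaled copy of the impulse response."""
    out = [0.0] * (len(dry) + len(impulse) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(impulse):
            out[i + j] += x * h
    return out


def direct_macs_per_second(ir_seconds, sample_rate=48_000):
    """Multiply-accumulates per second for one channel of naive convolution."""
    taps = int(ir_seconds * sample_rate)  # impulse response length in samples
    return taps * sample_rate             # one MAC per tap, per output sample


wet = convolve([1.0, 0.5], [1.0, 0.0, 0.25])   # tiny example signal and impulse response
one_reverb = direct_macs_per_second(2.0)       # a single 2-second reverb
fifty_reverbs = 50 * one_reverb                # the workload named in the quote, done naively
```

Under these assumptions a single 2-second reverb already needs about 4.6 billion multiply-accumulates per second in direct form, which is why the quoted figure of 50 such reverbs on one SPU is notable.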
What are the most interesting features that have been taken advantage of for the PS3 for audio? More channels? Dynamic compression?
GS: Some obvious advantages are pooled memory and the Cell architecture, which allow for some serious processing power for more real-time interactive mixing. More channels are good but not necessarily always better.
What game has impressed you the most for the PS3 with its use of audio?
GS: Uncharted was a huge success both graphically and audibly. Just as with the previous generation consoles, things only get better with time and I'm excited about our future projects in development and raising the quality bar that Sony is known for.
What is the average amount of audio memory that you are seeing used in PS3 games?
GS: This is always the first question most sound designers care about when preparing for a new product. The average has been around 30MB, with some more and some much less. Given the PS3's pooled memory resources, negotiation all the way through development isn't unusual as the game is realized and resources make themselves obvious in terms of production priorities.
What would the next big step be for audio hardware and software in games? Realtime synthesis of voice / sfx? Planar emitters?
GS: The future of developing audio for interactive media will be tools that expose and allow complete customization of hardware and software resources for both the audio and programming teams. Tools that allow for very fast iteration are definitely the future, both from a per-sound sound design perspective and from an all-sounds-at-once mix perspective.
Having the ability to interface with art assets and, for example, allowing those assets to scale automatically as they are updated will be critical for maintaining quality with smaller teams and/or limited resources. Lastly, I foresee more robust real-time logic systems that will allow sound designers and production directors to actually make decisions and mix the game at a "post-production"-like stage of development.
The Contenders Face Off
Essentially, the playing field looks about even between these two heavyweights. While the Xbox 360 still has a lead over the PlayStation 3 in terms of sales numbers, in terms of raw power for audio many cross-platform games have equivalent sound quality, most notably BioShock.
Xbox Contender: Halo 3
Marty O'Donnell directed audio for the worldwide hit Halo 3. In Halo 3 a great marriage took place between the ubiquitous Waves software bundle and a video game. I chatted with him about how Waves was used.
How did you hook up with Waves for realtime use of their effects?
Marty O'Donnell: I've been talking with Brian Schmidt about using Waves with the Xbox for a number of years and he put me in touch with Paul Bundschuh from Waves. Paul and I had a great conversation and we started moving forward on making it happen.
Which effects did you end up using?
MO: We used the L360, which is based on the L1 rather than the L2 (the L2 was too much of a CPU hit). It really worked well to have a great compressor/limiter at the end of the chain. We also used Q extensively. We would have loved to have used RenVerb because it sounded absolutely beautiful, but it was way too big a hit on CPU.
How big a difference did you hear in quality of effects from native effects?
MO: Q gave us much better range and control for EQ than the standard stuff in the box. The Xbox never had an L1 type of DSP before so there's nothing for us to compare it with. We would try settings in Pro Tools with the plug-ins and then duplicate them in the 360 and the sound was basically identical. I really wish we had had more time to see if we could have optimized the RenVerb.
Did you use off the shelf Waves licenses or special embedded system algorithms for more optimized realtime performance?
MO: We had engineers both here and at Waves to optimize the performance of these plug-ins. Waves was great to work with as a company.
What kind of CPU hit did you register? Was there a battle over the CPU for realtime effects?
MO: Well, now you're getting too technical for me. Like I said, some stuff was too expensive and we had to minimize the hit as much as possible. This wasn't something that the Waves engineers were used to doing but once we met with them and showed them what we needed they really came through. As you know, in games there is a CPU battle for just about everything.
Were you able to crossfade effects (such as two kinds of reverb from one location to another)?
MO: We were already crossfading reverbs in our engine, so we thought we'd be doing that again, but since we didn't use Waves reverb it didn't come up. We used modified reverb from the XAudio stuff.
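The interview does not describe how Bungie's engine crossfades reverbs. One common approach, sketched here in Python purely as an assumption, is an equal-power crossfade between the send levels of two reverbs as the listener moves from one space to another. The function name and the `position` parameter are hypothetical.

```python
import math


def reverb_send_gains(position):
    """Equal-power crossfade between two reverb sends.

    position is 0.0 when the listener is fully in the first space and
    1.0 when fully in the second (a hypothetical, engine-supplied value).
    Returns (gain_for_first_reverb, gain_for_second_reverb).
    """
    t = min(max(position, 0.0), 1.0)   # clamp to the valid range
    angle = t * math.pi / 2
    return math.cos(angle), math.sin(angle)


halfway = reverb_send_gains(0.5)       # both sends at about 0.707
```

The cosine and sine curves keep the sum of the squared gains at 1, so the combined reverb energy stays roughly constant through the transition instead of dipping in the middle as a linear crossfade would.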
Have you heard any feedback from the team or the buying public on the difference the new effects have made in Halo 3?
MO: None of this stuff is revolutionary and basically I believe that all our progress is being made in many different subtle ways. The public seems to like the audio design of Halo 3 and Waves is one of the reasons why.
PS3 Contender: Uncharted: Drake's Fortune
Uncharted: Drake's Fortune was a landmark cinematic experience on the PS3, receiving an average of 90% on GameRankings, and is clocking in at well over one million units sold. Thanks to Gene Semel at Sony, we chatted with Jonathan Lanier (audio programmer) who answered these questions with the input of Bruce Swanson (audio director) at developer Naughty Dog.
What advantages did the PS3 provide for audio reproduction with Uncharted?
Jonathan Lanier: Several. First, the fact that the PS3 has HDMI 8-channel PCM outputs means that we could play all our audio in 5.1/7.1 on an HDMI system with no recompression, which sounds completely awesome. Second, for those without HDMI who must use bitstream audio, we had the ability to support DTS, which is very high fidelity.
Third, we are guaranteed that each PS3 has a hard drive, so we could dynamically cache important sounds and streams to the hard drive to guarantee full performance, even without requiring an installation. Fourth, the Blu-ray disc storage was immense, which meant that we did not have to reduce the sampling rate of our streaming audio or overcompress it, and we never ran out of space even given the massive amount of dialog in Uncharted (in multiple languages, no less).
Fifth, the power of the Cell meant that we had a lot of power to do as much audio codec and DSP work as we needed to. Since all audio is synthesized in software on the PS3 with the Cell processor, there's really no limit to what can be done.
About how many simultaneous streams were used, and what techniques were used for transitions in the music?
JL: Up to 12 simultaneous streams were supported, of which 6 could be multichannel (stereo or 6-channel). Two multichannel streams were used for interactive music, each of which was 3-track stereo (i.e. 6-channel). A few additional multichannel streams were used for streaming 4-channel background sound effects. The remainder were used for streaming mono dialog and sound effects. All the streams were dynamically cached to the PS3's internal hard drive, which guaranteed smooth playback.
The music transitions were based on game events, such as changing tasks and/or completing tasks, as well as entering or exiting combat. Also, within a piece of music, we could dynamically mix the three stereo tracks in an interactive stream to change the music intensity based on the excitement level of the gameplay.
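As an illustration of mixing three stereo tracks by intensity, here is a minimal Python sketch. The mapping from intensity to per-track gain is an assumption (the interview does not give Naughty Dog's actual curve), and `layer_gains` and `mix_frame` are hypothetical names: the base track always plays, and each further track fades in over its own share of the intensity range.

```python
def layer_gains(intensity, layers=3):
    """Per-layer gains for an intensity in [0, 1]; layer 0 is always at full level."""
    x = min(max(intensity, 0.0), 1.0)
    upper = layers - 1                     # layers that fade in on top of the base
    gains = [1.0]
    for i in range(upper):
        g = x * upper - i                  # 0..1 across this layer's slice of the range
        gains.append(min(max(g, 0.0), 1.0))
    return gains


def mix_frame(frames, gains):
    """Mix one (left, right) frame per layer into a single stereo frame."""
    left = sum(g * f[0] for f, g in zip(frames, gains))
    right = sum(g * f[1] for f, g in zip(frames, gains))
    return left, right


calm = layer_gains(0.0)      # base track only
combat = layer_gains(1.0)    # all three tracks at full level
```

In practice the intensity value would be smoothed over time so that gain changes are not audible as steps; that smoothing is left out here to keep the sketch short.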
Were realtime effects employed?
JL: Uncharted is a fairly realistic soundscape, as opposed to a sci-fi game, so there's not much call for realtime effects. We did use a few, though. There was realtime radio futzing in a few places, when characters were conversing over walkie-talkies. We also had a tinnitus "ear ring" effect that would obscure the sounds while playing the ring to give that "you've almost been killed by a grenade" feeling.
There was a fairly subtle ducking system we used to get voices to play well over effects in certain extremely loud situations (e.g. multiple massive explosions); this was dynamic based on the current RMS power level. There was also a fairly extensive set of unique reverbs for the different environments.
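A ducking system driven by RMS level can be sketched in Python as follows. This is an assumption-laden illustration, not Naughty Dog's implementation: the interview does not say which signal's RMS was measured or what thresholds were used, so the choice of the effects bus, the `threshold` and `max_duck_db` values, and both function names are hypothetical.

```python
import math


def rms(block):
    """Root-mean-square level of a block of samples in the range [-1, 1]."""
    if not block:
        return 0.0
    return math.sqrt(sum(s * s for s in block) / len(block))


def duck_gain(effects_rms, threshold=0.5, max_duck_db=-6.0):
    """Linear gain to apply to the effects bus while dialog is playing.

    Below the threshold nothing is ducked; above it, attenuation grows
    linearly in dB until it reaches max_duck_db at full scale.
    """
    if effects_rms <= threshold:
        return 1.0
    over = min((effects_rms - threshold) / (1.0 - threshold), 1.0)
    return 10 ** (max_duck_db * over / 20.0)


level = rms([0.5, -0.5, 0.5, -0.5])   # a block with an RMS level of 0.5
gain = duck_gain(level)               # at the threshold, so no ducking yet
```

A production system would also smooth the gain with attack and release times so the ducking stays subtle, as described above.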
What was Uncharted's audio memory limit?
JL: The base audio memory budget was about 24MB; this included sound effect data, reverb buffers, and audio metadata. A few megabytes of additional memory was also required for streaming.
Was there any noticeable hit for decompression on ATRAC files?
JL: We did not use ATRAC, so the answer would be "no". We used a slightly modified version of the PS3's VAG codec. This worked well for several reasons. Decompression of this codec is basically almost free using the Cell SPU; we could decompress hundreds of these with no impact on game frame rate, and we never came anywhere near that in practice.
Another reason is that because we are caching all the streams on the hard drive, and because the Blu-ray disc is so large, we didn't have to compromise space versus performance. This means that our streams were relatively uncompressed, with no psychoacoustic artifacts, using a high sampling rate of around 48kHz. As a result, the fidelity of the resulting streaming audio was exceptionally good.
What was the biggest timesaver for the audio team in terms of tools and / or process?
JL: Without a doubt, the biggest timesaver was our technology that allows us to edit and reload all audio metadata and sounds on-the-fly during development, while the game is running. Any audio tweaks could be made almost instantaneously, usually without restarting the game. The ability to iterate as quickly as possible is undoubtedly the most important feature of our process.