In this reprint from the final (June/July 2013) issue of Game Developer magazine, sound designer Damian Kastbauer shares this excerpt from the Oxford Handbook of Interactive Audio (due 2014), in which he blends narrative and design notes to craft an inspirational future vision for interactive audio.
Milestone 1: Power On
I'm standing on the bank of a river. It's late, and the heat of the afternoon sun is fading. The sounds of cicadas are all around me. Their placid chirruping, accompanied by the nearby burbling of a stream, sets a tone of peaceful tranquility. The simulation here is really good.
As I kneel next to the shore, splashing water on my face, the sound erupts in a perfectly modeled and synchronized symphony of harmonic fluids. Each drop of water from my fingertips that ripples on the surface is reflected in automatic sound synthesis and tied so tightly to the visuals that I'm struck by the fact that it was just a short time ago that we were locked into a sample-based methodology. It didn't happen overnight, but as I am standing on the shore of this digital river, it seems clear to me that audio technology has finally made good on the promises of transparent interactivity.
It should be said that I'm inputting this log in real time, while working inside the beta release of our current simulation based on a historical re-creation of Earth circa 2012. The simulation has been developed using an industry-standard authoring application by specialists spread out across the galaxy. I'm currently reviewing the work done by our technical sound synthesis specialists, who have recently updated some old sound models. It seems like ages ago that we began working with our library of reference materials to synthesize different aspects of Earth's soundscape into the living, breathing, and sounding world that is represented today.
Milestone 2: Soft Focus
I remain lost in thought as the sound of rushing water catches my memory. My mind is transported to a sunny day from my past. A reunion once brought family members together beside a stream in the countryside of Earth 2012, inside a branch of an older beta version of the simulation. At that time, our industry was just beginning to iron out the inconsistencies inherent in the burgeoning field of procedural audio, synthesis, and the advanced manipulation of dynamic sound: baby steps toward the expansive, fully realized simulation I'm testing today.
As we laugh and carry on inside the memory of my mind's eye, the children play and chase butterflies along the edge of the rushing water. From the corner of my eye I can see my young daughter rushing in and out of cattails twice her height with her favorite doll. Her look of freedom and wild abandon while chased by cousins as the fronds whoosh back and forth brings a smile to my face. I move to speak and urge her to stay clear of the undulating, black, and treacherous stream edge, but they are gone before the words leave my lips.
I become lost in debate with various relations over the use of sound as a storytelling device. It seems we have finally found a way to effectively use all five senses to convey an emotional arc as part of an interactive experience. Of course, there continue to be new ways to channel and leverage our history through simulations, new ways of combining what has already happened with new technologies as a way of making sense of the future. Through the lens of creativity, we may finally understand who we are and where our civilization is headed.
Back at the reunion, the sun is beginning to set in the distance, and it's time for us to take leave of this place. I head toward the pack of children collapsed by the riverside in search of my daughter. My steps echo coldly in the grass. Her smiling face goes unseen, so I ask the gathered assortment of nieces and nephews where she might be. No one has seen her for some time, they say.
Milestone 3: Debugging Memories
In a moment, I feel my every fiber stiffen toward a heightened awareness of my surroundings. Nothing has changed, and yet in that instant it's as if all sound has been removed but for the deep churning of water. I look around frantically, my eyes darting between reeds as I begin calling her name. People are gathered around me; I can see their mouths moving but can no longer hear their questions. My mind is sharply tuned to the sound of dramatically frothing whitecaps as I desperately attempt to keep my thoughts above water.
I wade into the sea of cattails, moving further and further upstream. Their fronds brush past me in a noiseless hush of extreme focus. I can hear others who have crossed a nearby bridge and begun combing the riverbank on the other side. My voice, becoming louder and more urgent, punctuating the ebb and flow of water to my left, sounds far away. A curious rocky outcropping comes into view, as the taste of saltwater reaches my lips. I see my daughter playing noiselessly with her doll in the shade of a dark black stone column. I call to her, but she doesn't look up. I quickly close the gap between us and suddenly recognize that the sound of the stream has disappeared.
My daughter is startled by the sound of my voice calling her name. She looks up as I rush to her and hold her in a deep embrace, my footsteps resonating with an alien quality. The sound of clothing brushing against her white dress explodes within the vacuum of soundlessness I find myself enveloped in. I look into my daughter's eyes as she holds out a ring of flower stems that she's woven together. She looks at me and smiles. When asked by friends later, she'll say, "I wasn't lost, I just found a quiet place to play." We emerge from the noiseless interstitial space, call off the search party, and return to the sound of the world around us.
These early simulations weren't perfect, but the feelings generated by them resonate just the same. The difficulty of authoring propagation for every explicit sound once meant that the process was subject to gaps in the handwoven audio fabric. Every inch was "painted" by hand with sounds. It's easy to see, looking back on the technology, how these "dead spots" could exist—these places where no extraneous sounds would reach. Thankfully, people continued to work toward accurately representing sound propagation within the simulation. These advances were, over time, able to leapfrog the manual approach that had been bogging down the process. I have to say that the radical sound mix change brought on by my alerted state operated perfectly, in retrospect. It's hard to believe the leaps and bounds that technology has achieved, even during my short life span—especially as I have found myself deeply embedded in driving the modern state of the art in our re-creation of Earth.
Milestone 4: Audio Beta
I return to testing inside the current-day simulation. I stand up from splashing water on my face and dry my hands on coarse cotton pants. This elicits a similar soft explosion of sound as on that day so long ago. I'm reminded of how long it took us to get the subtlety of movement and material interaction to accurately reflect the complex dance of physical sound. Tying into the cloth simulation was the easy part. Once we had all of that data in our hands, it was purely a process of sculpture (that is, mostly subtractive). Sound models became more complex still with the addition of liquids and temperature... but at the end of the day, our flexible synthesis models stood up to the barrage of parameterization from each of the systems.
I step back onto the path and continue on to a nearby watermill. I'm reminded by each footfall how long it took to get these footsteps right. Everything is now handled inherently by the procedural models in place, which take all sorts of factors into account: mass, velocity, materials, viscosity, and a variety of other things. What began as a purely academic pursuit swiftly blossomed into a creative endeavor thanks to some early front-runners in the field. Once the underlying mathematics were in place, the tools quickly followed. This development allowed early designers of sound to move beyond hard science running behind the scenes. Enabled by smart and creative authoring applications, the sound of footsteps and movement was able to transcend the synthetic quality of early experiments and emerge as a realistic representation.
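A parameterized footstep model of the kind described above can be sketched in miniature. This is purely illustrative: the `MATERIALS` presets, the energy scaling, and the filter constants below are invented for the example, not drawn from any real authoring tool.

```python
import math
import random

# Hypothetical material presets: (decay_time_s, lowpass_coeff).
# Values are illustrative, not measured.
MATERIALS = {
    "grass": (0.08, 0.15),   # soft and dull
    "wood":  (0.20, 0.45),   # resonant, brighter
    "stone": (0.12, 0.70),   # short and bright
}

def footstep(material, mass_kg, velocity_ms, sr=16000):
    """Render one footstep as an exponentially decaying,
    one-pole-filtered noise burst. Impact energy (~ 1/2 m v^2)
    drives amplitude; the material preset shapes decay and tone."""
    decay, lp = MATERIALS[material]
    amp = min(1.0, 0.5 * mass_kg * velocity_ms ** 2 / 2000.0)
    n = int(decay * 4 * sr)          # render ~4 decay constants
    out, y = [], 0.0
    rng = random.Random(42)          # deterministic for testing
    for i in range(n):
        env = amp * math.exp(-i / (decay * sr))
        x = env * (2.0 * rng.random() - 1.0)
        y = lp * x + (1.0 - lp) * y  # one-pole lowpass: higher lp = brighter
        out.append(y)
    return out

heavy_stone = footstep("stone", 90, 3.0)
light_grass = footstep("grass", 40, 1.5)
```

The point of the sketch is the shape of the interface: a single model consuming physical parameters per step, rather than a bank of hand-placed sample files.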
As I approach, I note that the churning sound of water on the undershot-wheel paddles grows louder with each step. I'm struck by the quality of dynamic frequency filtering over distance. The simple 3D point-sourced, volume-only attenuation curves of yesteryear have long ago been replaced by a complex matrix of real-time distance modeling, taking into account everything from air density to temperature and humidity. The spinning wheel splashes on effortlessly while some unseen turbine labors to convert the power of nature into energy. Of course, this one is just for show. However, it wouldn't surprise me to find that it powers all of the lights inside. In such a complex woven fabric of interaction, you can usually count on everything being designed with a purpose.
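Distance-dependent voicing of this sort can be approximated as a 1/r gain plus an air-absorption lowpass whose cutoff shrinks with distance. The humidity and temperature scaling below is a made-up placeholder for illustration; real systems derive absorption coefficients from standards such as ISO 9613.

```python
import math

def distance_voicing(distance_m, humidity=0.5, temp_c=20.0):
    """Toy distance model: gain falls off ~1/r, and an
    air-absorption lowpass cutoff decays exponentially with
    distance. The humidity/temperature factors are illustrative."""
    gain = 1.0 / max(1.0, distance_m)
    # Hypothetical: drier, colder air absorbs highs a bit faster here.
    absorb = 0.02 * (1.2 - humidity) * (1.0 - 0.005 * (temp_c - 20.0))
    cutoff_hz = 20000.0 * math.exp(-absorb * distance_m)
    return gain, cutoff_hz

near = distance_voicing(2.0)     # (gain, cutoff) close to the wheel
far = distance_voicing(200.0)    # quieter and duller far away
```

The two returned values would drive a gain stage and a lowpass filter per sound source, replacing a single hand-authored volume curve.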
I walk to the door of the watermill and reach for the handle with trepidation; the last time I tested it, it didn't accurately convey the weight and material type, so I sent the sound designer back to the drawing board. Turning the knob, I hear the rusty mechanism come to life with a sharp snap of metal, hollow wooden resonance, and a deep, unsettling creak. Each individual aspect of the sound takes into account the compositional breakdown occurring: the damp location (as evidenced by the deep rust flaking off each hinge), the wood's density and thickness, and the age and tenacity of the steel. The door closes with a deep thud that reverberates as it swings shut behind me. That was much better, I think, as I send off a friendly note to the designer, who is a million miles away. Within moments I've received a response that reads like a sigh of relief.
With so much to do, it's hard to get caught up on the little things. Thankfully, the work we've been doing on the underlying physical modeling propagates throughout the entire world. Imagine if we had to place sounds by hand for every river or door; the worlds we're creating are just too large for such a labor-intensive process. While we do try to take a handcrafted approach to sounds that require something unique, meeting the expectation that every nook and cranny sound convincingly Earth-like in this simulation takes more than just a few sound files spread across the face of the planet.
Milestone 5: Stress Test
Back inside, there is the pervasive, low-end rumbling sound of rushing water coupled with an oppressive clattering. I'm standing in front of a succession of wooden gears locked together in a spiraling groove of perpetual motion. The hollow "thonking" sound they make as each tooth locks into place is a direct result of the latest wood models developed by the foremost modal synthesis historians. Piggybacking on their research into old technologies and material types has given us a satisfying level of detail to the rich characteristic of wood. With their research in place, we were able to further embellish the machinery using several creative tools running within the simulation.
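Modal synthesis, as invoked here, renders a struck object as a sum of exponentially damped sinusoids, one per resonant mode of the material. The mode table below is hypothetical; real mode frequencies, dampings, and gains would come from measurement or finite-element analysis of the object.

```python
import math

# Hypothetical modes for a small wooden gear tooth:
# (frequency_hz, damping_per_s, amplitude) -- illustrative values only.
WOOD_MODES = [(220.0, 30.0, 1.0), (560.0, 45.0, 0.5), (1180.0, 80.0, 0.25)]

def strike(modes, impact, dur_s=0.25, sr=16000):
    """Modal synthesis: sum exponentially damped sinusoids,
    one per resonant mode, scaled by the impact strength."""
    n = int(dur_s * sr)
    out = [0.0] * n
    for f, d, a in modes:
        for i in range(n):
            t = i / sr
            out[i] += impact * a * math.exp(-d * t) * math.sin(2 * math.pi * f * t)
    return out

tone = strike(WOOD_MODES, impact=0.8)
```

Because the mode list fully describes the material's ring, the same model answers every strike at every impact strength, which is what makes the "thonking" gears possible without per-collision samples.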
The whole contraption is shuddering with an unbelievable rumble that I can feel in my gut. With a gesture, I engage the authoring tools, and an interface for interacting with the world appears in front of me. As it springs to life, ready for action, I quickly navigate the controls and begin slowing down the water flow outside in order to hear the change in sound.
As the internal mechanism begins to wind down, there are no artifacts in either pitch or timbre as the gears slow to a halt—this isn't a simple sound file representation. You see, the only remaining samples of an actual watermill were recorded at the low sample rate of 192 kHz/24 bit, but we were able to use feature extraction across a diverse sample set, mine relevant data from these recordings, and use it to inform the creative application of various processes and models. These samples proved critical, since none of us had ever seen a working watermill in person, and they ended up shaping the overall sound presentation.
As things grind to a halt, I notice a gentle whistling sound finding its way through the cracks in the thatched roof overhead. Wind was the first and easiest dynamic synthesis we could apply to these simulations—we could apply pitched and filtered noise of different colors in combination with reflectors and deflectors, both abstracted within authoring toolsets and programmatically based on geometric representations within the different environments. This technology was very futuristic at the time, and what it lacked in "natural" sound factor, it made up for in its ability to be modified in real time, using parameters from the simulations. As the technology progressed, the randomness of nature swiftly took the place of consistently sloping pitch curves and unreal-sounding representations.
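The "pitched and filtered noise" wind technique described above can be sketched as white noise fed through a one-pole lowpass whose coefficient is slowly modulated by a gust oscillator, with wind speed as the real-time simulation parameter. Every constant below is illustrative.

```python
import math
import random

def wind(dur_s=1.0, sr=8000, speed=0.5, gust_hz=0.3, seed=7):
    """Procedural wind sketch: white noise through a one-pole
    lowpass whose coefficient is modulated by a slow 'gust' LFO.
    Higher wind speed opens the filter and raises the level."""
    rng = random.Random(seed)
    n = int(dur_s * sr)
    out, y = [], 0.0
    for i in range(n):
        t = i / sr
        gust = 0.5 + 0.5 * math.sin(2 * math.pi * gust_hz * t)
        coeff = 0.02 + 0.2 * speed * gust   # filter opens with each gust
        x = 2.0 * rng.random() - 1.0        # white noise source
        y += coeff * (x - y)                # one-pole lowpass
        out.append(speed * y)
    return out

breeze = wind(speed=0.2)
gale = wind(speed=1.0)
```

The appeal the text notes is visible here: `speed` and `gust_hz` can be driven per-frame by the simulation, something a prerendered wind loop cannot do.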
My footsteps echo convincingly on the hollow wooden floor as I resume my test and calibration procedures. I find the watermill synthesis and modeling holding up well under these extreme circumstances. I stop my testing just short of flooding the entire mill in order to listen to the resulting forces of nature on the tiny primitive structure. Safe within the confines of the simulation, the debug shard that instantiated when I enabled the authoring tools gives me a unique perspective on the resulting mayhem. Distanced from the confines of the simulated physical world within this unique instance, I'm free to run amok. I'll save the creative joy of destruction for another day. I start the river gently flowing again, exit the tiny building, and continue my testing.
Milestone 6: Content Complete
Outside, the sky has blossomed into a majestic eruption of purple and pink at dusk. The late afternoon cicadas have all been replaced with a chorus of crickets and occasional bird chatter. It's been so long since we were restricted to a single looping audio file that I hardly notice how diverse the soundscape is. While there continues to be a place for field recording as the basis for building a library of granular models and anomalies, it's no longer possible to capture these sounds in their natural habitat. These sounds may have existed in nature a long time ago, but that time has since passed. Luckily we have access to a wide variety of artifacts from the 21st century, including field notes, recordings, and skeletons.
Some of the environments initially proved difficult to re-create. Isolating elements amidst the noise of then-modern culture proved easy enough at first. It became more difficult when we began to note peculiar behavioral changes between recordings taken from different decades in many of the vocalizing birds and animals. As industrialized society began to take hold, so too did the sound of a new era of machinery and technology. These sounds became an inextricable part of the complex auditory fabric of the Earth and, over time, completely modified the speech patterns and frequency range of vocalizations wherever nature overlapped with industrialized society. The difficult part then became understanding the complex interaction that developed over time, and finding ways to realistically represent the ensuing soundscape. The result is a blending of both natural and manufactured sounds in a complex cacophony quite unlike anything heard since.
I watch the sun slowly slipping behind the rolling hills off in the distance and I'm struck by the true beauty that the 2012 Earth embodies in our simulation. Everything is represented with a rhythm that resonates throughout, from the tall grass gently swaying in the breeze, to the water wheel working in ceaseless syncopation. It seems that in this moment, there was a balance between the elemental forces at work and the swiftly encroaching hands of progress. It's impossible not to judge the years that have transpired in the interim as tragic when faced with such beauty. We all hope that this experience can serve as a future roadmap for how to proceed as a society, now that the damage has been done.
Milestone 7: Bug Crunching
Back to work, I quickly navigate a user-interface terminal and instantiate a new area of the world to test in. A transition opens soundlessly in front of me and floods my senses with the sound of the city. I'm overwhelmed by the density of the experience: all oppression and intensity as drones weave their way in and out of a sympathetic embrace. This humming metropolis is a hive, a thrumming and humming frenetic audio vibration that typified 2012 Earth in all its glory. In a moment: speeding cars, trickling fountains, and skyscrapers resonating in an orchestration of the then-modern age.
Between the insistent air-conditioner rattle and intermittent elevated-train clatter lies the golden age of information technology, an endless stream of activity amidst the rapid acceleration of interconnectivity. It's not long, as I cross the threshold of modernity, before I'm confronted with the endless saturation of sensory information. These simulated molecules caress my battered brain stem in a dance of orchestrated input toward an inevitable overflow of stimulation. Whereas moments ago I was adrift in the tranquility of the rural countryside, I'm now firmly frenetic in the no-holds-barred, all-attack modern age.
I engage a frequency analyzer that immediately projects a rainbow of visual analysis across every corner of the world. I can use this debug model to visualize the density of the frequency spectrum based on color as it emanates from every object, radiating a kind of ethereal cloud of color based on the sounds of the city. As cars speed by I see their unique voicing characteristics reflected in a wash of frequency analysis, the hue and saturation accentuating a buildup of tonality across the spectrum. I'm looking for an anomaly reported by early adopters of this simulation, some sort of unnatural resonance.
Loading a restore point, the world around me snaps into its prerecorded routine. It must have been a familiar scene: buildings reaching to the sky, transportation of all kinds treading familiar paths, a swarm of people vibrating with energy. Vehicles zip around like oversized insects buzzing with momentum, intent on reaching their destinations through a complex mechanical ballet of belching exhaust and the endless drone of so-and-so revolutions per minute. Each building resonates with a hum. From the depths of their mechanical rooms to the vibrations of people cascading throughout the interior, each building is like an oscillator in this polyphonic synthesizer of a city.
Milestone 8: Anomaly
It's the same on the sidewalks that flank every thoroughfare. Everybody on the street moving, seemingly in sync, toward an unseen destination. Each person moves lockstep in time with the heartbeat of the city; like a million tracks of kick drum mixed below the threshold of hearing, but undeniably propelling each step. The individuality of their concrete footfalls is lost in a sea of clear density.
Aside from the occasional amplitude increase, the playback so far seems normal. I scrub through the capture to the moment specified in the report, and all at once the world is saturated in a haze of pink, representing a frequency pileup. I switch to manual control and reverse to the origin of the anomaly. I peruse the data streaming in one of the debug windows in an attempt to isolate the source and immediately see the problem. I navigate to the source; it seems at first glance to be a discrepancy in the modeling. As I trace along a sinuous, pulsating line that connects the extreme data-point to a mass of overlapping frequencies, I find myself standing low to the ground somewhere in the middle of the sidewalk.
I switch off the frequency analyzer and find myself face to face with a little girl sobbing uncontrollably. This explains it—the quality of our voice modeling always seems to break down during moments of extreme emotional response. Our models take into account the simulated size, shape, and composition of the vocal cords, and we've captured and re-created the entire range of motion, but there are extremes in every case that still require fine-tuning.
Milestone 9: Synthesized Soul
Of course we began by nailing down celebrity vocalizations first: these were the most profitable. As soon as an actress or actor achieved a certain status, we rushed in to take a vocal snapshot that could be perpetuated beyond the peak of their profession. Coupled with full-body scans, procedural animations, and phoneme-based lip-syncing, we've had digital doubles as part of our experiences for a long time. There still exist moments when even our best technology cannot achieve the artistry present within a live body, though. At the end of the day it is still the performance, whether acted or authored, that has the potential to connect with the audience. It remains a testament to the human body-instrument that there are still secrets held within.
The girl seems to have quieted down, and her face begins to scan the crowd frantically. My heart goes out to her as I remember that day so long ago when, as a parent, I lost someone dear to me, if only for a moment. Of course this is still only a simulation; none of this is happening, but I'm still here having a real reaction that I can feel in my chest as my breathing accelerates. It's more a result of my experience than any belief in this as "real." Regardless, I kneel down on the sidewalk in front of her and ask if she is lost. She looks into my face, not knowing exactly what the answer is, but eventually nods in acknowledgment. I begin to ask who she was with when, midsentence, I'm cut off by a mother quickly lofting the little girl into her arms. Amidst a volley of hugs and exasperated consolations, she's quickly whisked away through the bustling parade of lunchtime foot traffic. I can see the little girl's smile through the sea of people as I return to the task at hand.
Scrubbing back in time to the original anomaly, now identified as the little girl's cries, I utilize the controls attached to the data-point for the girl's vocalizations. Soloing the sound of her voice, I begin to apply various smoothing algorithms. Without losing any of the ferocity or emotion, I methodically constrain the parameters that have caused things to become unnatural. The process takes just seconds, but in a simulation of this scope it could take hours to address every similar case. Instead, I apply the adjustments to the vocalization models to be used in the event of other such anomalies. While the changes don't go as deep as the simulation itself, they can be applied in real-time if the same behavior is found. If it happens often enough, the solution will be used to inform the next update to the vocalization model.
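The constrain-and-smooth pass described above might look like a clamp on each control value followed by a one-pole smoother, so that abrupt jumps are eased without discarding the overall contour. The `alpha` value and the parameter range here are illustrative, not taken from any real vocalization model.

```python
def constrain_and_smooth(values, lo, hi, alpha=0.3):
    """Clamp each control value into [lo, hi], then run a
    one-pole smoother over the clamped stream so abrupt jumps
    (the 'unnatural' extremes) are eased rather than cut out.
    Lower alpha means heavier smoothing."""
    out, y = [], None
    for v in values:
        v = max(lo, min(hi, v))                # constrain the extreme
        y = v if y is None else y + alpha * (v - y)
        out.append(y)
    return out

raw = [0.2, 0.25, 3.0, 0.3, 0.28]              # one wild outlier
smooth = constrain_and_smooth(raw, 0.0, 1.0)   # stays within range
```

Applying the fix at the model level, as the text describes, amounts to running a pass like this over the offending parameter streams whenever the anomaly is detected, rather than editing each captured instance by hand.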
This period of Earth has become known as a "great turning point" in the evolution of life on the planet. People began to notice the changing environment. Even amidst the abstracted nature of the city, people were taking note of the fact that everyone had a role to play in preserving the planet. By the time this recognition spread, the focus had already shifted to worlds beyond the confines of a single planet. From there, it was a combination of exodus and slow decay.
I exhale and return to the task at hand, flipping to the next restore point that needs investigating. I'm faced with a wall of water, an undulating sea, and wind set to tear the roofs off the nearby houses. The howl and moan as the waves increase their amplitude approaching shore is unhinged in a moment of sheer sonic terror. I bring up my display, and prepare to orchestrate the power of nature.
Milestone 10: Ship It
Through a combination of fictional story and factual reference, this story is meant to inspire the work being done today and to help envision the way forward for interactive audio. While some of these workflows and methodologies live strictly in the realm of science fiction, there are aspects that can be found running in simulations at universities today. This radical change in approach—from the standard sample-playback methodology to a composite toolbox that incorporates extensible procedural, synthesis, and physical modeling techniques—is rapidly evolving within today's game industry toward a future hybrid model of dynamic sound. Given the increasing size, diversity, and complexity inherent in most games, it seems inevitable that sound needs cannot be met with sample-based content alone. I continue to look forward to the creative ways that sound can reinforce the perceived reality of an interactive experience by leveraging the inherent dynamism of simulations.
"Life's like a movie, make your own ending, keep believing, keep pretending." - The Muppets
- Noriko Kurachi, Now Hear This
- Sounding Liquids: Sound Synthesis from Fluid Simulation
- Efficient Numerical Simulation of Sound Propagation
- Precomputed Wave Simulation for Real-Time Sound Propagation of Dynamic Sources in Complex Scenes
- Motion-Driven Concatenative Synthesis of Cloth Sounds
- Precomputed Acoustic Transfer: Output-Sensitive, Accurate Sound Generation for Geometrically Complex Vibration Sources
- RESound: Interactive Sound Rendering for Dynamic Virtual Environments
- Fast Modal Sounds with Scalable Frequency-Domain Synthesis
- Walter Murch, Dense Clarity – Clear Density