
VR Cinematic Storytelling, Part Two: Music & Sound Design

For our VR ghost story Séance: The Unquiet, our team set out to reinvent cinematic storytelling for virtual reality. In this second installment, we discuss dynamic, spatialized music and sound design in VR.

John Scott Tynes, Blogger

May 3, 2017


In creating our new VR project Séance: The Unquiet, we set out to evolve cinematic storytelling for virtual reality to deliver something as close to the experience of watching a movie as we could while using the medium of VR to its best advantage.

For a hundred years filmmakers have crafted every shot to tell audiences where to look: from framing and composition to lighting and depth of field, every technique of film has been used to better guide the eye to the most important elements on the screen.
 
What we’ve done with Séance is to completely reverse that: we have crafted each scene dynamically to be aware of where the audience chooses to look. It’s a fundamental evolution in cinematic storytelling that requires the expertise and technologies of videogame development to pull off. (In our case, we used the Unreal game engine.)  From a grandfather clock that sounds different when you look at it to an intricately designed crescendo of music and sound design that rises dynamically in pitch and volume with the angle of your head as you turn to discover a ghost behind you, Séance shapes its experience to the audience’s gaze. The result is a new kind of storytelling for a new kind of medium.

In this series of blog posts I'll summarize these and other techniques of cinematic storytelling we have evolved and utilized in our project. If you want to see some of them in action, we have released a free five-minute preview of Séance for the Oculus Rift and the HTC Vive.

Other Blogs in this Series

Part One: Composition

Part Three: Character Presence

Part Four: Anatomy of a Scene

Part Two: Music & Sound Design

The art of using music and sound design in cinematic storytelling is both well established and very advanced. The same is true in videogames, but in very different ways. Videogames have innovated dynamic and adaptive music and sound design, in which all audio responds to gameplay in realtime.

We wanted to bring these sophisticated videogame techniques to cinematic storytelling in virtual reality. In addition, we wanted to use the capabilities of realtime 3D game engines such as Unreal and audio tools such as FMOD to integrate audio and visuals in deeper ways. And finally, we wanted to tie this all together with the most important audience input we have in virtual reality: gaze.

Running Time and Nonlinearity

The running time of a film is a fixed value. But the running time of Séance: The Unquiet is unpredictable -- scenes run longer or shorter depending on what the audience is looking at and when. This means we cannot take a static approach to music and sound design and must instead use the tools of videogames to deliver cinematic audio.

The linear vs. nonlinear nature of movies and videogames, respectively, has been the primary driving factor in how the techniques for each have diverged. In a linear format, any given sound is heard only once, with hard-locked visuals to give it context. As such, that sound can be crafted for maximum impact, perhaps by increasing the bass to give it heft or by adding an emotionally provocative element such as a blood-curdling scream or nails on a chalkboard. This works well in linear media because the visuals give it context and the moment passes quickly, leaving no time for the listener to dissect the sound or question its emotional impact.

In videogames, sounds often trigger repeatedly, either as the direct result of player actions or from emergent game circumstances. The sound designer cannot always rely on visual context or intentional brevity to reinforce the emotional impact of the sounds. A blood-curdling scream has emotional impact when used just once, but played repeatedly it quickly becomes tedious. Videogame audio tools such as FMOD overcome this challenge with highly advanced features for randomizing playback and introducing pitch and volume modulation, as well as automated effects tied closely to game conditions.

The result is that unlike sound design for linear film, videogame sound design consists of breaking sounds into modular pieces, along with instructions for how the audio engine should assemble them at runtime based on dynamic game conditions. This turns out to be incredibly useful in VR because it closely models how sounds function in real life, where we never hear the same sound twice: a slight turn of the head, a distant airplane, a swarm of small insects, or thousands of other disruptions of the air affect the volume and frequencies that reach the listener's ears from moment to moment. Leveraging these techniques for VR allows us to build engaging and believable environments within a three-dimensional space.
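To make the idea concrete, here is a minimal sketch of per-playback variation. It uses Unreal's built-in audio API rather than FMOD (which handles this kind of randomization natively in its event editor), and the function name and randomization ranges are purely illustrative:

```cpp
#include "Kismet/GameplayStatics.h"
#include "Sound/SoundBase.h"

// Plays a one-shot sound with slightly randomized volume and pitch so that
// repeated triggers never sound identical. The ranges here are arbitrary.
void PlayVariedOneShot(UObject* WorldContext, USoundBase* Sound, const FVector& Location)
{
    const float Volume = FMath::FRandRange(0.85f, 1.0f);
    const float Pitch  = FMath::FRandRange(0.95f, 1.05f);

    UGameplayStatics::SpawnSoundAtLocation(
        WorldContext, Sound, Location, FRotator::ZeroRotator, Volume, Pitch);
}
```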

Music & the Metronome

In film editing, there is a concept of "cutting on the beat," meaning you make edits aligned with the musical soundtrack. In VR there may not be any edits, at least not at the shot-to-shot level familiar from film. But using a game engine, we could achieve the same goal by aligning our action to the music dynamically.

Early in the project, our sound designer and composer Keith Sjoquist presented us with the idea of structuring our whole experience not on minutes and seconds but on musical timekeeping. We selected an existing piece of music, Beethoven's Symphony No. 7 in A Major, 2nd movement (Allegretto), and licensed an orchestral recording from Current Music, who were both easy to work with and affordable. Keith used FMOD, Pro Tools, and Adobe Audition to modify the recording so it would loop seamlessly within a specific sequence, and then he noted the time signature changes within the music. Stuart Cunningham, our technical and environment artist as well as one of our programmers, then coded a metronome in Unreal that started the music playing and kept track of those time signature changes. This metronome function then broadcast an event on every beat, which game logic could refer to and synchronize with.

We could tie any game event to the next tick or tock, meaning the odd or even beat in a measure. We could also insert pauses into game logic using beats instead of seconds. And as the time signature of the music changed, the pace of the metronome changed with it.

All of this meant that our game logic triggered consistently on the beat. If multiple events are queued up on the metronome, they all fire simultaneously on the next beat, even if they were triggered slightly apart. Since our music loops and our running time is variable, any given scene could be presented in different time signatures from one run-through to the next, but everything still remains unified.
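As a rough sketch of the idea (not our actual implementation), a beat broadcaster in Unreal C++ might look like the following. The class and delegate names are hypothetical, and a full version would also update the tempo and beats-per-measure from the table of time signature changes noted from the recording:

```cpp
#include "CoreMinimal.h"
#include "GameFramework/Actor.h"
#include "BeatMetronome.generated.h"

// Fired once per beat; game logic binds to this to stay in sync with the music.
DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnBeat, int32, BeatIndex);

UCLASS()
class ABeatMetronome : public AActor
{
    GENERATED_BODY()

public:
    ABeatMetronome() { PrimaryActorTick.bCanEverTick = true; }

    UPROPERTY(BlueprintAssignable)
    FOnBeat OnBeat;

    // Current tempo; a real version would change this (and the time signature)
    // as the music moves through its sections.
    UPROPERTY(EditAnywhere)
    float BeatsPerMinute = 76.f;

    virtual void Tick(float DeltaSeconds) override
    {
        Super::Tick(DeltaSeconds);

        Accumulator += DeltaSeconds;
        const float SecondsPerBeat = 60.f / BeatsPerMinute;
        while (Accumulator >= SecondsPerBeat)
        {
            Accumulator -= SecondsPerBeat;
            ++BeatIndex;
            // Anything queued since the last beat fires together, right here.
            OnBeat.Broadcast(BeatIndex);
        }
    }

private:
    float Accumulator = 0.f;
    int32 BeatIndex = 0;
};
```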

This approach established a fundamental pacing to the experience that was always tied to the music. Even when the music diminishes or is silent, the metronome maintains the beat and adjusts the time signature as needed. When the music comes back in, it is still in sync with the metronome and everything continues working together.

Using Audio to Get Attention

There are times when we aren't just reacting to the user's attention but actively encouraging them to look somewhere else. As discussed in part one, we have a scene where a black wolf walks across the patio and then sits to wait. The wolf will continue sitting there until the audience looks away, at which point we replace the wolf with a demon. To encourage the audience to look elsewhere, we play a supernatural knocking sound from another area of the scene. If the audience still doesn't look, we repeat the sound every now and then until they do. And if ultimately they just won't look, we eventually proceed with the story regardless and cover up the character transition with a visual effect. But in our experience testing Séance with hundreds of attendees at SXSW, they almost always looked toward the sound.

At one point during the project we sat down and made a list of "sounds that make you look." The human voice tops the list but it also includes: sharp/loud sounds (like knocking), nonlinear sounds (like screaming), and sounds that are out of place (like a motorcycle in a garden).

Using Attention to Drive Audio

In the preview of Séance, we have a grandfather clock in the scene whose ticking is always audible. The ticking is aligned to the global metronome so it actually ticks faster or slower with the time signature of the music.

In addition, however, Keith placed a special effect on the audio. This required only FMOD to implement, as FMOD's Unreal integration gives it constant awareness of the rotational angle of the audience's headset. This means Keith always knows at what left-right angle the audience is looking and can use that value dynamically in our music and audio.

Keith designed the tick-tock sounds to change when you look at the clock. They gain both volume and reverb. This sound design technique also ramps in and out with the angle of the audience's head, so it begins to mix into the normal tick-tock audio when the clock enters the frame and then gets more intense as you turn your head to look directly at the clock. As you look away, the mix ramps down again to the normal sound effects.
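A minimal sketch of this gaze-driven layer is below. It assumes the FMOD Unreal integration's UFMODAudioComponent::SetParameter call, a hypothetical "GazeFocus" event parameter, and an arbitrary 90-degree falloff:

```cpp
#include "Kismet/GameplayStatics.h"
#include "FMODAudioComponent.h"   // from the FMOD Studio Unreal integration

// Called every frame on the clock actor. ClockAudio is assumed to be a
// UFMODAudioComponent* member playing the tick-tock event.
void AGrandfatherClock::UpdateGazeFocus()
{
    const APlayerCameraManager* Cam = UGameplayStatics::GetPlayerCameraManager(this, 0);
    if (!Cam || !ClockAudio)
    {
        return;
    }

    const FVector ToClock = (GetActorLocation() - Cam->GetCameraLocation()).GetSafeNormal();
    const FVector Forward = Cam->GetCameraRotation().Vector();

    // 1.0 when the viewer looks straight at the clock, ramping to 0.0 by ~90 degrees.
    const float Dot      = FMath::Clamp(FVector::DotProduct(Forward, ToClock), -1.f, 1.f);
    const float AngleDeg = FMath::RadiansToDegrees(FMath::Acos(Dot));
    const float Focus    = 1.f - FMath::Clamp(AngleDeg / 90.f, 0.f, 1.f);

    // In FMOD Studio, this parameter can crossfade in the louder, more
    // reverberant tick-tock layer as the value rises.
    ClockAudio->SetParameter(TEXT("GazeFocus"), Focus);
}
```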

Spatialized Audio & Moving Emitters

We used the Oculus Spatializer plug-in for Unreal to help make our audio sound more positional relative to the headset, but Keith also used the movement of sound emitters in the scene to accentuate the experience.

Spatialization and HRTF

The Oculus Spatializer is one of many newly available software tools that attempt to replicate the way the human brain and ears work together in real life to identify the direction and distance of any given sound. In general terms, the shape of a person's head, the way different frequencies are reflected or absorbed by the skull and reverberate within it, and the time offset between a sound reaching one ear before the other all form a complex acoustic data set that the brain decodes to give the listener an instinctive understanding of a sound's location and direction and of the environment it is in. This acoustic phenomenon is known as HRTF -- Head Related Transfer Function -- and while it has been studied for quite some time, VR is giving rise to a surge of new applications utilizing it. There is much more to the acoustical science of how we hear and the psychoacoustic science of how we interpret what we hear. Keith highly recommends Oculus' white papers and Seth Horowitz's excellent book, The Universal Sense: How Hearing Shapes The Mind, for those who would like to explore the topic in greater detail.


At various times, we trigger a bolt of lightning in the distance with accompanying audio of thunder. Both the lightning and the thunder are separately synchronized with our metronome, so the lightning flashes on the next beat and we can then delay the thunder for X beats before it plays, giving the impression of lightning striking closer or farther away.
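Sketched against the hypothetical ABeatMetronome above, the beat-delayed thunder might look something like this; the member and component names are illustrative, and HandleBeat would need to be a UFUNCTION() to bind to the dynamic delegate:

```cpp
void AStormController::TriggerLightning(int32 ThunderDelayInBeats)
{
    PendingDelay   = ThunderDelayInBeats;
    bAwaitingFlash = true;
    Metronome->OnBeat.AddUniqueDynamic(this, &AStormController::HandleBeat);
}

void AStormController::HandleBeat(int32 BeatIndex)
{
    if (bAwaitingFlash)
    {
        LightningFlash->Activate();                 // flash lands exactly on this beat
        ThunderBeat    = BeatIndex + PendingDelay;  // more beats = storm feels farther away
        bAwaitingFlash = false;
    }
    else if (BeatIndex == ThunderBeat)
    {
        ThunderAudio->Play();                       // the delayed thunder
        Metronome->OnBeat.RemoveDynamic(this, &AStormController::HandleBeat);
    }
}
```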

But when the thunder does play, it plays with an emitter that travels across the scene. The initial crash of the thunder is outside in the distance, but the subsequent low rumble that follows it traverses the scene from left to right, mimicking the feel of sound traveling across a vast landscape.

Keith also used moving emitters for some of our supernatural ambient audio. He and Stuart set up a group of emitters that spin in a circle overhead. Their rotational speed can be driven by game logic. The result is a swirling vortex of creepy sounds happening overhead during some of our most intense moments. This is only really possible through the intensely spatialized audio capabilities of VR.
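A sketch of that orbiting-emitter setup follows; Emitters, Radius, HeightAboveHead, and DegreesPerSecond are illustrative members, with DegreesPerSecond being the value game logic ramps up during the intense moments:

```cpp
// Positioned at the viewer; spins a ring of emitters overhead each frame.
void AVortexEmitters::Tick(float DeltaSeconds)
{
    Super::Tick(DeltaSeconds);

    const int32 Count = Emitters.Num();
    if (Count == 0)
    {
        return;
    }

    // Game logic drives DegreesPerSecond to speed the vortex up or slow it down.
    OrbitAngleDeg = FMath::Fmod(OrbitAngleDeg + DegreesPerSecond * DeltaSeconds, 360.f);

    for (int32 i = 0; i < Count; ++i)
    {
        // Spread the emitters evenly around the circle, offset by the shared angle.
        const float AngleRad = FMath::DegreesToRadians(OrbitAngleDeg + (360.f / Count) * i);
        const FVector Offset(Radius * FMath::Cos(AngleRad),
                             Radius * FMath::Sin(AngleRad),
                             HeightAboveHead);
        Emitters[i]->SetRelativeLocation(Offset);
    }
}
```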

The sounds themselves were created to incite a feeling of unease or disquiet in the listener through pure sound design. We found through experimenting with the technique that we could create an unsettling feeling regardless of the audio content by simply speeding the rotation up to an extreme degree. At unreasonably high speeds, we could even make the listener nauseous purely through the use of moving audio -- no visual or camera movements needed! Our challenge then became to find the perfect combination of rotation speed and audio content to deliver the emotional impact we were targeting.

Spatialized Music

On our first VR project, The Impossible Travel Agency, Keith created a remarkable musical experience for our special Halloween version. We found the original orchestral sheet music to Rachmaninoff's symphonic poem Isle of the Dead and Keith transcribed portions of it into a MIDI editor. He then created his own arrangement using a variety of digital instruments, some of which were musical and some of which were sound design elements from the Geosonics sound library, among others. Once he'd created his own arrangement, he exported each instrument separately so we could give each one its own emitter in the scene. This allowed the harp to play from the sky above you, for example, while the bass drum was down in the earth below and other instruments were scattered across the landscape in an approximation of how the players in an orchestra are positioned. The result was breathtaking: a truly spatialized orchestral performance in virtual reality.

Room Reflections & Quadraphonic Sound Effects

To take further advantage of this spatialization, Keith utilized a set of acoustic impulse responses developed by Damian Murphy and Simon Shelley at the University of York's Department of Electronics and released under a Creative Commons license. These are basically data sets to simulate the acoustic properties of different environments, providing finely tailored reverberation profiles that FMOD can use to modify the playback of audio in realtime. Keith describes these as audio snapshots or photographs of an environment.

These kinds of dynamic acoustic profiles have been used in linear media and videogames for years through what are known as convolution reverbs. AudioEase's Altiverb plugin is probably one of the most comprehensive examples of this tool. In VR, when these profiles are combined with the room reflections defined and modeled by the spatializer plugin, they are more powerful than ever. By adjusting them from scene to scene and moment to moment, Keith can shape the audio experience with a fine degree of control and emotional depth.

To further enhance the experience, Keith created preprocessed quadraphonic versions of key sound effects. He started with the basic stereo pair of the effect he'd created as the front left and right channels and then built reverberating versions of them to use as the rear channels. We then positioned these in Unreal as four discrete emitters equidistantly placed around the audience's headset so that those important moments have truly spatialized reverberations that are directly under Keith's control.
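One way to lay that out in Unreal C++, assuming the four stem emitters ride along with the camera, is sketched below; the distances and component names are illustrative:

```cpp
// Parents the four pre-rendered stems to the camera at equal distances:
// dry stereo pair in front, reverberant pair behind.
void AQuadEffect::AttachStems(USceneComponent* CameraRoot)
{
    const float Dist = 150.f;                        // centimeters from the head
    const FVector Positions[4] = {
        FVector( Dist, -Dist, 0.f),                  // front left  (dry)
        FVector( Dist,  Dist, 0.f),                  // front right (dry)
        FVector(-Dist, -Dist, 0.f),                  // rear left   (reverb)
        FVector(-Dist,  Dist, 0.f),                  // rear right  (reverb)
    };

    for (int32 i = 0; i < 4; ++i)
    {
        StemEmitters[i]->AttachToComponent(CameraRoot,
            FAttachmentTransformRules::KeepRelativeTransform);
        StemEmitters[i]->SetRelativeLocation(Positions[i]);
    }
}
```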

Conclusion

We have spent a substantial amount of time and attention on the audio and music design of Séance in order to deliver an experience that feels like a movie but uses the dynamic audio techniques of videogames. The result is a rich and rewarding audio experience that is responsive, fluid, and emotionally affecting.

In the next part of this series, we will discuss the use of characters in Séance including performance capture, character head tracking, and avoiding the uncanny valley.
