This is a love song. A love song to video game music. A love song to video game music that spends a lot of time pointing out that video game music would do well to iron its shirt, shower every day, and would it kill it to maybe shave every once in a while?
This piece is directed toward those who make, compose for, and/or enjoy a cinematic game experience common to most triple-A and an increasing number of indie titles. It touches on elements common to all video games in many places, but the purpose is not to play the nagging Jewish mother to two-man developers about how they should be more like their big brother who graduated summa cum laude and landed a big contract with Activision and will probably cure cancer someday.
The purpose is to help producers communicate with their composers, help composers hone their craft, and help the end consumer become more educated about the potential value of game music.
Why Take a Cinematic Approach to Game Music?
Too long has video game music been relegated to a dusty corner of gamers' minds. Sure, we all have fond memories of chip-tunes and our favorite melodies, but video game music has typically been viewed as a background soundtrack, not something that plays directly into the visual elements. Just look at all the games that allow you to import or stream your own music while you play.
This is a shame. Music can have a tremendous impact on the mood, feel, and emotion of any visual elements a game can try to convey. A shift in the music can take the exact same visual scene in two completely different directions. (I've always liked this example to show how a different score can change things up:)
Video games come in many forms and serve many kinds of entertainment -- Ninja Gaiden in hard mode clearly scratches a different itch than FarmVille -- but I think it is safe to say that the majority of triple-A and otherwise popular games are trying to take a more cinematic, story-focused approach. What was the last FPS you played that didn't have a story component, however preposterous the premise? The visual techniques reflect this: effects that emulate real camera artifacts, like light bloom, lens flare, focal shift, and even film grain, are all very common in the modern game.
Video games are unique within this A/V field in a number of ways, one of the most obvious being that the player can dictate the pacing and even the order of events. Writing for this sort of uncertainty definitely presents problems that any video game developer needs to consider. However, as games become more scripted, planned, and emotionally impactful, game composers would do well to study the centuries of experience other mediums can provide them. Re-inventing the wheel is not something we want to do here.
The focus on cinematic visuals and storytelling becomes increasingly obvious as we look into just how much straight-up non-interactive cinematic storytelling can be found in games. Oh sure, there might be a "press X to not die" moment sprinkled here or there, but when you strip out the real gameplay you are often left with a long sequence of cut-scenes that rivals the length of major movies.
For instance, the Batman: Arkham City cutscene playlist on YouTube is just north of 2 hours and 30 minutes long, longer than the majority of motion pictures. Gears of War 3's runs 1 hour and 43 minutes. Xenoblade Chronicles? North of five hours, beating even the extended edition of Return of the King. Even completely disregarding player-driven gameplay, there are entire movies contained inside today's games.
Unfortunately, video game developers and the players themselves don't often see this connection. Corners are cut, sacrifices made, flat-out wrong practices are repeated time and again, and the gaming media looks upon it and proclaims it good. Games have made great strides lately with a more cinematic approach to storytelling, but it's sad to see a crucial piece of that puzzle so often neglected. The Final Fantasy series, Dragon Age, and Mass Effect have begun to understand lighting, blocking, cinematography, and the like, and have used them to great effect -- but what about the music?
All visual media are more closely related than some would think. Film, TV, advertising, and games all share many traits, and music publishers often treat them in very similar ways. Though each presents unique advantages and challenges, all can be summed up in two simple, tiny words:
To picture.
This is the essence of all visually oriented music. Video games have long been a valid medium for telling an intriguing story, and the "to picture" approach has proven itself over the centuries to be the best companion for that kind of storytelling. Our reaction to the music is often deeper and more subconscious than our visual analysis. At best it enhances and deepens our understanding of what our eyes tell us, sometimes adding to it directly, sometimes showing another facet or wrinkle that we didn't see.
With all the cinematic focus on visual elements, why wouldn't we take a cinematic approach to game music?
Before we discuss using nerdy, cerebral musical philosophies to guide game scoring, perhaps a quick overview of some basic techniques is in order. Frankly, many games fail to get even these right. The essential problem is that you can't just write music and expect it to work.
Understanding Your Place
Our ears are specifically tuned to speech frequencies, and working around that can be difficult. Guess where melodies (and music in general) sound best to our ears? That's right -- the exact same range as speech. Think about the last time you were trying to hold a conversation with the radio tuned to a pop station. Did you notice how much you had to turn it down to hear the other person? Now try talking with the first 90 seconds or so of this playing in the background:
This piece was written specifically to accommodate the human voice. Can you keep the volume much louder than in the pop music example? You should be able to, quite easily. It's about space.
For the purposes of visual media discussion, diegesis is anything that is represented as "of the world" of the story. So if there's a scene in a smoky cabaret and the music is being played by a jazz band within that scene, that's diegetic. "Bohemian Rhapsody" is diegetic within Wayne's World, as the characters are obviously aware that the music is coming out of their radio and are reacting to it. (This article talks a bit more about that and other important concepts.) Almost all film and video game music is non-diegetic, meaning that it's not music that exists in the world with the characters but has instead been added for the sake of the audience. It's important to understand why that matters: because diegesis is king.
If you ever study opera, you will quickly see that the entire orchestra basically exists to support the singer(s). The vocalist is diegetic, or in the world, and must be skirted around carefully by the music, which is non-diegetic, or outside of the world. Therefore, any composer worth their salt must write around the diegetic part of the story, because that's the part that's actually telling the story. In more traditional musical settings like opera this is quite easy to accomplish, as the melody lines are clearly written out in musical notation. In other mediums it may not be as obvious. This doesn't mean that the concept can be ignored, however.
Even simple conversations have pitch. Great stage actors can have up to a three-octave speaking range; it is how emotion is carried through the voice. Try speaking in a monotone and see how much you are able to convey. Erich Korngold, one of the great early composers of film, was famous for writing diegetic film dialogue out in musical notation and then scoring around it, as he had done earlier with opera librettos. While this may not be a necessary step, a basic understanding of the frequency ranges used by male and female speakers, and of how to avoid writing in those same ranges, is well within any composer's reach. As an example, take this clip from The Adventures of Robin Hood, one of Korngold's most famous scores.
Notice that the instrumentation takes up the entire spectrum of sound at the beginning, but then right as the vocals enter at 18 seconds in, they part. The strings become higher, the bass gets a bit lower, and everything that was in between the two drops out. It's a virtual parting of the waters to make room for the voices in their proper register. Just to show it's not a fluke, it does the same thing at 0:43.
I can't think of a single game that really nails this concept, which surprises me. It's not necessarily difficult; one just has to be aware of it. It's a sad commentary that the first thing I typically do when I load up a video game is turn the "voice" slider to maximum and the SFX and music sliders down considerably, because most games have no concept of how to write and mix around the vocals rather than barreling over them.
In addition to keeping the frequency range in mind, the composer must also consider other ways they can muddy the text, and avoid them. In the gaming world this most often manifests itself as a score that is so busy it's distracting. Too many notes, too fast a tempo, too much of everything. An expert composer can properly choose the time for the score to become prominent and the time for it to fade into the background, back and forth between the natural breaths of the narrative.
An excellent example of this comes from the trailer for Conan -- a film that was rather terrible from a cinematography and plot standpoint, but had an absolutely outstanding score composed by Basil Poledouris. Watch the whole thing:
Notice how Poledouris actually alters the music around the lines of text, and yet it still works on its own as a piece? Pulling out the high strings and choir and throwing in some low brass to punch up James Earl Jones' dialogue works very well. The instrumentation itself suits the interplay of dialogue and action scenes.
Now for the ugly side of that coin. Sometimes people opt for the lazy way out and dump this on the mixing engineer, who handles it by "ducking" the music whenever a vocal track is present, that is, pulling back the music volume to make room for the vocal to be heard. An example of constant ducking is, well, a lot of trailers, as they tend to have busier, more "intense" music. We'll use Uncharted 3 as a recent example:
Starting at about 40 seconds into the launch trailer, notice the up-down-up-down-up-down as the speech pops in and out. It's distracting and annoying.
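To make that up-down pumping concrete, here is a minimal Python sketch of what an automatic ducker does to the music's volume. It assumes a hypothetical per-frame voice-activity flag and made-up values for the ducked gain and the attack/release smoothing (the names duck_envelope and direction_changes are invented for illustration). It models only the gain curve, not actual audio, and is not how any particular engine or middleware implements ducking.

```python
# A minimal sketch of automatic "ducking": the music's gain drops while
# dialogue is present and recovers when it stops. All parameter values are
# illustrative assumptions, not taken from any real engine or mixer.


def duck_envelope(voice_active, duck_gain=0.3, attack=0.5, release=0.1):
    """Return the music gain for each frame.

    voice_active: one boolean per frame, e.g. from a hypothetical
        voice-activity detector.
    duck_gain: gain the music is pulled toward while someone is speaking.
    attack / release: fraction of the remaining distance to the target
        covered per frame when ducking down / recovering back up.
    """
    gain = 1.0
    envelope = []
    for active in voice_active:
        target = duck_gain if active else 1.0
        # Duck quickly when speech starts, recover more slowly afterward.
        rate = attack if target < gain else release
        gain += (target - gain) * rate
        envelope.append(gain)
    return envelope


def direction_changes(envelope):
    """Count how often the gain flips between falling and rising."""
    changes = 0
    previous_step = 0.0
    for before, after in zip(envelope, envelope[1:]):
        step = after - before
        if step * previous_step < 0:
            changes += 1
        if step != 0:
            previous_step = step
    return changes


if __name__ == "__main__":
    # Three short lines of dialogue with gaps between them (5 frames each).
    choppy = ([True] * 5 + [False] * 5) * 3
    # One long, continuous line of dialogue.
    continuous = [True] * 30
    print(direction_changes(duck_envelope(choppy)))      # 5
    print(direction_changes(duck_envelope(continuous)))  # 0
```

Fed three short lines of dialogue with gaps between them, the gain reverses direction five times; under one continuous line it settles smoothly and never reverses. Smoothing can soften those swings, but it cannot remove them as long as the music only moves out of the way after the voice arrives.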
Creating audio space is like arranging a bunch of 3D bubbles. Whatever is in the center on the X- and Y-axes and at the front of the Z (depth) axis is going to grab the most attention, and that should always be the vocals. Ducking is a cheap way of pushing the music back on the Z-axis, but that constant shifting is noticeable; the much preferable method is to make space on the front plane of X and Y around the dialogue. It is quite possible to write around dialogue, as thousands upon thousands of hours of opera and film scoring will attest. Why shouldn't video games do this as well?
Just Beat It
A beat, in scoring terms, refers to a particular visual point of action that should be accented. This can be a hard cut in the footage, a punch, just about anything. This can be accented in the music in many ways, typically depending on the requirements of the visuals. Here's a quick example that runs the gamut:
The low percussion (likely an udu) as the dragonfly lands on her nose is a beat. The sitar note as it flies away is another. The harp gliss for the reveal of Wonderland is another. The cymbal roll for discovering the caterpillar is yet another; the harsh low brass almost immediately after as his expression sours another still. 25 seconds in and we've hit five beats already. This is a fairly common pace for higher-energy sections and trailers. Notice that each of these has a different effect, but all come together to add interest and impact to what's happening on screen.
It is possible to create music for a beat-heavy visual without using beats, but then it's up to the foley and sound designers to pick up your slack. See here for a quintessential example:
The opposite effect, hard transitions and visual beats without any aural punch at all, feels so unnatural that I can't even find good examples of it, because no one does it.
I've used the example of a trailer here because this is something not really seen much in games, despite the proliferation of cut-scenes in the modern game. Occasionally one can find a single beat being used, say, a cue that builds to a big crescendo, but considering that many scenes have potential beats numbering well into the double digits, it is a woefully under-utilized idea.
Painting the Picture
The most important thing the score is there to provide is an enhancement and complement to the visual cues on-screen. This means, as an example, that bombastic brass in the middle of a tender love scene would not be particularly appropriate.
A good score can also act as another tool for the director (or game designer), adding subtext that may not otherwise be available, or shifting the perspective of a scene. The latter is a very important concept, but not necessarily appropriate for all situations.
Most movies, games, etc. are pretty straightforward, and usually demand a straightforward accompaniment. Nevertheless, it is important enough that we will discuss it in-depth later.
In the last 25 years or so, films and television have discovered that a score doesn't always have to be "music" in the traditional sense that there's a melody, harmony, steady time signature, and so forth.
Sometimes a simple texture can get the point across as well or better than a more formally structured piece. Sometimes the music's lack of structure can support the picture's theme of amorphousness.
Think back on the TV shows of the '70s and '80s. Their theme songs were written so that when the show started, people all over the house would hear the first few notes and think, "Bonanza's on! Time to gather 'round the TV!" You're probably humming one of those theme songs right now. Eventually, we transitioned to this:
This is not an inherently bad or good thing, but another tool in the composer's toolbox they can use to properly paint the scene. The "pad", or group of low strings/synths sustaining one or a handful of notes out of time below a scene, is extremely common now in movies to build suspense or otherwise denote a holding pattern, so to speak. At any rate, we now have music, textures, and sounds to play with, which opens things up quite a bit.
How to Represent
It is not enough to say, "write something punchy here" or "make this sad". You have to be attentive to far more detail than just that. The best way to show this is by example. To show that I don't hate everything that you hold dear, we'll take a cue from Howard Shore's score to The Fellowship of the Ring. Now, concerning hobbits...
What is the music trying to portray here, and how does it accomplish it? Note that for the first minute, we see a hint of things to come. The camera sweeps through Bilbo's house, over things that are to be very important in a very short while (like the fireplace). During this you actually hear a low and incomplete version of the Fellowship Theme, an introduction to the extremely important concept of leitmotifs that we'll get to later. So the music is foreshadowing for us, in support of the video that does the same. Then Bilbo begins to discuss hobbits.
Hobbits are simple. They love peace, food, and good tilled earth. The music has to represent all of these things, as well as their generally lazy, jovial nature. Shore chooses to go purely with a string section here to keep things simple, as a complicated instrumentation would belie the point. A fiddle (or a solo violinist playing one) is chosen as the lead instrument. Shore relies on prior knowledge here to stress the theme -- another helpful concept. Almost anyone with a bit of cultural exposure associates a solo fiddle with a simple, rural, pastoral setting. Whether it be Ireland or the American South -- or what have you -- the sound instantly conjures up those memories. So half of his work is done for him before he ever puts notes to page.
The music itself is short and punctuated: somewhat lively, but not too fast. Were it elongated, with sweeping strings, it would feel too sappy and lovey-dovey; were it much livelier (i.e., faster), it would feel manic and clash with the rather lazy nature of the Shire's inhabitants. Note also that the register of the violin is nearly two octaves above where Sir Ian is speaking, which ties into our earlier discussion about writing around the diegesis.
There are ideas in this short little piece that I don't have time to discuss, and I'm pretty sure that there are ideas I don't even hear but that we all may absorb sub-consciously. Howard Shore may be the best film composer currently practicing, so I wouldn't put it past him.
Let's listen to another example, but I want you to listen to the song, without visual accompaniment, and think about what it's portraying:
What do you hear? There's tenderness, but there's tension. Even in the first couple of notes, there's what I would call "consonant dissonance", that is, dissonance that we're pretty used to hearing in the form of upper-tertian harmonies: major 7ths, minor 9ths, and so on. Pretty, but full of tension. And then that trumpet comes in. Oh goodness, that trumpet. Uan Rasey, the storied trumpeter who plays that part, was famously told by Jerry Goldsmith (the composer) to "play it sexy. But not like it's good sex!" A fitting portrayal of Chinatown, and exactly the level of granularity that games need to think about as well.
Okay, one more. If you can, try to avoid looking at the title on this one! Just click the play button, close your eyes and tell me what you picture:
What do you hear? I hear a lot of energy, but of a very intense nature. Likely an action sequence. A lot of industrial sounds, such as samples of banging on metal drums, mixed with strings that have some Middle Eastern flair (like the little trills they do) and later on, a pan flute of all things. Okay, now look at the title -- this is the music that plays after you've completed a main assassination and have to high-tail it back to a brotherhood base. Does it make sense to you?
I'm a bit on the fence about this one. In one sense, I get the combination of the old-world themes with the modern sound, since the game is about highly advanced technology being used to send a person back to the 1100s. I like that. It also does a good job of conveying a sense of urgency to the player to get out of the area before major trouble ensues.
On the other hand, the very industrial, mechanical feel of the modern elements screams steampunk/industry to me, not the cybernetic technology and level of espionage at play in Assassin's Creed. Additionally, the slow, methodical, clunky battles with foot soldiers that you'll likely get into during your flight (though you can avoid them, it's difficult when they congregate directly outside the base) are an odd match. That, however, is more the fault of the game than the score, but designers must constantly think of how the two fit together. The score is not there to distract the ears while the eyes play the game.
As an aside, most of the instruments we think of today didn't take their modern form until the 1700s or 1800s (including the piano). Rolling backward, we had things like crumhorns and sackbuts, and farther back you are restricted to gut-string guitar, lute, harp/lyre, and basic woodwinds like the pan flute.
So pre-Renaissance settings and Bronze Age fantasy games that use heavy brass or even electric instruments are being very anachronistic. This often feels weird, even if you can't put your finger on precisely why. (300, I'm looking at you.) Assassin's Creed has a get-out-of-jail-free card due to ostensibly being set in the modern day, but it would be nice to see them pay more attention to this fact.
Video games typically have long segments of player-controlled action that require atmospheric background music on an endless loop. This is certainly something unique to video games, and it presents its own difficulties. The composer must create audio that (a) tells the story of the area or level, (b) has enough interest to avoid fading out of the player's notice entirely, (c) can be listened to ad nauseam without driving the player insane, and (d) stays cohesive with the rest of the music in the game. Game composers in general seem not to struggle too much with (b), but (a), (c), and (d) can present serious hurdles.
What does it mean to "tell the story" of a level? Does it mean "this is an ice level, so we should sound aloof and use high, cold instruments like the glockenspiel and high-register piano a lot" and "this is the fire level, so let's use heavy brass and industrial sounds to bring the heat"? (Like Metroid Prime, maybe?) Does it mean "this is where they are in the story and how the characters currently feel, so let's score to that"? Do you foreshadow events to come? Look back on the past?
The answer is all of the above, in varying degrees. Very few games have managed to consistently pull this off. The field is littered with boring, cliché, and just plain bad level music. First let's look at the good, though. To do this, I'll turn to Nier, a woefully underrated adventure that has one of the best scores I've heard in a game.
It's a well-written and very cohesive work, and every piece, even the town music, leans into the general feeling of malaise and pervasive dread that the game possesses. This is not a happy game. It is a game, first and foremost, about loss -- of loved ones, of a life gone by -- and the music is a large reason that this is conveyed so well.
Nier's soundtrack was written to convey this profound sadness in every track; the composer noted in an interview that even the "thrilling" high-tension boss battles were composed with this pervading sadness in mind. Interestingly, the developer thought so highly of the music that elements of the game design were shuffled around to match the music, rather than the other way around. This kind of interplay, where the director occasionally bows to the composer, is common in film.
One of the main reasons it feels so cohesive and fitting as level music is the presence of one or two female voices on each track. In the story, two women feature prominently, and (minor spoiler) they turn out to be a sort of watchdog/architect duo, looking after the world. One, at least, is always found singing and strumming a guitar. They also have an "on-stage" performance. So in a sense, they sing the soundtrack as they watch over the events of the world.
In other words, the female voices that are heard on nearly every track have actual diegetic in-game significance, which is a very nice touch. The music and the plot are tied together in a compelling and interesting way, and this heightens the narrative considerably. Let's look at a few specific examples from the game:
What do you "see" when you listen to this?
Not just general mood, but more specific. I'll tell you what I see: I see a place of great history and mystery. The minor key and open underlying string pad add to the openness, and the soaring soprano and what I would describe as Vaguely Middle Eastern Percussion place it in a desert region (notice that, as with the Hobbits example, we rely on the listener's outside knowledge to shorten what we have to explain). In truth, this level is an ancient desert temple of unknown origin, which (as is cliché in games) holds a secret of great power. I also hear a great pathos to the music, as if it's crying out for things long gone.
I can attest to the fact that listening to it for over an hour won't wear on you, and as we will soon see it is cohesive with the rest of the game, while at the same time expressing its unique location. Now the next cue: