In the first part of this article, we began our examination of cinematic cutscene creation within game engines (the art form called "Machinima") by looking at tool design, overall project management, and content creation for cutscenes.
Now it's time to move on to the meat of cutscene creation, production and post-production, before ultimately considering the specific strengths and weaknesses of real-time 3D as a cinematic medium. Before we discuss the practicalities of your cinematography, though, we should first look at the intent of your scenes -- what is the purpose being served by your masterpieces of cinematic excellence?
Uses of In-Game Cutscenes
Cutscenes are used within games for a wide variety of purposes, some of them more obvious than others, and more importantly, some of them more appropriately than others. The use to which you intend to put your in-game cinematics will determine many things, among them the budget you will need to assign and the techniques you'll need to use. Thus, it's worth examining the most common implementations of in-game cutscenes, and the appropriateness of each kind of implementation.
Dialogue. One of the most common uses of Machinima cutscenes in the recent past has been to deliver in-game dialogue sequences to the player: in any circumstance where another character needs to talk, whether to develop the game's plot or to give a lengthy exposition, many recent games have shown a distinct tendency to shift into full-on cinematics mode.
This can certainly be a very effective technique for maintaining the mood of the game while delivering information and furthering the plot. However, it's also the most common way in which Machinima within games fails, losing players' interest and devolving into a tedious ordeal to get to the next interactive portion of the game.
For noninteractive dialogue scenes to work well in a game, they must first and foremost be visually interesting. In other words, any purely dialogue-driven scene is doomed to failure from the outset. Always remember that film is a visual medium: if your cutscenes don't add anything beyond what would be gained by reading the text of the dialogue, you'd be better off saving your money and effort and putting the text on the screen for players to read at their own pace.
In particular, if your game depends heavily on in-game dialogue, you'd do well to employ techniques to convey such dialogue other than noninteractive Machinima: unless your cutscenes are rare or truly exceptional, your players will get bored sitting and watching endless sequences of dialogue that they can't participate in or change (which is, after all, the point of playing a game).
Baldur's Gate-style interactive text trees work much better for extensive exposition. They require player interaction, which keeps players from getting bored and keeps them immersed in the game, and the fact that they use a textual rather than visual medium means that players can imagine the conversation's visuals themselves, which means that they'll be of much higher quality and impact than anything you could produce on their screen.
Overall, Machinima has many uses within a game, and certainly brief dialogue and conversation scenes are among them. However, it isn't and shouldn't be used as a cheap and cheerful method of delivering hour upon hour of dialogue. Creating convincing, engaging dialogue scenes within Machinima is a difficult and painstaking process, not something that can be whacked out by the hour to provide cheap visual accompaniment to your spoken script.
(A final note on this point: both Half-Life and the Final Fantasy series employed techniques quite similar to Machinima to deliver extensive dialogue sequences. The key here is that neither took control totally out of players' hands. Half-Life simply had its dialogue delivered to the player's perspective, which the player could still control freely, while the Final Fantasy series, again, presented the dialogue as part of its gameplay, allowing the players to click through it at their own pace. Both of these games are excellent models of using extensive dialogue effectively within a game.)
Introduction of plot elements. This is a fairly specific category for Machinima cutscenes within a game, generally covering a single shot or simple sequence of shots of a new location or new creature (otherwise known as the "Look at this cool thing you're about to blow up" shot). The primary defining factor here is that these are generally short sequences intending to introduce the player to or familiarize the player with objects, locations, or objectives that will be important later on in gameplay.
The simplest example of this usage is the post-Heretic II cutaway shot to reveal the results of a player's pushing a button or a lever on the current level: the player acts on the level, and his view briefly cuts away to show the results of his action (if it's not immediately visible). This is an excellent example of Machinima usage in a game: it's quick and simple (and doesn't take much effort to implement), it's entirely visual (and very effective because of it -- Quake's old "A door opened..." messages didn't work nearly as well), it rewards the players with "something cool" for their actions, and it doesn't take control away from them for too long.
Similar uses can be seen in KISS Psycho Circus: The Nightmare Child's pre-level fly-by sequences (again, very effective and simple), and cutaway sequences such as Heavy Metal: F.A.K.K.2's introduction of boss monsters. Again, these work well, they're the most efficient way to achieve the desired result (player's jaw dropping and muttering "Oh, %$%^!!"), and they don't require very much effort to implement.
KISS Psycho Circus uses Machinima technique for pre-level fly-by sequences
Overall, such sequences are well done in today's games, and hopefully we can keep it up. Sequences such as these are a no-brainer if your game can accommodate them -- they add value and enjoyment to the game for very little outlay in time and cost.
Plot Development and Mission Briefings. As a third major use for Machinima cutscenes in games, these scenes are the most varied and the most complex. Examples of this form of cutscene can be seen in everything from Dark Reign 2's briefings to F.A.K.K.2's extensive use of cutscenes for plot advancement.
Obviously, these scenes overlap fairly heavily with in-game dialogue: however, the major difference is that they don't rely purely or even heavily on it (or they shouldn't). In general, these scenes are intended to take control away from the players to show them elements of the plot developing and guide them toward what they are meant to do in the next interactive portion of the game.
Provided they remember both their function (to develop plot within a game, rather than to stand alone) and their medium (film), these cutscenes can be very effective. In essence, these scenes are short films, and should be treated as such. They're certainly a very cheap and effective way to develop the game's plot, but, as I mentioned in the first part of this article, they do still require real effort and cost, and should be budgeted for and scripted accordingly. It's far better to have a three-minute sequence that astonishes your players than a half-hour one that bores them.
Although it doesn't use Machinima, Final Fantasy VIII probably holds the record for best use of intervening cutscenes in a game. The scenes are short, spectacular, visually based, and develop the plot of the game without ever boring the player. Obviously, with Machinima it's possible to have more extensive scenes, and to develop more of the plot within them than FFVIII does, but the principles it follows are worth holding to in any similar project.
Introductions and conclusions. The last potential use of Machinima within a game is to create its introduction and conclusion sequences. Currently, most games still use prerendered sequences for both roles, but in the next year or so I predict that will happen less and less, as people begin to realize the potential of Machinima.
Overall, these sequences resemble plot-development cutscenes, but have added roles: to draw the player from the "real" world into the game world, and to wow the player from the word "go." Obviously, for both roles visual spectacle is very important, which is the reason why prerendered sequences still hold onto this role: however, it's worth noting that Machinima can achieve visual results well above and beyond expectations, and save the developer thousands of dollars on expensive CGI.
A few games in the recent past have shown the potential of Machinima for creating introduction sequences: the Unreal series (both of whose introductory fly-by sequences are nothing short of stunning) and Half-Life's introductory train journey (which, while not strictly a film, uses a restricted camera view to create its effect). Both work exceptionally well because they play to the strengths of both Machinima and their respective engines -- in essence, these sequences fulfill the expectations that prerendered introductions don't, showing the player something amazing and saying, "Yes, and the rest of this game will be like this, too."
In order to be effective, an introduction sequence in Machinima needs to be built from the ground up to show off the most impressive and involving features of the game, while simultaneously introducing the game's plot. Again, while it's certainly a cheaper option than CGI, it must be budgeted for appropriately to produce a suitably stunning effect. Machinima-based introductions are a relatively new development in games, but one with such potential that they are undoubtedly at least worth considering for any game using a 3D engine.
Film, as a medium, is the effort of the artist to reproduce the mind's eye on a screen. It has a language and a structure all its own, and in order to tell a story effectively using film, you need to speak that language and understand its structure. Ideally, when producing your cutscenes, you should hire an experienced film director and cameraman or camera team to work on them. However, many game developers don't have the budget for that, and will have to make do with their own internal resources. In that case, you should try and learn as much about film and its creation as possible -- you won't be able to achieve quickly the level of knowledge of an experienced director, but you should be able to learn enough to avoid embarrassing yourself!
The tips that follow are just that -- tips -- rather than anything even remotely resembling a complete guide. For further information on cinematic language, check out some of the references at the end of this article, and read Hal Barwood's 2000 GDC presentation. Above all, watch and analyze as many films as you can.
Basic Shot Conventions
The first element of film is framing: just like in static photography or painting, one of the key aspects of camerawork within a film is the positioning of its characters within the frame of the monitor, television, or screen. As a 2D representation of a 3D world, and more importantly as a fixed-viewfield representation of the panoramic view afforded by the human eye, film has developed a number of standard shots and techniques used to represent the world it is portraying as naturally as possible.
The screen is actually considered to have two frames within it: the actual edge of the screen, which is rarely considered, and a "virtual" internal frame, 1/10th of the way into the image (see Figure 1). No important part of the shot (characters, for example, and specifically characters' eyes and mouths if they are in any way the focus of the scene) should be allowed to stray outside this boundary. Within this frame, the angle of the shot and the position of the character on the screen (to the left, the right, the top or the bottom of the shot) serve as vectors to convey more information about the scene.
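For anyone building a camera tool, the internal frame is easy to check automatically. The Python sketch below is an illustration only, not part of any particular engine: it assumes pixel coordinates with the origin at the top left, and it reads "1/10th of the way into the image" as an inset of 10 percent of the width and height from each edge. The names Rect and safe_frame are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        """True if the point lies inside the rectangle (edges included)."""
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def safe_frame(width: float, height: float, inset: float = 0.1) -> Rect:
    """Return the "virtual" internal frame, inset from every screen edge.

    Assumption: the inset is a fraction of the width (left/right edges)
    and of the height (top/bottom edges).
    """
    return Rect(
        left=width * inset,
        top=height * inset,
        right=width * (1 - inset),
        bottom=height * (1 - inset),
    )


# Example: a 640x480 frame, with two screen-space points to test.
frame = safe_frame(640, 480)
eyes_ok = frame.contains(320, 160)   # comfortably inside the internal frame
mouth_ok = frame.contains(630, 200)  # too close to the right-hand edge
```

A tool could run a check like this on the projected screen positions of each character's eyes and mouth, and warn the director whenever one of them drifts outside the internal frame.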
Angles of shot on characters, in particular, are used to convey their place in the story and the feelings the audience should have for them: a character seen head-on, for example, will appear threatening, challenging the audience (by breaking out of the internal world of the film and appearing to look straight at the audience -- as in The Silence of the Lambs, for example), whereas a character shot from a slight angle is meant to be part of the internal world of the film -- the viewer is looking at them from behind a "one-way mirror," and is an observer rather than a direct participant. (This is one of the more common mistakes made by inexperienced cutscene directors: I frequently see characters shot from head-on in cutscenes, provoking an inappropriate response in the audience and hence breaking them out of the intended mood of the film.)
Characters in The Silence of the Lambs break out of the internal world of the film and appear to look straight at the audience.
Other angles of shot, such as profile or three-quarters behind view, signal different things again about the character we're watching, while the pitch of the shot is used to signal, among other things, the character or characters' place in the world: Are they in charge (we're looking up at them, as a child would), or are fate and the plot at large in control (we're looking down on them)?
Placement of characters in a shot is a different beast altogether. One of the primary uses of varying placement of characters is to facilitate conversation shots via eye lines and "talking room," which I will discuss below.
Standard Cinematic Shot Placement
The other element of framing within a shot is that of choosing cut-off points within the frame, and hence choosing placement and zoom of a camera on a character. There are a number of standard points at which to "cut" the image on a human character, derived largely from the language of painting. There are natural-seeming and unnatural-seeming points to place the edge of a frame on a character, based mostly on the shape of the human body and mathematical ratios of distances to the edge of the frame (which, in turn, are based on the way the human eye and brain focus on a flat plane).
Most of these shot conventions are also based on the fact that the eye will naturally tend to focus about two-thirds of the way up a screen or other rectangular object: hence, the focal point of your scene (generally a character's eyes) should be placed here.
Shot framing is another very complex topic, and one which I don't have the expertise to talk about at any length. However, for reference and to get you started, here are the basic shots used in most modern cinematography:
Establishing shot. Anything you like, generally a wide shot, although close-ups of a key item can be used (the hotel sign in The Matrix, for example). It is used to establish the location of a scene.
Wide shot. A shot that encompasses all the members of a conversation. The wide shot is used to establish the position of characters in relation to each other, and also to show changes in those positions (someone gets up and walks to the other side of the room, for example).
Long shot. Includes the full body of the character. The long shot is generally used to introduce a character that was not present in the initial wide shot, although it can also be used to imply emotional distance from other characters.
Medium long shot. A shot from just above or below the knee to the top of the head, plus headroom. This shot is most commonly used for the same reasons as the long shot.
Mid-medium shot. A shot from just above or below the belt line. This is a very neutral conversational shot and is often used to imply emotional distance.
Medium close-up. A shot from just above or below the nipples to above the head. One of the most common conversational shots, it is often used during casual conversation and is generally considered to be neutral.
Close-up. A shot from just above or below the shoulders. Another common shot in conversational camerawork, it adds more weight to the statements or reactions than a medium close-up.
Very close-up. A shot from just above or below the chin to the top of the head, often not including headroom. Adds a lot of weight to a statement or reaction. This shot is generally only used in moments of extreme emotion.
Extreme close-up. A shot consisting of a single, generally facial, feature. The extreme close-up is rarely used without the intention of focusing all the viewer's attention upon the feature shown.
Over-the-shoulder shot. Generally a medium or wider shot that focuses on one character while having part of the other character in the shot. Over-the-shoulder shots help establish the positions of characters relative to each other and as such are often used when the focus is on changing positions.
Two-shot. A closer version of the wide shot that allows some emotion to be shown while establishing position. The two-shot is most often used when characters are side by side while conversing.
It's arguable that most storytelling in film is based on a dialogue between two or more characters: whether it's a conversation, a fight scene, a chase, or virtually any other possible activity, the drama and tension in a film comes from the interactions of the characters. Again, film has developed specific syntax for shots and shot sequences following such interaction, to settle the viewer within the world of the film and provide the unspoken information that helps make the spoken conversation intelligible. This, of course, then interacts with the other functions of the language of film, guiding the audience's eye and encouraging them to empathize with or otherwise feel emotion for or about specific characters at specific points.
The line. One of the most basic and important elements of camerawork following any sort of dialogue is the concept of the "line." Between two characters having an active dialogue there lies an invisible line, over which the camera can very rarely cross without disorienting the audience and making them feel uncomfortable (as you would in real life, stepping in the way of two people having a conversation). This applies both within moving camera shots (although to a lesser extent) and sequences of cuts: once a line has been established, the camera should almost never cross it until that dialogue has finished or moved -- in other words, once the line has been erased.
As I mentioned above, that doesn't just apply to conversational scenes, either. In a sword fight, for example, a line of communication exists just as clearly between the two people fighting (interestingly, the concept of the "line" is just as important in classical fencing as in film school, and for similar reasons). If the camera crosses that line while it exists, the viewer will become confused, as the protagonists appear to switch places on the screen.
In the case of a dialogue with multiple participants, the "line" becomes more difficult to judge. In most cases, there will still only be one "line", between the two most active participants at that point (the person speaking and the person he's speaking to), but that line will shift around as those people change.
In general, if the line changes or becomes confused in a scene, the convention is to cut to a wide shot of the scene, which will re-establish in the viewers' minds the position of the line. From there, the shot sequence can continue based on the new line.
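If your camera tool knows where the two participants are standing, it can warn you before a cut crosses the line. The sketch below is one possible check under stated assumptions: positions are top-down (x, y) floor coordinates, and side_of_line and crosses_line are hypothetical helper names rather than functions from any existing package.

```python
def side_of_line(a, b, point):
    """Return +1 or -1 depending on which side of the line through
    characters a and b the point lies, or 0 if it is exactly on the line.

    Uses the sign of the 2D cross product (b - a) x (point - a).
    """
    cross = (b[0] - a[0]) * (point[1] - a[1]) - (b[1] - a[1]) * (point[0] - a[0])
    return (cross > 0) - (cross < 0)


def crosses_line(a, b, camera_from, camera_to):
    """True if cutting from camera_from to camera_to would cross the line
    between characters a and b."""
    before = side_of_line(a, b, camera_from)
    after = side_of_line(a, b, camera_to)
    return before != 0 and after != 0 and before != after


# Two characters facing each other, and three candidate camera positions.
alice, bob = (0.0, 0.0), (4.0, 0.0)
current_camera = (1.0, -3.0)
same_side_camera = (3.0, -2.0)
far_side_camera = (2.0, 3.0)

safe_cut = crosses_line(alice, bob, current_camera, same_side_camera)
bad_cut = crosses_line(alice, bob, current_camera, far_side_camera)
```

When the active speakers change, the same check can simply be rerun against the new pair of characters; a cut to a wide shot would be the natural place to reset it.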
Talking room and eye lines. In film, a 3D space is reduced to a 2D picture. Therefore, further cinematic syntax allows the filmmaker to place the protagonists within the 3D space of the story. At the most basic end of this syntax lies the concept of talking room, the placement of characters within a shot to show in what way the two of them are interacting in a dialogue.
Put simply, any action on the part of a character, be it talking, firing a gun, or even looking in a certain direction, needs a certain amount of space on-screen beyond the point from which that action emerges, to "balance" that action. So, for example, if you have a character talking, he cannot be placed center-screen (unless he's talking to the camera) -- the shot must be balanced, so he must be placed off to one side (the left side if he's looking right, for example), to allow his action "space." Likewise, if the person he is talking to is looking left (towards him), they must have "listening room" to the right of the screen.
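As a rough illustration of talking room, a framing helper might pick the character's horizontal screen position from the direction he is facing. In this sketch the function name talking_room_x is hypothetical, and the default offset (which puts the character about one third of the way in from the edge behind him) is an assumption for the example, not a figure from film theory.

```python
def talking_room_x(facing, offset=1 / 6):
    """Return a normalized horizontal screen position for a character.

    0.0 is the left edge of the screen and 1.0 is the right edge.
    A character looking right is placed left of center, leaving space
    in front of him; a character looking left is placed right of center.
    """
    if facing == "right":
        return 0.5 - offset
    if facing == "left":
        return 0.5 + offset
    if facing == "camera":
        return 0.5  # talking straight to the camera: center-screen is fine
    raise ValueError(f"unknown facing direction: {facing!r}")


speaker_x = talking_room_x("right")   # speaker looks right, sits left of center
listener_x = talking_room_x("left")   # listener looks left, sits right of center
```

The same idea applies to any directed action: whatever the character is aiming at, the empty space belongs in front of the action rather than behind it.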
In addition to this is the concept of the "eye line." This is the apparent line of sight of the person on-screen: if the two participants' eye lines do not match up so that they appear to be looking at one another, the shot sequence will look very wrong.
Fortunately, there's a simple way to ensure that the eye lines do indeed match up. When the shot used in conversation isn't a two-shot or a wide shot, the camera should be placed so that the lines to both participants in the dialogue cross at the camera forming a right angle. Thus, the eye lines will look natural and the viewer won't be jerked out of immersion.
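One way to read this rule geometrically: any camera position on the circle whose diameter runs between the two participants sees them at a right angle. The sketch below picks the symmetric point on that circle. It assumes top-down (x, y) floor coordinates, and right_angle_camera and angle_at_camera are hypothetical names for illustration.

```python
import math


def right_angle_camera(a, b, side=1):
    """Return a camera position from which the lines to a and b meet at 90 degrees.

    The camera sits on the perpendicular through the midpoint of a and b,
    at a distance of half their separation. `side` (+1 or -1) chooses which
    side of the pair the camera stands on.
    """
    mid_x, mid_y = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2
    dx, dy = b[0] - a[0], b[1] - a[1]
    return (mid_x - side * dy / 2, mid_y + side * dx / 2)


def angle_at_camera(cam, a, b):
    """Angle in degrees between the lines from the camera to a and to b."""
    ax, ay = a[0] - cam[0], a[1] - cam[1]
    bx, by = b[0] - cam[0], b[1] - cam[1]
    dot = ax * bx + ay * by
    cross = ax * by - ay * bx
    return math.degrees(math.atan2(abs(cross), dot))


speaker, listener = (0.0, 0.0), (4.0, 0.0)
camera = right_angle_camera(speaker, listener)
angle = angle_at_camera(camera, speaker, listener)
```

Keeping the same value of `side` for every setup in a scene also keeps all the cameras on one side of the pair, which fits neatly with the rule about the line.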
Beyond individual shots, the process of assembling shot sequences in a film is a totally different discipline, one that is traditionally the joint responsibility of the director and the editor. Amongst non-film-makers, the process of editing tends to be rather undervalued: it's a common belief that once the film's shots are set up, they just fall into sequence without any difficulty. Nothing could be further from the truth.
Ideally, as I mentioned in the first part of this article, if you're working on a Machinima cutscene you'll have some form of separate nonlinear editing system available for assembling your shots. Nonlinear editing facilities, where the user chooses from a library of preexisting shots, and can cut them together, change their length, and insert and delete them at will anywhere in the film without having to make changes anywhere else, have traditionally been one of the reasons why film (which is naturally nonlinear) produced much higher quality results than video. Nowadays, non-linear editing facilities are almost essential if you want to produce a high-quality film in a reasonable time span, in any medium.
The first thing that will surprise many people about the editing process is the number of shots that are incorporated into any specific sequence. Action sequences in particular use a terrifying number of shots (often more than one per second in a complex sequence), but even conversational camerawork will generally cut to a new shot at least once every ten seconds. To see the reasons behind that, you need only watch a film where the shot cuts far less frequently (like some recent Machinima cutscenes): without a change in the shot, the image on-screen rapidly becomes very boring, and the viewer loses interest and is distanced from the story.
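If your editing tool can export a cut list, a few lines of script will point out shots that overstay their welcome. This sketch assumes a cut list of (shot name, duration in seconds) pairs and uses the ten-second conversational guideline above as its default threshold; flag_long_shots is a hypothetical helper, and the sample data is invented for the example.

```python
def flag_long_shots(cut_list, max_seconds=10.0):
    """Return the (name, seconds) entries that run longer than max_seconds."""
    return [(name, seconds) for name, seconds in cut_list if seconds > max_seconds]


# Invented sample data: a short conversational sequence.
cut_list = [
    ("wide", 4.0),
    ("close-up A", 12.5),
    ("over-shoulder B", 6.0),
    ("close-up B", 3.5),
]

too_long = flag_long_shots(cut_list)
average_length = sum(seconds for _, seconds in cut_list) / len(cut_list)
```

A flagged shot isn't automatically wrong, but it is worth a second look: is there a reaction or a change of angle that could break it up?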
This ties into another point. It's a common assumption that a camera will generally stay focussed on the protagonist in a scene, whether he's having a gunfight or an intimate moment. However, that's not the case.
Why? Because the visuals in a scene give viewers additional information about the scene that they haven't already gotten from the soundtrack. In a conversation or dialogue, that information is frequently conveyed in the reactions of others to the action taking place. Thus, if you watch a conversation scene in any TV drama or film, you'll see that fewer than half the shots during a conversation are actually of the person talking: the majority of the shots will be showing the nonverbal communication going on, by showing the reactions of others to the actor's speech.
If you want your Machinima to work as film, this is another convention you must follow. Of course, that means, as I mentioned in part one of this article, that your character models will need to be capable of simulating emotional reactions.
So, where do we get all these shots from? Do we need to set up 20 or more camera positions for each minute of dialogue? No. In general, most filmed scenes will use a few shots repeatedly. This helps establish a sense of continuity in the scene, whereas a multitude of varying shots will confuse the audience.
The standard setup for a conversational scene, for example, will use seven or eight different positions for a two-person conversation: a wide shot of the scene, a mid-shot and a close shot on each person, an over-the-shoulder shot on each person from the other's perspective, and perhaps one more shot (say, an extreme close-up for moments of emotion). This will provide all the shots you need for a two- or three-minute conversation, provided these shots follow the characters as they move.
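The standard setup lends itself to a template that a cutscene tool could generate before any camera is placed by hand. The sketch below simply enumerates the setups described above for two named characters; conversation_coverage and the label strings are hypothetical, chosen only for illustration.

```python
def conversation_coverage(person_a, person_b, extra_shot=None):
    """List the camera setups for a standard two-person conversation.

    Produces seven setups: one wide shot, a mid-shot and a close shot on
    each person, and an over-the-shoulder shot on each person. An optional
    extra setup (an extreme close-up, say) makes eight.
    """
    setups = ["wide shot of the scene"]
    for person in (person_a, person_b):
        setups.append(f"mid-shot on {person}")
        setups.append(f"close shot on {person}")
    setups.append(f"over-the-shoulder on {person_a} from behind {person_b}")
    setups.append(f"over-the-shoulder on {person_b} from behind {person_a}")
    if extra_shot is not None:
        setups.append(extra_shot)
    return setups


basic = conversation_coverage("Anna", "Ben")
with_extra = conversation_coverage("Anna", "Ben", "extreme close-up on Anna's eyes")
```

Starting every conversation from a list like this leaves the director free to spend time on the one or two shots that really need to be special.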
Lastly, it's important to remember just how far the audio track on a film is independent of the same film's visuals. Audio, such as voices or sound effects, frequently carries over shots focussed on something very different from the things producing the sounds. Indeed, in the case of a sudden sound, like a door crashing open, the sound effect will often begin a fraction of a second before the scene switches, to alert the viewer to the fact that there's a transition coming.
Despite all this emphasis on Machinima as a cinematic form, and despite the fact that it does share a great deal with other visual media, it's still a unique form, with its own strengths and weaknesses, and hence unique approaches for overcoming them. In the few years that artists have been working in Machinima, the body of Machinima-specific techniques has grown at an astonishing rate: it would be quite possible to write numerous articles on this subject alone! To conclude this whistle-stop tour of Machinima cutscene production, I'll discuss a few specific examples of tried and tested Machinima techniques that you can put into effect in your cutscenes.
Composite models. Machinima doesn't have an awful lot of common ground with Newtonian physics, but one trait it does share is a tendency to break down when faced with portraying either very large or very small spaces. The portrayal of very large landscapes is half-and-half a technology and technique issue, and is a very complex area -- however, there are one or two well-used techniques for dealing with close-ups and extreme close-ups.
In particular, when a scene demands an extreme close-up of detailed interaction that isn't easily created using conventional Machinima (interaction between a model and the landscape, in particular), directors will tend to switch to a composite model. The principle here is simple enough. When the close-up is required, the camera cuts to it as normal. However, instead of cutting in closer to the previous model, we instead cut to a totally different composite of the entire shot (landscape, characters, and all), all created as a single model in any modeling package, and textured exactly as the original landscape and characters were. This model is then animated in whatever way is necessary (a foot kicking up sand and leaving a detailed imprint, say), and the camera stays on this animation before cutting back to the original scene.
To the viewer, therefore, it looks like we've simply cut in to a close-up of the action before cutting away again. However, by using this technique, we're able to add detail and interaction that would have been impossible in the original set.
An example: in Strange Company's film, Ozymandias, the storyboard called for us to cut from a wide shot (of the protagonist kneeling by the eponymous statue) to a close-up of his hand brushing piled sand away from the statue's plaque. Obviously, this wasn't easily going to be possible using a conventional map/model setup.
Ozymandias called for a cut from a wide shot of the protagonist kneeling by the eponymous statue to a close-up of his hand brushing piled sand away from the statue's plaque.
Instead, our modeler created a model of the entire close-up scene (hand, plaque, sand, and base of statue), then created the "brushing" animation, complete with moving sand, in 3D Studio Max, before importing it into Lithtech Film Producer (our production suite on that project). We then simply cut from the wide shot in the desert map to a close-up of the model, positioned underneath the other map, and triggered the animation. When the shot concluded, we simply cut back to the original wide shot, and the film proceeded seamlessly.
Skin techniques. As I mentioned in the first part of this article, it is possible to use skin textures within a Machinima-based cutscene to simulate a lot of detail models may otherwise not possess (for example, Strange Company's Eschaton: Nightfall used skins to create both lip-synching and eye movement effects in Quake II, effects otherwise impossible without access to the engine's source code). This ability can be used to add atmosphere to a film, as well as to cover up a lot of the common weaknesses of 3D engines.
A good example application here is the simulation of complex lighting, on the face in particular. Take a shot from "real" film, such as the poster shot from Eraserhead for example: the complex shadowing here, with shadow from one part of the face falling on other parts, simply can't be reproduced using today's real-time engines without a hideous performance hit. As a Machinima director, you might think that any effort to reproduce such a shot in your work is doomed to failure.
The complex shadowing from Eraserhead, with shadows from one part of the face falling on other parts, simply can't be reproduced using today's real-time engines without a hideous performance hit.
It's not hard to see where I'm going here. By taking the standard skin for that character, setting up the shot in your Machinima package (including the appropriate lighting for the background), then manually altering the skin used on that shot (perhaps by rendering it in a conventional 3D package), it's possible to fake such a complex shadowing shot, and reap the rewards in atmosphere and "wow" factor that extend from such a trick.
Of course, such effects are expensive in development time. However, for a truly dramatic shot, they may be worth it. It's also worth noting that if the face moves or the lighting changes even slightly, you will need to create several new skins, one for each lighting state. In effect, you're manually creating light maps for the face.
If you use this technique, as with all such "cheat" shots, don't let your shot linger too long, lest your audience manage to see the man behind the curtain (or in this case the airbrush). It's far better to astonish them and move on.
Dynamic Algorithms. This last technique is very experimental, and is currently undergoing testing in Lithtech Film Producer. However, if it is integrated correctly into an engine, the visual rewards may well be truly astonishing.
In short, cutscenes within game engines differ from interactive gameplay in their engine requirements in one simple way: their creators can know exactly what the system requirements of a scene will be at any moment. Thus, if a scene is designed in such a way that it underutilizes the available system resources, it's theoretically possible to enable techniques that wouldn't be usable on a more complex scene, and so render the current scene in a more sophisticated way.
What does this mean? Well, at the most extreme end, it means that using a current real-time engine, if a scene only has 200 or 300 polygons in view, it's possible to use genuine raytracing algorithms for the lighting effects on that scene. More realistically, it means that cutscenes can ratchet their computational complexity up and down as the complexity of the shots changes -- a shot with a simple backdrop and few characters could, for example, enable additional shadowing, higher-quality texturing, or increased real-time lighting quality.
Of course, this capability has to be built into the engine, and has to be used carefully, lest the cutscenes start to differ in visual "feel" from the main gameplay. However, if such a technique is used appropriately -- perhaps increasing the number of real-time light sources available on a close-up to give a character's face greater depth, for example -- it can add a lot to a cutscene.
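To make the idea concrete, here is a minimal sketch of how a cutscene system might choose rendering options per shot from a polygon count it knows in advance. Everything here is hypothetical: ShotQuality, quality_for_shot, the 10,000-polygon budget, and the halfway threshold are invented for illustration, and only the 300-polygon figure echoes the example above. It does not describe how Lithtech Film Producer or any other engine actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotQuality:
    extra_lights: int         # additional real-time light sources to enable
    extra_shadows: bool       # enable additional shadowing
    raytraced_lighting: bool  # use raytraced lighting for this shot


def quality_for_shot(polygons_in_view, polygon_budget=10_000):
    """Pick rendering options for a shot whose polygon count is known ahead of time.

    The thresholds are illustrative assumptions, not measured values.
    """
    if polygons_in_view <= 300:
        # Extremely simple shot: spend the spare capacity on raytraced lighting.
        return ShotQuality(extra_lights=4, extra_shadows=True, raytraced_lighting=True)
    if polygons_in_view < polygon_budget / 2:
        # Plenty of headroom: add lights and shadowing, but no raytracing.
        return ShotQuality(extra_lights=2, extra_shadows=True, raytraced_lighting=False)
    # Near the budget: render exactly as normal gameplay would.
    return ShotQuality(extra_lights=0, extra_shadows=False, raytraced_lighting=False)


close_up = quality_for_shot(250)
two_shot = quality_for_shot(3_000)
battle = quality_for_shot(9_000)
```

Because the options are chosen per shot, a director could also cap them by hand on any shot where the extra polish would clash with the look of the surrounding gameplay.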
Obviously, in an article like this one, it's only possible to scratch the surface of such a complex and interesting area -- nothing less than the growth of a new art form. However, I hope it gives you some pointers and encourages you to experiment, to investigate further, and to refuse to accept (many) compromises in your real-time 3D cutscene design.
For further information, I heartily encourage you to check out these resources:
Machinima.com: the premier site for real-time 3D film-making on the Internet (and of which -- full disclosure -- I am editor-in-chief). With over 100 articles on all aspects of Machinima filmmaking, this is a valuable resource for anyone working in this area.
one of the better conventional filmmaking sites on the Internet. For information on film-making and film theory, this is a good place to start.
Cutting To The Chase by Hal Barwood, which includes a lot of helpful tips and an exhaustive (and exhausting) bibliography.
Katz, Steven D. Film Directing: Shot by Shot. Michael Wiese Productions, 1991; ISBN 0-941188-10-8. I agree with Hal Barwood that this is one of the best bases for learning the language of film available.