Wide shot, interior view of a luscious-looking castle: high-quality pictures on the wall, shadows across the floor giving it the feel of a set straight out of "Lord of the Rings." We hold for a moment, admiring the view.
Camera shudders slightly as we pan right to a MAN walking across the floor towards us. He's richly dressed, clad in a cloak which mysteriously pokes through his knee as he walks, and moves stiffly, slightly unnaturally, as though he should be holding something in his hand which has been removed. We follow him, center shot with entire body clearly visible, across the floor. As he comes to a stop his feet skate slightly.
A WOMAN stands in a part of the set we haven't previously seen clearly, momentarily disorienting us. She, again, is dressed in rich, highly detailed clothes, but moves in a jerky, repeating pattern, her hands held unnaturally stiff to her face. The repetition gives us the strong impression that she is suffering from some kind of nervous disease.
When she talks, her mouth (and a portion of her cheek) flaps up and down roughly in time with her words.
WOMAN: (Monotone) Oh no, oh no. The bad one has come for us. Whatever shall we do? Where shall we find a champion to protect us?
As 3D graphics engines advance, more and more game developers are coming to the realization that this newfound graphical power can be harnessed not only in traditional gameplay sections, but also to replace the horrendous expense of prerendered cutscenes with in-engine cinematic sequences. And yet, many of the cutscenes so produced resemble the fictional script above more than the work of Spielberg, Scorsese, or even Square.
Certainly, we've got a while to go before in-engine rendered cutscenes can begin to rival the graphical beauty of Diablo II's intermission movies, no matter how well they are made. However, much of the bad reputation in-game cutscenes have earned, and the low expectations they engender, have less to do with the raw graphical ability of the graphics engine they use than with the production and direction processes their creators employ.
Real-time rendered films (called "Machinima" by the many hobbyists who use them to create stand-alone films) are a cinematic form just as much as any prerendered spectacular, and require that their creators understand the quirks both of movies in general, and of Machinima production in particular, in order to achieve satisfactory results. In this two-part article, I hope to provide some insight into some of the most common mistakes and omissions made in real-time cutscene creation, and to point out a few pathways to truly cinematic cutscenes within game engines.
First, and most importantly, any team intending to use real-time cinematics within their game must realistically budget both time and money for that part of their project. There is a perception that Machinima cinematics are an ultra-cheap -- almost free -- way to add cutscenes into a game. Just get one of the team members to write a script in what little spare time he or she has, have the voice actors dash off their lines between takes, and have the entire thing scripted into the in-game maps by the level designers, again in the time they have between other projects. Obviously, if you're intending to create cutscenes of any quality, that's a recipe for total disaster.
A Machinima project is an animation/film project, simply using a medium that is cheaper, but also younger and less polished, than prerendered animation. If you intend to produce high-quality results (in other words, results that will add to your players' enjoyment rather than detract from it), then you must approach your in-game cutscenes from this point of view, and budget accordingly:
- Hire professional scriptwriters for the cutscenes, and have those scripts then edited by other professionals to polish and trim them.
- Allocate a reasonable amount of time for voice actors' takes, and have them directed by someone with experience in voice direction.
- Allocate time in content creation to create custom animations, sets, and models for your cutscenes.
- Have your cutscenes storyboarded shot by shot, then filmed and edited by filmmakers (ideally, experienced Machinima creators) who have experience in those two very different roles.
Remember, your players will be sitting watching, rather than interacting with, your cutscenes. Don't assume that because cutscenes aren't gameplay they can be of lower visual and aural quality. During these scenes your players have nothing to occupy them but watching and listening, so the scenes must be of correspondingly high quality if you don't want to lose their interest.
On the other hand, don't succumb to script bloat, either. Your script should be as tightly edited and to the point as you can make it: a script that is twice as long as it needs to be is likely to be half as good as it could be. There's a temptation once you get away from the spiraling costs of prerendered scenes to produce huge, hour-long epic cutscenes. Remember, if the interest in those cutscenes could have been compressed into five minutes, the other 55 of those minutes will see your players getting very, very bored.
The quality of any Machinima project is dependent on two factors: the amount of time and effort invested in it, and the quality and ease-of-use of the tools used to create it. I'll be coming back to the "time and effort" point later -- for now, let's look at tool designs. Traditionally, there are three basic models of tool design for real-time cinematics in game engines. All of them have their advantages and disadvantages, so let's look at them one by one, in chronological order.
1. Demo editing. This technology came first -- it was with these techniques that the first hobbyists created the original "Quake Movies," which first showed the potential of 3D engines to create films. This technique, growing out of the hacker's ethos of the original Quake community, uses special tools to edit, cut, and splice in-game recordings -- originally of Quake deathmatches -- but with some design changes these recordings can easily become the equivalent of a studio-filmed rush from which a finished film can be produced.
These techniques have rarely been used within the game production community, but groups such as CNET's Gamecenter and the BBC are now using similar technologies to televise games like Quake 3: Arena and Unreal Tournament.
This technology has a lot of advantages:
- Films are very quick to produce, as it's far faster to give a human a complicated series of instructions on his role than to script those instructions for a machine.
- A well-organized shoot can exhibit much of the spontaneity that enlivens "real" filming (a way in which this technique is unique in the world of animation).
- Because the "demo editor" used can easily be tied in to send the game engine information allowing real-time previewing of edits, it allows the user to receive instant WYSIWYG feedback on their editing.
However, it also has a number of disadvantages:
- It's very hard to get any sort of precision in the initial shoot. While a lot of this imprecision can be cleaned up by editing the demo produced, it's often the case that this takes substantially more time than simply creating the shot from scratch using other techniques.
- Because of the "live filming" aspect, the actual shoots require a large group of "actors" and a LAN, as well as a lot of coordination of that group.
- Once "filming" is complete, if major changes to the action are needed you'll either have to reconvene your actors or perform a lot of complex scene creation and tidying. Frequently, neither is a particularly viable option.
- From the point of view of the tool designer, this is a complex project, requiring a new tool to be built with a lot of versatile editing capabilities.
Overall, demo editing is a very versatile and usable technique (particularly for scenes requiring a lot of complex, intelligent movement -- it's particularly useful for battle scenes). However, on its own it is very limited, particularly for scenes with complex animation or very precise synchronization to other factors in the film.
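To make the "studio-filmed rush" idea concrete, the core operations of a demo-editing tool can be sketched in a few lines. The sketch below is plain Python with an invented frame format -- real demo formats are engine-specific binary streams -- and shows only the cut/retime/splice logic such a tool is built around.

```python
# Sketch of demo-editing primitives. A "demo" here is assumed to be a
# list of (timestamp, frame_data) pairs recorded from the engine; the
# frame format is hypothetical, for illustration only.

def cut(demo, start, end):
    """Keep only the frames recorded between start and end seconds."""
    return [(t, f) for (t, f) in demo if start <= t <= end]

def rebase(demo, new_start=0.0):
    """Shift timestamps so the clip begins at new_start."""
    if not demo:
        return []
    offset = demo[0][0] - new_start
    return [(t - offset, f) for (t, f) in demo]

def splice(*clips):
    """Join clips end to end, retiming each to follow the previous one."""
    result = []
    cursor = 0.0
    for clip in clips:
        shifted = rebase(clip, cursor)
        result.extend(shifted)
        if shifted:
            cursor = shifted[-1][0]
    return result
```

A real tool would additionally rewrite the camera entity within each frame ("re-camming"), but the editing core is the same.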
2. In-map scripting. This was the next technique to develop, and so far has been the most widely used, appearing in games from Half-Life to Unreal to Soldier of Fortune. Here, cutscenes are entirely scripted within the map editor used for level design, usually using the same scripting technologies used within the game to produce in-game scripted actions.
Obviously, from the tool-design point of view, this is a very simple and direct approach, requiring little additional design -- all that's generally needed is a "camera" object within the game's structure, with some movement options, controlled from in-game triggers or scripting engines.
This approach has several advantages:
- Initial shot and sequence setup is usually fairly swift, and it is possible to create very precisely controlled action sequences within the engine.
- Because audio can be scripted within the engine alongside the action, audio synchronization to the on-screen action is relatively simple.
- As mentioned above, tool design time is minimized.
- All cinematic creation (except voice acting) can be done by one person.
- Editing character positions and actions after initial creation is very simple.
However, it also has a number of disadvantages:
- Unlike the demo editing approach described above, this technique does not work on a WYSIWYG basis. In order to see the effects of a change in a cutscene, the creator must recompile the map on which he or she is working, and then run that map from within the engine. The lack of instant feedback is definitely detrimental to the quality of films edited using this process.
- Likewise, compile and run times for maps using this technique may well discourage the creator from trying different techniques or editing the work more than the bare minimum. At the very least, the compile times will eat into the time available in which to work on the cutscenes.
- In general, scenes created using this technique are edited linearly, rather than nonlinearly. Translated from film terminology, this means that the order of shots (and frequently their duration) acts like a stack of shots rather than a linked series: in order to swap the positions of two shots, or even change the duration of a shot, it will frequently be necessary to re-create large portions of the sequence. This isn't always the case (particularly if scenes are very carefully created with an eye towards a nonlinear approach), but the interplay of scripted movement and scripted camerawork certainly tends toward this problem.
- The time taken to create a shot increases dramatically if that shot involves complex interaction between multiple characters. Any large fight sequence, for example, will generally be extremely difficult to create convincingly using this system.
- The film will only be as good as its scripting language. If it is difficult to create an effect which is needed for the film, then that shot will be compromised. Of course, it's possible to rewrite or add parts to the scripting system to accommodate new requirements, but this will take time on the part of both the programmer and the cinematographer.
Overall, despite its prevalence, the in-map scripting technique does not work particularly well, and may well be responsible for a lot of the quality issues within older cutscenes. Its lack of a decent editing facility and its reliance on recompilation of the sets on which the scene is taking place are certainly its biggest flaws, and can seriously compromise the quality of films created purely using this technique.
It's worth noting that many of this technology's problems can be ameliorated by combining it with another method of film creation, such as demo "re-camming" or even conventional video editing software such as Premiere. This may be the simplest approach to take if tool creation time is a major concern.
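For reference, the "camera object" that in-map scripting relies on usually boils down to keyframe interpolation fired from a trigger. A minimal sketch in Python (the keyframe format here is invented for illustration):

```python
# Minimal keyframed camera, as an in-map scripted "camera object" might
# implement it. Keyframes are (time, (x, y, z)) pairs; position at any
# time t is linearly interpolated between the surrounding keyframes.

def camera_position(keyframes, t):
    """Return the interpolated (x, y, z) camera position at time t."""
    keyframes = sorted(keyframes)
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    if t >= keyframes[-1][0]:
        return keyframes[-1][1]
    for (t0, p0), (t1, p1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            u = (t - t0) / (t1 - t0)
            return tuple(a + (b - a) * u for a, b in zip(p0, p1))
```

The engine evaluates something like this every frame once a trigger fires; the lack of WYSIWYG feedback comes from the surrounding map-compile cycle, not from the camera logic itself, which is simple.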
3. Independent scripting. This third technique has been employed in several different forms in a number of recent games. In general, independent scripting uses a combination of in-engine tool work and text scripting to create its films: some games, such as Vampire: the Masquerade -- Redemption, simply allow the user to select camera angles and actor positions via an in-game engine tool and then script the action within the scene entirely in a text interface (with Java scripting, in Vampire's case). Other games allow the user to set up and preview entire shots within the cutscene editing tool, including character movement. However, they still have an underlying script for the scene to follow, with the in-game tool acting as a "helper" to create this script.
In many ways, this is not a single tool technology, but a base point from which to build a tool set. However, there are certain common features (the presence of a script, rather than a recording of action within the game, and the reliance on a separate file rather than information within a map) that make this approach unique.
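The "underlying script" these tools read or emit can be extremely simple. As an illustration only -- the command names below are invented, not taken from any shipping engine -- a parser for such a scene script might look like:

```python
# Sketch of a tiny text-script parser of the kind an "independent
# scripting" tool might read or emit. Command names and syntax here
# are invented for illustration.

def parse_scene(text):
    """Parse lines like 'camera move 0 0 10' or 'actor guard say halt'
    into (command, args) tuples, skipping blanks and '#' comments."""
    commands = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        commands.append((parts[0], parts[1:]))
    return commands
```

The in-engine helper tool's job is then to write these lines for the user (from camera placements made visually) rather than force them to be typed by hand.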
It's safe to say that this technique has a number of advantages:
- WYSIWYG editing, at least to some degree. Exactly how far this capability goes is up to the designers of the tool set, but it's certainly relatively easy to enable anything up to full WYSIWYG capability.
- Instant feedback. The cinematic creator can see how changes affect the film instantly -- and, unlike the other tool systems mentioned above, the creator can easily make major changes to anything in the film, including character movement.
- All cinematic creation can be performed by a single person.
As with the potential advantages of this technique, many of its disadvantages depend on the design of the helper tools used with it. However, some disadvantages are constant.
- While it's possible to set up a nonlinear editing environment within this system, the scripted nature of the films created predisposes it towards linear editing only.
- In order to create a useful tool, considerable programmer time and effort will have to be invested. As with any content creation utility, there is frequently a lot more work that needs to be done to create a usable tool than is initially apparent.
- As with the other scripting approaches, the more complex a scene, and the more characters involved, the more complex the scripting becomes. Battle scenes are frequently particularly difficult, as I mentioned above.
The Ideal Tool?
None of these approaches is ideal -- in fact, on their own they all have gaping flaws, and flaws that can't be easily solved within a single tool. However, that's more than just a design challenge: it's a signal that there's something drastically wrong with the "single tool" approach. A quick look at the credits for any Hollywood blockbuster will tell you that film work isn't a single, homogeneous whole. By the same reasoning, neither is Machinima production.
Let's consider again the basis of "real film" work, in which there are three main stages of production, apart from content creation (audio recording, set design, animation, and so on): pre-production, production, and post-production.
In pre-production, the final draft of your script is analyzed by the director of the cutscenes and broken down into a shooting script and storyboards. The shooting script breaks the script down textually, helping the director and the team begin creating storyboards. Storyboards are simple illustrations that let the director visualize the cutscenes prior to actual production. During the storyboarding process, the director and team can start to see which elements will be needed in production (as well as which game assets can be repurposed for the cutscenes). While this preliminary effort may seem like a lot of work, it pays for itself throughout the rest of the cutscene (if not the entire game's) production.
In production, individual shots are filmed. The effort here is simply to capture the action needed, not to shoot the shots in any kind of cinematic order -- that comes later. In Machinima terms, this is where individual character movement is scripted, in-scene audio is synched into the action (characters talking, feet scuffing on floorboards, and other such "real-world" effects), and camera positions are set up. This would also be where any live demo filming is done, and the subsequent shots tidied up.
The important point here is that the target end product for this stage is just individual, uncut shots from the film. This part of the process requires that its tools be very flexible, and focused on creating single sequences: complex movement creation tools; flexibility in camera movement, focus, and position; and precise timing of events are all important here.
Then, in post-production, these shots are edited together to form a whole, and additional audio (mostly background music, in the case of Machinima) is added over the top of the finished, edited piece. Editing is a very different discipline from camerawork, and demands very different skills and tools: the ability to chop and change shots freely within the shot sequence of the film, to edit their lengths, and to transition between them is important here. Most utilities commonly thought of as "film packages" (such as Adobe Premiere) are actually editing tools.
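The difference between linear and nonlinear editing is easiest to see in data terms. A nonlinear cut is just an ordered list of shot references -- an edit decision list -- so reordering or retrimming one shot never disturbs the others. A minimal sketch (Python, with a hypothetical shot format):

```python
# Sketch of a nonlinear edit decision list. Each shot is a
# (name, in_point, out_point) tuple in seconds; the cut is an ordered
# list of such entries, so changing one entry never forces the others
# to be re-created.

def shot_duration(shot):
    name, t_in, t_out = shot
    return t_out - t_in

def total_runtime(edl):
    return sum(shot_duration(s) for s in edl)

def swap(edl, i, j):
    """Swap two shots in the cut -- trivial here, painful in in-map scripting."""
    edl = list(edl)
    edl[i], edl[j] = edl[j], edl[i]
    return edl
```

Compare this with in-map scripting, where swapping two shots can mean re-creating large portions of the scripted sequence.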
In development work on Lithtech Film Producer, we very quickly moved to a paradigm based on this breakdown of the film-creation process. The ability to break the work down in this fashion is probably the most important missing element in many Machinima creation packages today. By trying to do both jobs at once, any tool ends up lacking on one or the other side, hampering its users' ability to create quality films.
Obviously, other elements of the package are also important: as a visual medium, it's vital that any Machinima package have WYSIWYG capability. The potential of live filming for certain scenes is huge, and any tool designer should consider including some form of live filming option within the Machinima tool set. Lastly, if your game does have scripting capabilities, then incorporating them into the pre-production section of the tool set in some form is also an excellent idea.
Much of what I'm going to say about content creation for in-game cutscenes can be boiled down to a simple statement: if you're going to use in-game cutscenes, expect to create about as many custom sets, animations, and models as you would have to for an equivalent prerendered sequence.
Again, Machinima is simply another form of animation, and it demands as much skill and time as any other form. True, there are a number of shortcuts and useful tricks (which I'll talk about in a moment) to make it appear that you're using more in-game animation than you are, and it is often possible to reuse game content, but if you're expecting to be able to get away with game idle animations and unaltered character models, you're being very unrealistic if your goal is something interesting to watch.
Yes, in general it's entirely possible to use game models for your cutscenes, and indeed it's a good idea if you want to keep your players immersed in the game world while they're watching the cutscenes. However, if you're going to do this, your game models have to be designed from the start with not just gameplay but also cutscene use in mind.
Unlike in gameplay, the primary focus of the camera during a cutscene will be on the head and face of your characters, the better to display emotions. Obviously, that means that you'll have to assign space in your polygon budget for highly detailed character heads -- and don't forget the upper torso, which also appears in most medium and close shots.
In general, most close shots will cut around the nipples on a character's chest. Thus, things such as overly thick necks, spiked shoulder joints, and unrealistic breast shapes and sizes (you know who you are...) will certainly be more noticeable, and care should be taken on your actor models to avoid such problems. Shoulders, thanks to the commonly used conversational over-the-shoulder shot, are particularly important. It's worth taking the time to make sure that they join properly and react well to arm movement.
Many models are designed using standard orthogonal (front and side) views, and hence look great from those two angles but flat and uninteresting from a 45-degree angle. That's a major no-no: film language dictates that a good cameraman is most likely to shoot your models from three-quarter side shots, then profile, and, least often, full-frontal. Design accordingly, and remember to check that your heads look good from the back, too (again, for the ever-present over-the-shoulder camerawork).
Film is about nothing if not emotion and reaction. In any conversation, more than half of the camera shots are likely to be "reaction shots": shots looking at other characters than the one speaking, giving the viewer additional information about the story from their reactions.
Why is this important? Because if you want your cutscenes to have any weight at all, your characters need to be able to react, and they need to be able to show emotion. In other words, whether by cunning skin changes or by facial animation (although the latter is preferable), they need to be able to move more on their face than just their mouths.
This is a very important point, and I'm amazed that as yet I've seen no game implement it. For an example of how simple facial animation can bring a character to life, see Monolith's Lithtech 2 technology demo videos: the Frankenstein character there isn't incredibly complex, yet the fact that he can react at all draws the audience in and gives them some ability to feel empathy with him -- the most important step in a film.
To put it bluntly, if your characters aren't going to be able to show emotion on their faces, then you might as well never use any close-in shots on them, or even have them conduct any conversation. Without facial animation, in fact, you may as well conduct conversations using the old Ultima staple of a static portrait next to text.
On a similar point, film is also all about eye lines, and the movement of the eyes and head. Even if nothing else on your character's face can move, having eyes (and the character's whole head) set up so that they can be scripted to "look" at a point will pay off tenfold, as you gain the ability to introduce shots, direct conversations, and direct the viewer's attention. Of course, eyes (and eyelids) are the most important components in conveying emotion, too, so this is a real two-for-one deal. (Also on the subject of eyes: People blink. So should your characters.)
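Scripting a "look at this point" command is cheap to implement: it's a two-angle calculation. A sketch of the math, assuming a Z-up coordinate system (engine conventions vary):

```python
import math

# Sketch of a "look at" helper: given a head position and a target
# point, compute the yaw and pitch (in degrees) needed to aim the
# head or eyes at the target. Assumes X-forward, Z-up coordinates.

def look_at(head, target):
    dx = target[0] - head[0]
    dy = target[1] - head[1]
    dz = target[2] - head[2]
    yaw = math.degrees(math.atan2(dy, dx))              # turn left/right
    pitch = math.degrees(math.atan2(dz, math.hypot(dx, dy)))  # tilt up/down
    return yaw, pitch
```

Feed the resulting angles to the head and eye bones (ideally with the eyes leading the head by a few frames, as real eyes do).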
Probably the most important point in skinning for Machinima (beyond the ever-present "detail the head" instruction), is the option to use multiple skins (specifically on the head, but also on the body as a whole) to add detail or emotion to your film. Varying "emotion" skins, in conjunction with careful facial animation, can really bring a character to life. Have a character's forehead crease, or give him or her a scratch on the cheek after some fighting or action, and watch as your world comes alive.
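Mechanically, emotion skins need very little engine support -- a lookup with a neutral fallback covers it. A sketch (the skin names and emotion states below are invented for illustration):

```python
# Sketch of per-emotion skin selection: a character carries a dictionary
# of head skins keyed by emotion, falling back to the neutral skin when
# a state has no dedicated texture.

def head_skin(skins, emotion):
    """Return the skin for the given emotion, or the neutral default."""
    return skins.get(emotion, skins["neutral"])

# Hypothetical skin set for a guard character.
guard_skins = {
    "neutral": "guard_head.tga",
    "angry": "guard_head_angry.tga",    # creased-forehead variant
    "wounded": "guard_head_cut.tga",    # scratch on the cheek after a fight
}
```

The cutscene script then only needs a "set emotion" command at the right moments, with the artist supplying the skin variants.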
There are two important things to realize about animation in a real-time 3D film: first, you're going to need a lot of it, and second, you can fake more than you think.
There's an old animator's maxim that no frame (of a film) should ever be entirely still -- that's a good one to follow. To it we can add a new maxim that never really applied in the days of cel animation, but applies very strongly in the brave new world of Machinima: no living thing ever loops its movement.
It's completely standard in game cutscenes to see a pair of characters standing, talking to one another, obviously repeating the same large-scale gestures over and over again. Frankly, unless you're in a very unusual circumstance (wind blowing at long hair, for example), that looks absolutely horrible. No living thing ever moves like that, and it's never going to do more than detract from the overall effect of your scene.
Possibly the best stand states I've seen recently come from Ion Storm's Deus Ex, whose characters simply stand, more or less still, talking to one another. I think there might be a small looping animation there, but I can't be sure. That's the absolute minimum you need to aim for in a standing state. Ideally, you'll add some random "noise" to some parts of the body (as this demo does for an animated head), or better yet precisely cue "idle" animations like shifting weight manually, as part of your animation in the scene.
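One cheap way to generate such idle "noise" is to sum sine waves whose frequencies aren't simple multiples of each other: the combined sway stays small but has no short repeat period for the eye to catch. A sketch of the idea (the amplitudes and frequencies here are arbitrary choices):

```python
import math

# Sketch of non-looping idle "noise": summing two sine waves with
# frequencies that aren't simple multiples of each other produces a
# small sway whose repeat period is far too long to be noticed,
# unlike a short looped idle animation.

def idle_sway(t, amplitude=1.0):
    """Small head/torso angular offset (degrees) at time t seconds."""
    return amplitude * (0.6 * math.sin(1.0 * t) + 0.4 * math.sin(2.7 * t))
```

Feed the result into a head or spine bone offset each frame; the output is bounded by the amplitude, so it reads as natural restlessness rather than motion.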
So, where do we get our constant movement from, then? Well, there's a number of techniques to animate a scene. First, and most obviously, your characters should be expressive in their conversation. If they're shouting, have them shake a fist at the same time (but just once, rather than endlessly looping the animation -- it might sound stupid, but I've seen it done). If they're whispering, have them come in a pace or two, lean toward the character to whom they're talking, and make furtive "shushing" gestures.
Obviously, that demands a lot of custom animation (hence another maxim of mine: "Conversation demands as much animation as any fight scene"). However, that movement doesn't have to just be animations from the character's list. Have them walk around. Have them stride closer to accuse someone, turn away when they're being verbally attacked, or wander around the room and examine paintings when their subordinate is boring them.
This is where your eye and head animation comes in, too. If you can cue your characters to look at objects or move their eyes, do so. Have them avoid the eyes of the person whom they just robbed. Have them look over their conversational partner's shoulder to signal someone else joining the conversation.
Finally, just to make your job harder, you should always have a reason for any character's animation. A movement or action made without reason (even if that reason is "my leg's going numb") confuses the observer and reduces the impact of your scene. All animation should further a character's personality and illustrate his or her thoughts.
It's eminently possible to set your cutscene within an existing game map; however, that again means that your map designer should bear the cutscene in mind while making the map. Obviously, he or she should first make sure that there's enough space within the map for the cutscene's action to take place! Minimizing extraneous detail within the area where the cutscene happens is also important, as is making sure that any important features which need to be brought out in the cutscene are visible from intelligent camera angles.
Beyond that, a carefully crafted map for a cutscene can add a lot to the scene. Lighting, primarily, is a vital tool in film, and no less so in Machinima. A carefully placed light can plunge half a character's face into darkness for a sinister effect, and flashing or otherwise varying lights (fires, particularly) can add to the movement in a scene, making it feel more "alive."
Carefully placed geometry can add to a scene in the same way that set design can add to a film, too. Arches in which to frame important characters or speeches, bars or pillars to give the impression of imprisonment, skylines against which to position your protagonists -- the list is endless.
Overall, there aren't as many firm rules here as with model design and animation. The important thing to remember is that a map on which a cutscene is set doubles as a film set, and as such a dialogue between the cinematic creator and the map designer should be set up.
In the second part of this article, I'll be discussing film language, shot setup and editing, and looking into Machinima-specific film techniques to get around graphical limitations and make the best use of your engine's abilities.