Image Metrics is a technology firm working with companies like Vivendi, Activision, and Epic Games to produce "high-fidelity, performance-driven facial animation" through a combination of performance capture and animation refining.
To share how they applied their technology to the characters in Rockstar's Grand Theft Auto IV, head of production David Barton and technical director Vladimir Mastilovic spoke at Microsoft's Gamefest event earlier this week.
Mr. Barton was fairly straightforward: "Our task was, 'How to bring him to life? How can we make him move?'" The 'him' in question was GTA IV protagonist Niko Bellic.
To put the capture medium in context, the production head traced its history, noting that performance-driven animation actually dates back to rotoscoping, or tracing over live actors' performances.
Said Barton, "It gives a level of realism that is difficult and time consuming to [manually] keyframe. Keyframing is possible as well, but you've got to have a hell of a good animator and it's going to take a really long time, and for a project on the scale of GTA, it's probably not even possible."
Direct motion capture is always an option, but Barton noted that making realistic characters "always comes back to the eyes. How can you animate the eyes if you can't put any markers on them?" It's problematic.
Image Metrics' solution, as recently showcased by Popular Mechanics, is to "[track] all aspects of a face’s motion (including eyes and lips, which are difficult to attach markers to)", and then "...automatically map that motion to a template character."
Working on GTA IV
Barton was upfront about the importance of communication even during what might - from appearances - be a purely technical relationship. He stated, "We've worked with Rockstar Games for a number of years now and the only reason I mention that is if you're going to build a pipeline like this, it helps if you have that relationship. We know what we need to make animation look great but we need you guys to help with that."
That relationship no doubt helped with a number of scale problems, as GTA IV posed some daunting challenges. Facial rigs in the game had something on the order of 100 individual joints, and over the course of the game "there are 300 minutes of facial animation," according to Barton.
He continued, "The key of it is how you breathe life into these characters, and our way of doing it is to capture the performance of the actor, it's all based on the performance... and you need a great 3D rig, a great facial model."
At this point the Image Metrics staffers showed off a demo of GTA's protagonist Niko Bellic and other characters from the game. A model of Bellic's head was shown with its individual joints in motion.
They showed the difference between rigged and unrigged characters by having the characters approach the camera, activate their joints, continue to walk, and then deactivate as they walked away. The difference was striking.
Here technical director Vladimir Mastilovic spoke up, saying, "I just want to explain why we have so many expressions - it's because we have to match the actor's performance very closely. We have to fine tune the expression. In my mind these rigs could contain more emotion and can be further iterated."
GTA IV's scale apparently caused issues at times. Noted Mastilovic, "Because of the scale of this project we had to have rigs finished before we started animating, but because it's all rigged on the timeline we could change the base mesh and we wouldn't have to re-skin. If Rockstar changed the design a bit, we'd just have to clean up a bit and we'd still have a finished character."
A key component of the project's success was the company's "rig transfer" technology, which takes the master rig built for one head and transfers it to any subsequent heads. Said the tech director, "As we move on the project we have 15 or 20 finished characters we can choose from, and rig transfer is very useful from this time."
Finishing the Work
"The question is, when is a rig final? I guess never is definitely the answer," stated David Barton. "We did have a sign off procedure from the high level developers at Rockstar pretty early off." That procedure worked through a number of concurrent pipelines to ensure the client was satisfied with the end result.
The animation pipeline involved a number of steps. According to Barton, the first was "performance capture. A VO booth scenario - you capture video of the performance at the same time you capture the audio. Once we go through the capture process we need to go through the performance analysis process."
In the booth, the camera capturing the actor's performance includes a built-in teleprompter, so the actor doesn't need to hold the script. According to Barton, the resulting improvement in performance quality is immeasurable. "The videos are timed perfectly to the cutscene we're animating from, so Rockstar can just drop it into the engine. You can even drop in pickup shots into the overall cut to create a performance."
From this performance, a rig is constructed. "We select extreme moments in the actor's performances and use those to create poses in the rig," said Barton. He continued, "One problem with capturing in a VO booth is that our animation may not match the needs of your game."
This is because the actor's eyes will be looking straight ahead (at a teleprompter) or down (at a script) rather than at the other characters or events the in-game character should be reacting to.
Barton asked, rhetorically, how the work could be improved. "Having realistic eyes is 90 percent of the battle in this - this is where people really buy into your character." To capture that element, a headcam is used on a mocap stage to "capture facial movements in sync with body," according to the head of production.
Offered Mr. Mastilovic, "We realize that we have too many controls perhaps for a single rig, so what we are doing actually is going through" with an eye towards maximizing the impact of the emotions. To that end they've made "a study of the emotional impact that certain expressions have on the human psyche. That's one way of making the rig more optimized and more believable."
It's the little details in particular that bring a character to life. "Animating of wrinkles has additional impact on the audience and making certain expressions more strong and more believable. The way we're going to animate wrinkles is animated normal maps," stated Mastilovic.
Barton chimed in, saying, "We can't [put] this stuff into the engine - we need you guys to do that for us."
Closing out the lecture, Mastilovic mused on further possibilities for this technology, "Why should we stop at mixing animated normal maps when we can mix any map really? Color maps - so a character can turn pale, or turn red if he's angry?"
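For readers wondering what "mixing" maps might look like in practice, here is a minimal sketch. It is not Image Metrics' method, which the talk did not detail. It assumes a common approach: each expression-specific map (a wrinkle normal map, or a "flushed" color map) is stored alongside a neutral map and blended in as a weighted offset, with the weights driven by rig controls. The function name `blend_maps` and the map names `brow_raise` and `flushed` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# A texel is an (x, y, z) normal or an (r, g, b) color; a "map" is a flat list of texels.
Texel = Tuple[float, float, float]
Map = List[Texel]


def blend_maps(neutral: Map, targets: Dict[str, Map], weights: Dict[str, float],
               normalize: bool = False) -> Map:
    """Hypothetical sketch: blend expression maps as weighted offsets from a neutral map.

    Each named target contributes weight * (target - neutral) per texel, with the
    weight clamped to [0, 1]. With normalize=True the texels are treated as
    normals and rescaled to unit length afterwards (normal maps); leave it False
    for color maps.
    """
    out: Map = []
    for i, base in enumerate(neutral):
        acc = list(base)
        for name, w in weights.items():
            w = min(max(w, 0.0), 1.0)  # clamp the rig-driven weight
            target = targets[name][i]
            for c in range(3):
                acc[c] += w * (target[c] - base[c])
        if normalize:
            length = math.sqrt(sum(v * v for v in acc))
            if length > 0.0:
                acc = [v / length for v in acc]
        out.append((acc[0], acc[1], acc[2]))
    return out


if __name__ == "__main__":
    # One-texel "maps" for illustration: a flat neutral normal and a tilted wrinkle normal.
    neutral_normals = [(0.0, 0.0, 1.0)]
    wrinkles = {"brow_raise": [(0.6, 0.0, 0.8)]}
    print(blend_maps(neutral_normals, wrinkles, {"brow_raise": 0.5}, normalize=True))

    # The same function mixes color maps, e.g. a face flushing red with anger.
    neutral_skin = [(0.8, 0.6, 0.5)]
    flushed = {"flushed": [(0.9, 0.4, 0.4)]}
    print(blend_maps(neutral_skin, flushed, {"flushed": 0.7}))
```

With a half-weighted brow raise, the blended normal tilts partway toward the wrinkle normal, to roughly (0.316, 0.0, 0.949). Blending offsets from a shared neutral map lets several expressions stack on the same texel. That stacking is also why the same code serves for color maps, where only the renormalization step is dropped.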