Deep Dive is an ongoing Gamasutra series with the goal of shedding light on specific design, art, or technical features within a video game, in order to show how seemingly simple, fundamental design decisions aren't really that simple at all.
Check out earlier installments, including creating believable crowds in Planet Coaster, evolving stealth detection in Shadow Tactics, and creating the intricate level design of Dishonored 2's Clockwork Mansion.
Who: Christopher Dragert, Ph.D., Team Lead Programmer at Ubisoft Toronto
Most recently, I shipped Watch Dogs 2 where I acted as principal programmer for the Invasion of Privacy missions. As well, I co-presented “Nuts and Bolts: Modular AI From the Ground Up” at the 2016 GDC AI Summit, and I share an article with Kevin Dill in the upcoming Game AI Pro 3. Before that, I received my Ph.D. from McGill University where I studied Model-Driven Development of AI for Games. I am currently working on an unannounced project.
What: Achieving Seamless Branching in Watch Dogs 2’s Invasion of Privacy Missions
In the original Watch_Dogs, there was a type of side mission called ‘Privacy Invasion’. It allowed you to hack into cameras and covertly watch NPCs interacting with the world. NPC behavior was expressed through brief cut-scenes that showed domestic incidents or quirkier slices of life. ‘Privacy Invasion’ was popular, but limited as the feature lacked gameplay.
In Watch Dogs 2, the goal was to add gameplay to these scenes. The player would be able to hack cameras, computers, and other electronics in the room. NPCs would then react to these events in a way that expressed a narrative, thereby providing a more engaging experience for the player. Our aim was to deliver these missions at cinematic quality – fully motion captured with seamless branching.
We wanted to dramatically improve the immersion of Privacy Invasion missions. Calling it Invasion of Privacy 2.0, our goal was to empower the player by allowing them to affect and control the outcome of these scenes by hacking at any point.
In this article, I will describe some of the technical challenges and design decisions that drove development of the Invasion of Privacy feature in Watch Dogs 2. Areas of focus will include managing branching scenarios, motion capture challenges, controlling NPC state, maintaining dialog flow, and NPC coordination.
An Invasion of Privacy [IOP] mission begins when the player hacks into a junction box. The player’s view is put into a camera in the scene, allowing them to view the contents of the room and the NPCs within. The player can look around with the camera, profile the characters and hackables to learn more about them, and switch cameras to get a different view.
Gameplay is advanced by hacking objects in the scene. For instance, the final beat of the ‘Whistleblower’ IOP features a man driven to suicide by a blackmail attempt. You can hack his phone in an attempt to connect him to help. However, if you only hack his phone, the people you contact will literally put him over the edge. If you hack his laptop and find evidence of the blackmail, you can then hack his phone to instead connect him to a journalist and ultimately save his life.
The behavior of each IOP was designed in detail in a mission design document. This was an essential step in communicating the flow, as well as spotting potential failure points. For instance, what is the correct behavior if the phone is hacked while the computer is downloading? In IOPs with heavy branching or multiple simultaneous options, putting the desired flow on paper was a vital step.
Motion Capture vs. Systemic: Early on, we faced a significant decision point. Should we aim for cinematic quality by motion capturing the entire scene including all branches, or should we take a more systemic approach by employing existing walk cycles and object interaction animations? This proved to be a major inflection point for the development of IOPs. I’ve summarized the pros and cons of motion capture in the below table, with the pros and cons of a systemic approach essentially being the exact opposite.
While we knew that systemic animations would be the easiest and cheapest option, there was one compelling reason that made the decision: in many IOPs, the narrative suited having a camera placed immediately in front of the NPC for a close-up shot. At that range, there are no acceptable ways to fake NPC facial movement and lip-sync. Even generic body movements, which are fine at distance, fail to hold up when the NPC is too close and instead come off as being robotic and unnatural.
Ultimately, we chose quality and decided to fully motion capture all IOPs. While this created significant challenges, tackling these allowed us to achieve an excellent outcome. Among other things, this meant that each and every IOP had to be planned out in exacting detail in order to capture all possible branches and combinations. When it came to motion capture day, we had to be 100% ready, with a clear understanding of each individual shot, the role it played in the IOP, and how it flowed in the scene.
Managing Branching: Our target on the gameplay side was to allow the player to branch the scene at any point - full interactivity! While a noble goal, it proved to be impossible for several reasons, which we’ll illustrate through an example. In the ‘Always On’ IOP, a teenage girl is dancing in her room, and the player can turn off the lights and change the music. This interrupts and annoys her, and she races to fix the music and lights before resuming her dance.
The problem is this: what do we do when a hack occurs while she is partway through a movement? In a systemic IOP, we could play the same reaction at any spot and blend from reaction to movement. However, this breaks down when the NPC is close to the destination, because using systemic starts and stops becomes more challenging. Since we are using full motion capture, it is even harder, since the blend seams would be extremely obvious.
The answer is to engage in some subterfuge. Each animation is short, and starts and ends from the same idle pose. By chunking out each animation, we can defer reactions to the start of the next animation. Look again at the dance fail image: notice how the girl reacts quickly, moves quickly, fixes the music quickly, and so on. This short duration was intentional, and allowed us to minimize the maximum delay between a hack and reaction (approximately 1 second at most). The video below shows two rapid hacks, and the girl does both reactions before moving to fix her room.
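The deferral idea can be illustrated with a small sketch. This is not the shipped code; the names `Clip`, `ClipPlayer`, and `on_hack`, and all durations, are hypothetical. It only shows the principle that a hack never cuts the running clip and is instead serviced at the next clip boundary, where the NPC is back in the shared idle pose.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Clip:
    name: str
    duration: float  # seconds; kept short so a deferred reaction still feels prompt


@dataclass
class ClipPlayer:
    """Plays clips back to back; hacks are only serviced at clip boundaries."""
    current: Clip
    idle: Clip
    elapsed: float = 0.0
    pending: deque = field(default_factory=deque)  # queued reaction clips
    history: list = field(default_factory=list)    # names of clips that finished

    def on_hack(self, reaction: Clip) -> None:
        # Never interrupt the running clip: just remember the reaction.
        self.pending.append(reaction)

    def update(self, dt: float) -> None:
        self.elapsed += dt
        while self.elapsed >= self.current.duration:
            self.elapsed -= self.current.duration
            self.history.append(self.current.name)
            # Clip boundary: the NPC is in the shared idle pose again, so a
            # queued reaction can start here without a visible blend seam.
            self.current = self.pending.popleft() if self.pending else self.idle
```

With clips of about one second, the worst-case wait before a reaction starts is bounded by the length of the clip that happens to be playing, and two rapid hacks simply queue two reactions that play in order.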
This approach provides ample reactivity and lets the player initiate hacks at any point, while giving us well-defined, manageable branch locations. Indeed, pose matching formed the cornerstone of our animation approach. Each motion capture clip starts in the pose that matches the end-pose of the branch that took us there. If there are multiple branches that lead to a point, they all have to end in the same pose, and all animations starting at that point have to begin from that pose. While smoothing out the pose-matching took considerable work on the part of the animators, it left us free to smoothly stitch together disparate animations at run-time in the order dictated by the player’s interactions.
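The pose-matching rule lends itself to an automated check. The sketch below is an assumption about how such a check could look, not a description of the actual toolchain: `MocapClip`, `find_pose_mismatches`, and the pose labels are all invented for illustration. It flags every allowed transition where the outgoing clip's end pose differs from the incoming clip's start pose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MocapClip:
    name: str
    start_pose: str  # pose label at the first frame (hypothetical labels)
    end_pose: str    # pose label at the last frame


def find_pose_mismatches(clips, transitions):
    """Return the (from, to) pairs whose end pose and start pose differ.

    `transitions` lists which clip may follow which at run time.
    """
    by_name = {clip.name: clip for clip in clips}
    return [
        (src, dst)
        for src, dst in transitions
        if by_name[src].end_pose != by_name[dst].start_pose
    ]


clips = [
    MocapClip("dance_loop", "idle_stand", "idle_stand"),
    MocapClip("react_lights", "idle_stand", "idle_stand"),
    MocapClip("fix_lights", "idle_stand", "idle_at_switch"),
    MocapClip("return_to_dance", "idle_at_switch", "idle_stand"),
]
transitions = [
    ("dance_loop", "react_lights"),
    ("react_lights", "fix_lights"),
    ("fix_lights", "return_to_dance"),
    ("fix_lights", "dance_loop"),  # deliberately wrong: poses do not match
]
mismatches = find_pose_mismatches(clips, transitions)
```

Running a check like this over the planned branch graph before a capture session would surface clips that cannot be stitched together, which is cheaper than discovering the problem after the shoot.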
Reactions from an idle followed standard gameplay conventions. We kept our idles controlled and limited foot and hip movement. Reactions from idle deliberately involved lots of upper body movement. The movement made it very hard to notice the small blend we applied, and the stationary lower body prevented foot sliding. This made it possible to smoothly branch out of idles, such as the main dance loop in ‘Always On’.
Structuring NPC Behavior: With such a strict requirement on poses and durations, NPC behavior required a clear structure. The intent of this was to provide ample room to design and narrative, allowing them to create a compelling scene without exploding the complexity of the branching and pose matching. We called our structure ‘emotional-escalation’, and it provided a guideline that we used throughout the project.
Each hack would increase the emotional intensity of the scene. For example, if a hack annoyed a character in the scene, each subsequent hack would make the character angrier. It provided predictability for the player, and a clear model for design. In ‘Always On’, the first hack annoyed the girl, the second made her angry, and the final one caused a melt-down. Depending on the scene, there could be interactions between various hacks. For example, we have the following escalation for ‘Always On’:
Each reaction usually consisted of a simple cycle: React -> Restore -> Resume. The NPC would react to the hack (usually with a large reaction that allowed for blending from any pose with the same foot/hip arrangement), restore the state of the scene, and then resume their previous behavior. This could involve movement.
Early on, this structure was useful as it gave us a behavioral design framework. Once we became more comfortable, we became more malleable in our approach. Some hacks would cause a reaction and restore, but the NPC would move to a different base state that advanced the scene. Sometimes the NPC would skip the restore, and so on.
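A minimal sketch of emotional escalation combined with the React -> Restore -> Resume cycle follows. The level names come from the ‘Always On’ description above; the class, the block names, and the choice to skip restore and resume at the final level are assumptions made for illustration.

```python
ESCALATION = ["annoyed", "angry", "meltdown"]  # order described for 'Always On'


class EscalationTracker:
    """Maps each successive hack to a more intense reaction."""

    def __init__(self, levels):
        self.levels = list(levels)
        self.hacks_seen = 0

    def on_hack(self):
        """Return the sequence of blocks the NPC should play for this hack."""
        index = min(self.hacks_seen, len(self.levels) - 1)
        level = self.levels[index]
        self.hacks_seen += 1
        if index == len(self.levels) - 1:
            # Assumption: the final level is the climax, so the NPC does not
            # restore the scene or resume the earlier behavior.
            return [f"react_{level}"]
        return [f"react_{level}", "restore", "resume"]
```

The more flexible variants mentioned above (skipping the restore, or resuming into a different base state) would amount to returning a different block sequence per level rather than the fixed three-step cycle.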
Statefulness in IOPs: In general, stateful animations were a major risk. Imagine an NPC picks up an object, and then as luck would have it, the player triggers a branch at that exact moment. If we allow the branch, then we need to have an animation that includes the object. If the player had hacked a moment earlier, then the NPC might not have picked up the object and so an animation without the object is also needed. This continues down the line – if the player keeps hacking, then the entire rest of the IOP needs to handle that object. The net effect of allowing a state-change divergence is that the amount of motion capture required is effectively doubled.
We found three useful solutions to this challenge:
1. All Roads Lead to Rome: If an NPC undergoes a state change in one path, then all possible paths need to do the same. The player is funneled back to a consistent state. The video below shows the NPC changing state by removing his headset. What we guarantee is that all other paths through the IOP will also result in him removing his headset, leaving the state consistent for the ending.
2. Quick Like a Bunny: The NPC changes state, acts quickly, then goes back to the original state. No branching is possible during these brief, stateful periods.
3. Noise? What Noise?: We limit the scope of reactions while the state change is active. The narrative is designed so that, during a stateful action, it makes sense for the NPCs to not react to stimuli. In this video, the NPCs are ignoring the mask, so hacks have no effect.
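The second and third solutions both come down to suppressing branches while a state change is active. One way to picture that is a simple gate, sketched here under the assumption of hypothetical names (`BranchGate`, `enter_stateful_window`, `try_branch`) that do not come from the actual system.

```python
class BranchGate:
    """Decides whether a hack is allowed to trigger a branch right now."""

    def __init__(self):
        self._lock_depth = 0  # number of stateful windows currently open

    def enter_stateful_window(self):
        self._lock_depth += 1

    def exit_stateful_window(self):
        self._lock_depth = max(0, self._lock_depth - 1)

    def try_branch(self, hack_name, on_branch):
        if self._lock_depth > 0:
            # The NPC ignores the stimulus, so no extra stateful
            # animation variant has to be captured for this moment.
            return False
        on_branch(hack_name)
        return True
```

Because the gate only ever blocks branching for the duration of a stateful window, keeping those windows short is what preserves the sense of reactivity.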
In a typical cinematic, expositional dialog is a core tool in expressing narrative. The exact ordering of the scene (including speech interruptions) is planned out in the script. Since there are no surprises, the writer can easily ensure that all the important narrative beats are hit. The situation is different in IOPs – the writer can no longer make strong assumptions about ordering and narrative flow. This doesn’t obviate the need for a strong narrative, and so we needed to come up with a narrative structure that was resilient to branching.
Dialog was the single largest challenge. For dialog flow to make sense, the writers needed to know when certain beats were hit. A simple solution is that if a dialog line gets interrupted, just replay it to ensure narrative flow. This came off as too ‘video-gamey’ and felt very artificial. Alternatively, we could skip to the next line, but then we risk losing too much context and skipping narrative beats. Instead, we decided to be clever about how narrative was arranged.
Take the following exchange from the ‘Child’s Play’ IOP where one NPC is trying to make a sale:
COLE: Prices are going up, Grizz.
COLE: This is exclusive material I'm providing.
The purpose of this block is to establish that Cole is taking an aggressive bargaining position. The remainder of the scene involves Grizz trying to get the price lower. If the line about prices going up is skipped, then the scene becomes muddled. Our solution to this was to introduce the concept of a Point of No Return (PONR). Each dialog block was given a PONR, set no more than 1.5 seconds after the beginning of the block:
COLE: Prices are going up (PONR), Grizz.
COLE: This is exclusive material I'm providing.
The key concept is that the main narrative beat of the dialog block had to be front-loaded and placed before the PONR. After the narrative team performed rewrites to follow that pattern, the functionality became straightforward. If a branch is triggered before the PONR, repeat the entire dialog – the player has not heard enough of it to make the repeat feel artificial. If we branch after the PONR, skip the remainder – the player has heard what they need to hear.
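The PONR rule is easy to express in code. The sketch below is illustrative only: `DialogBlock`, `resume_index`, and any PONR timings passed in are hypothetical, and only the 1.5-second upper bound comes from the text above.

```python
from dataclasses import dataclass

MAX_PONR_SECONDS = 1.5  # upper bound stated in the article


@dataclass(frozen=True)
class DialogBlock:
    lines: tuple
    ponr: float  # seconds from the start of the block

    def __post_init__(self):
        if not 0.0 <= self.ponr <= MAX_PONR_SECONDS:
            raise ValueError("PONR must fall within the first 1.5 seconds")


def resume_index(blocks, current, interrupted_at):
    """Return the index of the block to play once the reaction has finished.

    A result equal to len(blocks) means the dialog has run out.
    """
    if interrupted_at < blocks[current].ponr:
        return current      # beat not delivered yet: replay the whole block
    return current + 1      # beat delivered: skip the remainder of the block
```

Validating the PONR when the block is constructed means a block whose key beat lands too late is rejected at authoring time instead of producing a muddled scene at run time.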
After reactions, we needed to smooth out the rejoin from the reaction to the main dialog. Each reaction ended with a rejoin that was pose matched to the default dialog pose. Included in this was a generic rejoin dialog:
COLE: Lost my fucking train of thought...
GRIZZ: You were talking about bankrupting me...
COLE: Yeah, yeah, yeah...
*resume* current dialog block from PONR or skip to next block
Now, not every IOP expressed narrative through continuous dialog. Interactions that were not tied to NPCs provided an alternate vehicle for narrative. In the ‘Condemned’ IOP, there was a phone that could be hacked to play a message. Once triggered, this would play the message in its entirety, regardless of the other hacks that were taking place. Profile cards were also a good location to place narrative information.
Of course, the sledgehammer approach was to prevent branching entirely. It restricts gameplay and was always the choice of last resort, but used sparingly, it allowed certain critical sections of the narrative to be expressed in an uninterruptible format. Mechanically, it meant either locking out the player from triggering a branch, or having the NPCs ignore hacks during that time. We did this during the finale of each IOP, for example, allowing narrative to create proper conclusions to the scene without worrying about branching possibilities.
The final piece of the puzzle was ensuring that all NPCs could react in a coordinated fashion. This means that NPCs needed to react as a group to incoming stimuli, and that animations and positions be correctly synchronized in case of sync animations.
We developed an NPC system that was block-based and event-driven. Essentially, this system drove the animations for each NPC, and determined the next appropriate behavior based on the current block and the incoming event. As well, it used a simple blackboard system to track the state of the scene. For example, the emotional escalation system used in many IOPs was tracked on this blackboard. In ‘Always On’, this allowed us to determine if a hack should cause an irritated reaction, an escalated reaction, or trigger the climax. We could then trigger the NPC group to perform the appropriate block. Blocks handled the timing for each animation start and finish, and allowed us to perfectly synchronize NPC behaviors.
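To make the block-plus-blackboard arrangement concrete, here is a toy sketch. It is an assumption-laden simplification rather than the real system: `SceneDirector`, the event names, and the block names are invented, and the mapping from escalation count to block follows the irritated / escalated / climax progression described above.

```python
class SceneDirector:
    """Picks the next block for a group of NPCs from (current block, event)."""

    def __init__(self, npcs, start_block):
        self.npcs = list(npcs)
        self.current_block = start_block
        self.blackboard = {"escalation": 0}  # shared scene state
        self.log = []  # (npc, block) pairs in the order blocks were started

    def handle_event(self, event):
        if self.current_block == "climax":
            return  # the finale is not interruptible
        if event == "hack":
            self.blackboard["escalation"] += 1
            level = self.blackboard["escalation"]
            next_block = {1: "irritated", 2: "escalated"}.get(level, "climax")
            self._start_block(next_block)
        elif event == "block_finished":
            self._start_block("idle")

    def _start_block(self, block):
        # Every participant enters the block at the same moment, which is
        # what keeps the group's animations synchronized.
        self.current_block = block
        for npc in self.npcs:
            self.log.append((npc, block))
```

Keeping the escalation count on the blackboard rather than inside any one NPC means every participant in a block reads the same scene state when the next block is chosen.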
As well, NPCs could be placed in single-participant blocks. This allowed us to have individual NPCs react to different hacks on their own timings. A great example of this comes from the ‘Bad Publicity’ IOP. The main character is in his own block, reacting only to the hacks that affect the game he is playing. When an ending is triggered, the police enter the room and synchronization is required. The main character is pulled into the finale block with the police. Importantly, he doesn’t actually terminate his behavior when the room entrance block is started. Instead, he has a short hold on his previous behavior, and only begins his reaction when the door is kicked in. Since he now shares a block with the police, their animations are perfectly synchronized. The video below shows this at time 1:49, where you can see the NPC maintain his behavior until a split second after the police kick in the door.
By trying to achieve a narrative-driven scene with gameplay at a cinematic quality-bar, we forced all aspects of production to explore new ground:
- Animation had to do extensive pose matching and plan challenging motion-capture sessions
- LD had to make detailed mission flow diagrams and script complex branching
- Programming had to ensure correct AI timings and reactions while ensuring synchronization of multiple NPCs. As well, programming had to develop the tools and interfaces to allow production to hook into the system
- Sound had to produce a cinematic quality mix without knowing which camera the player would be in
- Narrative had to produce piecewise scripts that respected the possible branching flows and interruptions
- Production had to learn how to manage and organize all elements of developing this complex feature
For a typical feature, it could have been difficult to push each team to come up with solutions to each of these challenging problems. However, since we had a clear goal with a clear quality-bar, we were able to stay coordinated and focused, allowing the team to take on this wide array of technical challenges. In the end, we were able to deliver a feature that the whole team is proud of.