This post was co-authored by Wendelin Reich and Werner Schirmer. Together with Sophie Peseux, we are the founders of Virtual Beings, an artificial-behavior startup that develops mobile games with deeply interactive non-player characters (NPCs). See here for posts #2, #3 and #4.
There's a delightful little scene in the science-fiction movie Her (2013) where Theodore, the protagonist, plays a video game in augmented reality. At one point, an NPC turns towards him and starts insulting him. This sudden display of real personality from an otherwise bland character is so unexpected that it makes Theodore laugh as well as think. He realizes this behavior is a puzzle. By insulting the NPC back, he ends up solving it and the game continues.
Eight years later, the state of the art in interactive characters still doesn't provide anything close to Theodore's experience. Among professionals and players alike, there's a strong consensus that character AI hasn't seen qualitative breakthroughs since about 2005, the year F.E.A.R. was released. Even worse, recent textbooks on game AI state explicitly that innovation in character AI has essentially come to a halt, the field nowadays being more interested in AI-driven art production, system-level AI and so on.
A clear majority of popular games today feature NPCs of some kind. We therefore believe that this dearth of innovation in character AI is both a creative bottleneck for future games and an immense opportunity for folks who are willing to approach the problem with a fresh look. This first article of our four-part series on the future of interactive characters therefore starts at what we see as the logical beginning. If the purpose of artificial agents such as NPCs is to behave in ways that engage players, we need to ask ourselves just what agent behavior is in the first place (where 'agent' refers to both animals and humans). Our academic roots are in both psychology and artificial behavior (AB). Over many years of development on Rascal, our AB-engine, we have found that behavior is characterized by twelve properties. We call them 'principles' as a small homage to Disney's famous twelve basic principles of animation.
Some of them are obvious, others less so. What's important is that these twelve principles, taken together, sharply delimit behavior from anything that merely resembles it. More importantly, if you want to create an AB-engine that allows interactive characters to feel 'alive', you'd want to make sure that it supports all twelve.
Principle 1: Behavior is observable
At first sight, behavior seems to be about muscles that move. Let's say you're sitting in a fancy restaurant, waiting for your date to show up. Your fingers are nervously tapping on the table and your heart (also a muscle) is racing.
Does that mean you're doing two things at the same time here? Not quite. Living bodies are full of complicated stuff doing complicated things, but most of this isn't perceivable from the outside. For our purposes, behavior includes only events that are observable without special instruments (such as an MRI scanner). So if your racing heart contributes to your overall nervousness and you end up knocking over your glass of orange juice and ruining your shirt - that would be observable, hence behavior.
For AB, this first principle entails a welcome simplification: We don't have to try to recreate life itself, just its appearance. Disney called this 'the illusion of life'. We'll go one step further and call it the illusion of interactive life - something we'll cover in a later blog post.
Principle 2: Behavior is continuous
Living beings behave all the time, from birth all the way to their death. Our language recognizes this by providing us with an arsenal of terms we can apply to someone who isn't showing any movement or making any audible sound. For example, we may say that this person is sleeping, sitting still, holding their breath, playing dead, and so on.
Doesn't this conflict with principle 1? No, because even when an agent is seemingly doing nothing, we can observe something: In the GIF on the left, you can tell effortlessly that sitting perfectly still under a shower of balloons is a skilled (and probably rehearsed) display of behavior. The mere act of sitting straight requires coordinated use of dozens of muscles. In a more general vein, we may say that agents emit continuous behavior streams. The problem of AB is thus to generate such streams from individual behaviors, each of which must connect smoothly to the behavior that precedes and follows it.
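To make the idea concrete, here is a toy sketch (the names and the one-dimensional 'pose' are our own simplifications for this post, not Rascal's actual API): a behavior stream modeled as a generator that stitches individual behaviors together, each one starting from the pose where the previous one ended.

```python
from typing import Callable, Iterator

# A 'pose' is simplified to a single float; a real engine tracks a full skeleton.
Behavior = Callable[[float], list[float]]  # start pose -> list of poses

def sit_still(start: float) -> list[float]:
    # "Doing nothing" still emits observable frames (principles 1 and 2).
    return [start] * 3

def lean_forward(start: float) -> list[float]:
    return [start + 0.1 * i for i in range(1, 4)]

def behavior_stream(behaviors: list[Behavior], start: float) -> Iterator[float]:
    """Emit one continuous stream: each behavior begins where the last ended."""
    pose = start
    for behavior in behaviors:
        for pose in behavior(pose):
            yield pose

frames = list(behavior_stream([sit_still, lean_forward, sit_still], 0.0))
# No gap between behaviors: the stream reads as one unbroken motion.
```

The point of the sketch is the hand-off of `pose` between behaviors: continuity falls out of the composition rather than being patched in afterwards.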
Principle 3: Behavior is interactive
There is no real life behavior that is not interactive. For example, playing with a friend involves responding to their actions, and climbing a rock requires adapting one’s hands to its shape. Even the most self-involved behavior takes place in a context and needs to interact with it. Take breathing as an example, where the respiration rate depends (among other things) on the density of oxygen in the atmosphere. If we take away the context (oxygen), the behavior (breathing) ceases to make sense.
Behavior is how agents relate to the world, and that is why all behavior needs to be interactive. This also means that there is no difference between behavior that is interactive, adaptive or responsive - these words just add different flavors to the fact that behavior is necessarily contextual. For AB, this means that all behavior needs to be procedurally generated - unfortunately the exact opposite of what most games do today, which assemble behavior streams from canned packages of pre-configured behavior (stand-loop, walk-loop, jump and so on), with awkward transitions between them.
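As a minimal illustration of the difference (a sketch under our own simplifying assumptions, not how any particular engine works), a procedural walk derives every frame from the current context instead of replaying a baked loop - here, the swinging foot's height tracks the actual terrain:

```python
import math
from typing import Callable

def walk_pose(t: float, ground_height: Callable[[float], float]) -> dict:
    """Generate each frame from context (the terrain under the agent)
    rather than replaying a canned walk-loop."""
    x = t * 1.4  # hypothetical forward speed in m/s
    return {
        "x": x,
        # Foot height = terrain height + a simple swing arc.
        "foot_y": ground_height(x) + 0.1 * abs(math.sin(t * math.pi)),
    }

# The same procedure adapts to flat ground and to a slope - no transition
# animation between "flat walk" and "slope walk" is needed:
flat = walk_pose(0.5, lambda x: 0.0)
slope = walk_pose(0.5, lambda x: 0.2 * x)
```

A canned walk-loop would need a separate asset (or an awkward blend) for every terrain; the procedural version takes the context as an input.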
Principle 4: Behavior is constrained
Context imposes lots of constraints on behavior, in the form of conditions that shape it in various ways. By far the most important one is the physical makeup of the world - the resistance it offers to the agent's body, the way it allows sound to propagate, and more.
Constraints can themselves be passive or active, thereby directing an agent's behavior dynamically and somewhat unpredictably. AB must hence go beyond mere procedural selection of behavior and offer full-fledged support for procedural animation, allowing the behavior stream to adapt to constraints on the fly.
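A toy sketch of both kinds of constraint (our own invented example, nothing engine-specific): a reaching hand is limited passively by arm length and actively by an obstacle that may itself move between frames.

```python
from typing import Optional

def reach(target: float, arm_length: float,
          obstacle: Optional[float] = None) -> float:
    """Where the hand actually ends up, per frame.

    arm_length is a passive constraint (the body's own makeup);
    obstacle is an active one - re-evaluated every frame, so a moving
    obstacle redirects the behavior dynamically."""
    reachable = min(target, arm_length)
    if obstacle is not None:
        reachable = min(reachable, obstacle)
    return reachable
```

The behavior (reaching) stays the same; what varies frame to frame is how the constraints clip and redirect it.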
Principle 5: Behavior is sequenced
AI textbooks often distinguish 'scripted' from 'unscripted' behavior, implying that the latter is somehow better and more organic. This seems a bit pointless to us, because real agent behavior is always a combination of both. In fact, our brains have dedicated circuitry (notably the cerebellum) to store gigantic databases of parametric motion sequences.
These sequences make it much easier for the brain to deploy standard forms of behavior. At the same time, such sequences are highly adaptable to concrete environments and dynamic context. This makes for a powerful combination. Instead of deciding afresh each time exactly which muscles to move, when, and by how much to produce, say, a tango, the brain can use templates that leave only a few parameters to be filled in at 'runtime', so to speak. Apart from reducing complexity, this approach also facilitates synchronization of behavior between several individuals, and it explains in part why real behavior can sometimes feel scripted. Modern AB engines such as Rascal take their inspiration from neuroscience and incorporate parametric, adaptive sequencing into their architecture.
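A toy version of such a template (the numbers and names are invented for illustration) binds its open parameters at runtime while keeping the motion's overall shape fixed:

```python
def parametric_step(length: float, frames: int) -> list[float]:
    """A stored motion template: the shape (linear foot travel) is fixed,
    while step length and frame count are filled in at 'runtime'."""
    return [length * (i + 1) / frames for i in range(frames)]

# The same template yields a short, quick step or a long, drawn-out one -
# and two dancers sharing the template are synchronized almost for free:
short_step = parametric_step(0.3, 3)
long_step = parametric_step(0.8, 4)
```

The complexity reduction is the point: the sequence is decided once, and only two parameters remain to be chosen per use.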
Principle 6: Behavior is interruptible
Even the most perfectly planned behaviors won't always survive first contact with reality. And even when they do, agents change their minds all the time, and their behaviors have to follow suit. This is an almost trivial observation about the real world but a hard challenge for AB, mostly because of principle 2: interruptions can't just break off the behavior stream and start a fresh one.
The requirements for continuity and for rapid interruptibility pull in opposite directions, creating a tension that even an athlete like LeBron James can't always resolve gracefully. AB engines are faced with the added challenge that such a perceived lack of control may be precisely what the user of the engine wants to achieve (e.g., for comic effect). Rascal achieves this via a layered control architecture that's inspired by robotics - something we might discuss in a future post.
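Robotics-style layered control can be sketched in a few lines (this is our own toy illustration, not a description of Rascal's internals): each layer may claim the current situation, higher-priority layers interrupt lower ones, and the stream survives the interrupt because lower layers simply resume once it clears.

```python
from typing import Callable, Optional

# A layer inspects the world state and either claims it with an action
# or passes (returns None).
Layer = Callable[[dict], Optional[str]]

def reflex_layer(world: dict) -> Optional[str]:
    # Highest priority: interrupts anything when reality demands it.
    return "dodge" if world.get("incoming_ball") else None

def task_layer(world: dict) -> Optional[str]:
    # The planned behavior that runs whenever nothing interrupts it.
    return "dribble"

def layered_control(layers: list[Layer], world: dict) -> str:
    """Subsumption-style arbitration: the first (highest-priority) layer
    that claims the situation wins this frame."""
    for layer in layers:
        action = layer(world)
        if action is not None:
            return action
    return "idle"
```

Because arbitration happens every frame, the interruption and the resumption both come for free - no special transition logic is needed.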
Principle 7: Behavior shows patterned variation
You cannot step twice into the same river, and you cannot display twice the same behavior. Some difference, however small, will always persist - and that's part of what makes natural behavior, well, natural.
Importantly, these different expressions of one and the 'same' behavior tend to be both random and structured. The Weasley twins may hold their heads and open their lips in slightly different ways when they ask 'What?', but they cannot go so far as to, say, close their mouth when it needs to be open, or vice versa. Evolutionary biologists call this phenomenon patterned variation. Whenever it's found, it indicates that the variations are due to underlying generative principles or rules - for example, rules governing how the vocal apparatus can produce the word 'what'. That doesn't mean that AB engines need to simulate (say) an entire vocal apparatus to produce believable variations. In practice, the dimensionality of possible variations is often limited and can be approximated in more superficial ways.
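A toy sketch of patterned variation (the thresholds are invented for illustration): random jitter supplies the variation, while a clamp enforces the generative rule that bounds it.

```python
import random

def mouth_opening(base: float, jitter: float, rng: random.Random) -> float:
    """Patterned variation: no two utterances of 'what' are identical
    (the jitter), yet the mouth can never close mid-vowel (the clamp)."""
    MIN_OPEN, MAX_OPEN = 0.3, 1.0  # hypothetical structural limits
    return min(MAX_OPEN, max(MIN_OPEN, base + rng.uniform(-jitter, jitter)))

rng = random.Random(42)  # seeded so the variation is reproducible
samples = [mouth_opening(0.5, 0.4, rng) for _ in range(100)]
```

Note that we didn't simulate a vocal apparatus: a low-dimensional approximation (one scalar with structural bounds) already produces variation that is both random and rule-governed.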
Principle 8: Behavior is hierarchical
The closer we look at an agent's body while it's displaying behavior, the more we see that several things usually occur at once. This and the following principle help to establish some order here.
Let's start with the observation that from a kinematic point of view, behavior is almost always hierarchically organized. A handshake illustrates this nicely. Despite its name, this little ritual involves coordination of many body parts that are in hierarchical relationships, where subordinate parts are affected by superior ones.
In the GIF on the left it all starts with the torso, which positions the arms (which are subordinate to the torso) and leans forward during the shake. Meanwhile, the head (which also depends on the torso) orients towards the other party and the eyes (which depend on the head) need to look downwards initially to coordinate the initial grip. They then look up and connect with those of the other.
Once we start looking for hierarchies in behavior, we find them everywhere, and to make things worse, they evolve rapidly over time (recall principle 5). The consequences for AB are significant, but (fortunately) identical to those of the next principle.
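The torso-head-eyes hierarchy of the handshake can be captured in a few lines (a deliberately simplified sketch with scalar 2D angles; real skeletons use full 3D transforms): each body part's world orientation accumulates the local orientations of all its ancestors, so moving the torso moves everything beneath it.

```python
def world_orientation(node: str, parents: dict, local: dict) -> float:
    """Walk up the kinematic chain, summing local orientations.
    Subordinate parts are affected by superior ones automatically."""
    angle = local[node]
    while node in parents:
        node = parents[node]
        angle += local[node]
    return angle

# Hypothetical handshake pose: torso leans 10 degrees forward, the head
# adds 5 of its own, the eyes look 15 degrees down relative to the head.
parents = {"head": "torso", "eyes": "head"}
local = {"torso": 10.0, "head": 5.0, "eyes": -15.0}
```

Leaning the torso further forward would shift the head and eyes with it, without any change to their local values - that's the hierarchy doing the coordination.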
Principle 9: Behavior is parallel
What is Peggy Olson from 'Mad Men' doing? She is walking. She is smoking. The fact that there are (at least) two perfectly good answers bothers no one because it's normal to do several things in parallel.
These things don't even have to be in a hierarchical relationship (litmus test: you can smoke without walking, and vice versa). Still, the consequences of principles 8 and 9 for AB are identical. They entail that the behavior stream must be composed from multiple sub-behaviors that can be hierarchically organized. As an added complication, these sub-behaviors can control distinct or overlapping motor domains of the body (eyes, mouth, limbs, ...). For an example of distinct domains, look no further than Peggy, whose smoking behavior doesn't interfere at all with her walk. For overlapping domains, imagine that Peggy were walking as well as shaking from fear - two behaviors that will affect the same body parts, but in distinct and potentially complex ways.
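Both consequences can be sketched together (joint names and offsets are invented for illustration, and a production engine would blend far more carefully than the naive summation below): parallel sub-behaviors each contribute pose offsets, distinct motor domains merge trivially, and overlapping ones must be combined.

```python
def compose(sub_behaviors: list[dict]) -> dict:
    """Compose one pose from parallel sub-behaviors.
    Distinct motor domains simply merge; overlapping domains are summed
    here as the crudest possible blend."""
    pose: dict = {}
    for contribution in sub_behaviors:
        for joint, offset in contribution.items():
            pose[joint] = pose.get(joint, 0.0) + offset
    return pose

walking = {"hip": 0.4, "knee": 0.7}
smoking = {"wrist": 0.2}                 # distinct domain: no interference
shaking = {"hip": 0.05, "wrist": 0.05}   # overlapping domains: must blend

merged = compose([walking, smoking])     # Peggy's walk-and-smoke
blended = compose([walking, shaking])    # walking while shaking from fear
```

The distinct-domain case is why Peggy's cigarette never disturbs her stride; the overlapping case is where real engine complexity lives.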
Principles 10-12: Behavior is cognitively caused, monitored, and readable
The final three principles can be discussed together for the purposes of this overview, as they are about the relationship between behavior and cognition.
The things that emit behavior (i.e., agents) are also the things that have central nervous systems which control their behavior. And the things that see this behavior (i.e., other agents) also automatically interpret this behavior. We have been hardwired by evolution to 'read' (unobservable) cognitive causes into observable behavior and thereby give it intentionality and meaning.
Thus, in the GIF above, you don't just see an anchorwoman who is lifting and then lowering her arm - you see a lady who is trying to high-five her colleague, failing to solicit her attention, and ultimately ashamed of her failure. Tons of psychological studies have shown that such attributions are automatic and irrepressible. For AB, this implies that it's impossible to separate the behavior emitted by artificial agents from the meaning it elicits. Behaviors always express something, whether you want them to or not. AB engine development is therefore not just an engineering challenge, but also (and foremost) a psychological one. It's about convincing the player of the (artificial) meaningfulness of generated behavior, which is a topic we plan to talk about in several future posts.
We hope that this overview has given you a greater appreciation of the sheer complexity of behavior in the real world and the challenges of translating it into artificial behavior. But if you think about it, behavior is really all we have to connect with our fellow creatures, to understand them and be understood by them in turn. That's why we're passionate about it, and why we want to share more of our R&D with you in subsequent posts.