A Modular Framework for Artificial Intelligence Based on Stimulus Response Directives
While graphics and game physics have shown great progress in the last five years, Artificial Intelligence continues to display only simple repetitive behaviors. In this article, Charles Guy demonstrates his method for modeling AI based on the functional anatomy of the biological nervous system.
November 10, 1999
By Charles Guy
There are three fundamental technologies used in modern computer games: graphics, physics, and artificial intelligence (AI). But while graphics and physics have shown great progress in the last five years, AI still displays only simple repetitive behavior, which offers little replay value. This deficiency, unfortunately, is often sidestepped, with the emphasis shifted to multiplayer games, which take advantage of real human intelligence.
In this article, I have attempted to model AI based on the functional anatomy of the biological nervous system. In the pure sense of the word, a biological model of AI should use neural networks for all stimulus encoding and motor response signal processing. Unfortunately, neural networks are still difficult to control (for game design) and very computationally expensive. Therefore I have chosen a hybrid model, which uses a "biological" signal path framework in conjunction with more traditional heuristic methods for goal selection. The main features of this model are:
Stimulus detection based on signal strength thresholds.
Target goal selection based on directives, known goals and acquired goals.
Target goals acquired by servo feedback loops that drive the body.
Personalities constructed from sets of directives. Because the directives are modular, it is fairly straightforward to construct a wide range of distinctive personalities. These personalities can display stereotypical behavior while still retaining enough flexibility to exercise "judgment" and adapt to unique situations. The framework of this model should be useful to a wide range of applications because of its generic nature.
Some Background
This AI model was developed for use in SpecOps II, a tactical infantry combat simulation of Green Beret covert missions. While the emphasis of the project has been on realism and squad level tactics, it still falls under the category of a first-person shooter. The original SpecOps project was based on the U.S. Army Rangers and was one of the first "photo-realistic" tactical combat simulators released for the computer gaming market. The combination of high quality motion capture data, photo-digitized texture maps and sound effects recorded from authentic sources produced a rather compelling combat experience. Although the original game was fun to play, it was justifiably criticized for having poor AI. Therefore one of the major goals for SpecOps II was to improve the AI. The previous game logic and AI were based on procedural scripts; the new systems are based on data driven ANSI C code. (My experience has convinced me that data driven code is more reliable, flexible and extensible than procedural scripts.) When the data structures that drive the code are designed correctly, the code itself can become very simple.
Table 1. Parallels to Biological Nervous System
Functional Unit | Biological System
Stimulus Detection Unit | Visual / Auditory Cortices
Directives | Reflex / Conditioned Response
Known / Acquired Goals | Short-Term Memory
Goal Selector / Navigator | Frontal Cortex
Direction / Position Goal Servos | Motor Cortex / Cerebellum
Typical Behavior in SpecOps II
In the course of a normal game, a player can order one of his buddies to attack an enemy. If this enemy is occluded by the world, the path finder will navigate the buddy until there is a clear path. Once the path is clear, the direction servo points the AI at the enemy and he begins firing his gun. If that enemy is taken out, the AI may engage other enemies that were aroused by the weapons fire. If all the known enemies have been taken out, the buddy returns to formation with his commander.
Another typical sequence might begin when a player issues a "demolish position" command to a squad member. The AI will then navigate to the position goal, place a satchel charge and yell out: "fire in the hole!" The "get away from explosive" directive will then cause him to move outside of the danger radius of the explosive. I have observed an interesting case where the initial evasive maneuver led to a dead end, followed by backtracking toward the explosive object. Eventually the navigator got the AI a safe distance away from the explosive in time.
Overview of Data Flow
The data flow begins with the Stimulus Detection Unit, which filters sound events and visible objects and updates the Known Goals queue. The Goal Selector then compares the Known Goals and Acquired Goals against the personality and commander directives and then selects Target Goals. The navigator determines the best route to get to a position goal using a path finding algorithm. The direction and position goal servos drive the body until the Target Goals are achieved and then the Acquired Goals queue is updated.
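In rough code, the per-frame update for one brain might follow the shape below. Every name here is an illustrative stand-in for the units just described, not the actual SpecOps II API.

/* Hypothetical per-frame brain update mirroring the data flow above. */
typedef struct Brain Brain;
typedef struct World World;

void StimulusDetectionUpdate(Brain *brain, const World *world); /* update known goals */
void GoalSelectorUpdate(Brain *brain);                          /* pick target goals  */
void NavigatorUpdate(Brain *brain, const World *world);         /* route around walls */
void GoalServosUpdate(Brain *brain);                            /* drive the body     */

void BrainUpdate(Brain *brain, const World *world)
{
    StimulusDetectionUpdate(brain, world);
    GoalSelectorUpdate(brain);
    NavigatorUpdate(brain, world);
    GoalServosUpdate(brain);  /* on arrival, the acquired goals queue is updated */
}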
Data Structures
The primary data structures used by this brain model are: BRAIN_GOAL and DIRECTIVE. AI personalities are represented by an array of Directive structures and other parameters. The following is a typical personality declaration from SpecOps II:
PERSONALITY_BEGIN( TeammateRifleman )
    PERSONALITY_SET_FIRING_RANGE( 100000.0f )        // must be this close to fire gun (mm)
    PERSONALITY_SET_FIRING_ANGLE_TOLERANCE( 500.0f ) // must point this accurately to fire (mm)
    PERSONALITY_SET_RETREAT_DAMAGE_THRESHOLD( 75 )   // retreat if damage exceeds this amount (percent)
    DIRECTIVES_BEGIN
        DIRECTIVE_ADD( TEAMMATE_FIRING_GOAL,    AvoidTeammateFire,        BaseWeight+1, AvoidTeammateFireDecay )
        DIRECTIVE_ADD( EXPLOSIVE_GOAL,          GetAwayFromExplosive,     BaseWeight+1, NoDecay )
        DIRECTIVE_ADD( HUMAN_TAKES_DAMAGE_GOAL, BuddyDamageVocalResponce, BaseWeight,   AcquiredGoalDecay )
        DIRECTIVE_ADD( DEMOLISH_POSITION_GOAL,  DemolishVocalResponce,    BaseWeight,   AcquiredGoalDecay )
        DIRECTIVE_ADD( SEEN_ENEMY_GOAL,         StationaryAttackEnemy,    BaseWeight-1, SeenEnemyDecayRate )
        DIRECTIVE_ADD( HEARD_ENEMY_GOAL,        FaceEnemy,                BaseWeight-2, HeardEnemyDecayRate )
        DIRECTIVE_ADD( UNCONDITIONAL_GOAL,      FollowCommander,          BaseWeight-3, NoDecay )
        DIRECTIVE_ADD( UNCONDITIONAL_GOAL,      GoToIdle,                 BaseWeight-4, NoDecay )
    DIRECTIVES_END
PERSONALITY_END
The DIRECTIVE structure contains four fields:
Goal type (Known goals, Acquired goals, Unconditional goals)
Response function pointer (called when the priority weight is best; assigns the target goals)
Priority weight (importance of directive)
Decay rate (allows older goals to become less important over time)
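These four fields map naturally onto a C struct. The following is a minimal sketch, with types guessed rather than taken from the shipping code:

/* A guess at the DIRECTIVE layout; all types here are assumptions. */
typedef enum GoalType {
    UNCONDITIONAL_GOAL,
    SEEN_ENEMY_GOAL,
    HEARD_ENEMY_GOAL,
    EXPLOSIVE_GOAL
    /* ... one entry per stimulus type ... */
} GoalType;

struct BRAIN;
struct BRAIN_GOAL;

/* Called when this directive wins the priority contest; assigns target goals. */
typedef void (*ResponseFunc)(struct BRAIN *brain, struct BRAIN_GOAL *goal);

typedef struct DIRECTIVE {
    GoalType     goalType;   /* known, acquired or unconditional goal type */
    ResponseFunc response;   /* target goal assignment function            */
    float        weight;     /* priority weight (importance of directive)  */
    float        decayRate;  /* weight lost per second of goal age         */
} DIRECTIVE;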
The BRAIN_GOAL structure contains all necessary data for object recognition and action response.
The stimulus detection fields are:
Goal type (e.g. seen/heard teammates, seen/heard enemies, heard gun fire, acquired goals)
Goal object pointer (void *, cast to typed pointer based on object type)
Goal position type (e.g. dynamic object position, fixed position, offset position, etc.)
Time of detection (timestamp in milliseconds)
Previously known (true/false)
The response fields are:
Action at target (IO_FIRE, IO_USE_INVENTORY etc.)
Yaw velocity (degrees per second)
Movement mode (Forward, Forward slow, Sidestep left, Sidestep right etc.)
Inner radius (navigator threshold)
Outer radius (goal selector threshold)
Time of arrival (timestamp in milliseconds, for acquired goals)
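Continuing the same hypothetical types, BRAIN_GOAL might be laid out as follows; the explicit position field is my assumption, implied by the queue update rules described below.

typedef struct BRAIN_GOAL {
    /* stimulus detection fields */
    GoalType     goalType;        /* e.g. SEEN_ENEMY_GOAL                    */
    void        *object;          /* cast to a typed pointer by goal type    */
    int          positionType;    /* dynamic, fixed, offset position...      */
    float        position[3];     /* goal position in millimeters (assumed)  */
    unsigned int detectionTime;   /* timestamp, milliseconds                 */
    int          previouslyKnown; /* true/false                              */

    /* response fields */
    int          actionAtTarget;  /* IO_FIRE, IO_USE_INVENTORY...            */
    float        yawVelocity;     /* degrees per second                      */
    int          movementMode;    /* forward, forward slow, sidestep...      */
    float        innerRadius;     /* navigator completion threshold (mm)     */
    float        outerRadius;     /* goal selector threshold (mm)            */
    unsigned int arrivalTime;     /* timestamp (ms), for acquired goals      */
} BRAIN_GOAL;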
The Stimulus Detection Unit
Modeling stimulus detection in a physical way can achieve symmetry and help fulfill the user's expectations (i.e. if I can see him, he should be able to see me). This also prevents the AI from receiving hidden knowledge and having an unfair advantage. The stimulus detection unit models the signal strength of an event as a distance threshold. For example, the HeardGunFire event can be detected within a distance of 250 meters. This threshold distance can be attenuated by a number of factors. If a stimulus event is detected, it is encoded into a BRAIN_GOAL and added to the known goals queue. This implementation of stimulus detection considers only three sensory modalities: visual, auditory and tactile.
Visual stimulus detection begins by considering all humans and objects within the field of view of the observer (~180 degrees). A scaled distance threshold is then computed based on the size of the object, object illumination, off-axis angle and tangential velocity. If the object is within the scaled distance threshold, a ray cast is performed to determine whether the object is occluded by the world. If all these tests are passed, the object is encoded into a BRAIN_GOAL. For example, a generic human can be encoded into a SeenEnemyGoal, or a generic object can be encoded into a SeenExplosiveGoal.
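A minimal sketch of this visibility test follows; the scale factors and the RayBlocked() callback are illustrative assumptions, not the shipping code.

#include <math.h>

int CanSeeObject(const float eyePos[3], const float eyeDir[3],
                 const float objPos[3], float objSize,
                 float illumination, float tangentialSpeed,
                 int (*RayBlocked)(const float from[3], const float to[3]))
{
    float d[3], dist, cosAngle, threshold;

    d[0] = objPos[0] - eyePos[0];
    d[1] = objPos[1] - eyePos[1];
    d[2] = objPos[2] - eyePos[2];
    dist = (float)sqrt(d[0]*d[0] + d[1]*d[1] + d[2]*d[2]);
    if (dist <= 0.0f)
        return 1;                       /* standing on top of the object */
    cosAngle = (d[0]*eyeDir[0] + d[1]*eyeDir[1] + d[2]*eyeDir[2]) / dist;
    if (cosAngle < 0.0f)                /* outside the ~180 degree field of view */
        return 0;

    /* Scaled distance threshold: larger, brighter, more central and
       faster-moving objects can be seen from farther away. */
    threshold = objSize * illumination * cosAngle * (1.0f + tangentialSpeed);
    if (dist > threshold)
        return 0;

    /* The (expensive) occlusion ray cast is performed last. */
    return !RayBlocked(eyePos, objPos);
}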
As sounds occur in the game, they are added to the sound event queue. These sound events contain information about the source object type, position and detection radius. Audio stimulus detection begins by scanning the sound event queue for objects within the distance threshold. This distance threshold can be further reduced by an extinction factor if the ray from the listener to the sound source is blocked by the world. If a sound event is within the scaled distance threshold, it is encoded into a BRAIN_GOAL and sent to the known goals queue.
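A matching sketch for the auditory scan, applied to one queued event; the SOUND_EVENT layout and the 0.5 extinction factor are assumptions.

#include <math.h>

typedef struct SOUND_EVENT {
    int   sourceType;        /* e.g. heard gun fire                   */
    float position[3];
    float detectionRadius;   /* signal strength as a distance (mm)    */
} SOUND_EVENT;

int CanHearSound(const float listenerPos[3], const SOUND_EVENT *ev,
                 int (*RayBlocked)(const float from[3], const float to[3]))
{
    float d[3], dist, radius;

    d[0] = ev->position[0] - listenerPos[0];
    d[1] = ev->position[1] - listenerPos[1];
    d[2] = ev->position[2] - listenerPos[2];
    dist = (float)sqrt(d[0]*d[0] + d[1]*d[1] + d[2]*d[2]);

    radius = ev->detectionRadius;
    if (RayBlocked(listenerPos, ev->position))
        radius *= 0.5f;   /* extinction factor when the world blocks the ray */

    return dist <= radius;
}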
When the known goals queue is updated with a BRAIN_GOAL, a test is made to determine whether it was previously known. If it was previously known, the matching known goal is updated with a new time of detection and location. Otherwise the oldest known goal is replaced by it. The PREVIOUSLY_KNOWN flag of this known goal is set appropriately for directives that respond to the rising edge of a detection event.
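Continuing the hypothetical BRAIN_GOAL sketch above, the queue update might look like this; the queue length of 16 is an assumption.

#include <string.h>

#define KNOWN_GOAL_MAX 16   /* queue length is an assumption */

void UpdateKnownGoals(BRAIN_GOAL knownGoals[], const BRAIN_GOAL *incoming)
{
    int i, oldest = 0;

    for (i = 0; i < KNOWN_GOAL_MAX; i++) {
        if (knownGoals[i].goalType == incoming->goalType &&
            knownGoals[i].object   == incoming->object) {
            /* Previously known: refresh the time and location only,
               so directives keyed to the rising edge do not re-fire. */
            knownGoals[i].detectionTime = incoming->detectionTime;
            memcpy(knownGoals[i].position, incoming->position,
                   sizeof knownGoals[i].position);
            knownGoals[i].previouslyKnown = 1;
            return;
        }
        if (knownGoals[i].detectionTime < knownGoals[oldest].detectionTime)
            oldest = i;
    }
    /* New detection: evict the oldest goal and flag the rising edge. */
    knownGoals[oldest] = *incoming;
    knownGoals[oldest].previouslyKnown = 0;
}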
Injuries and collisions can generate tactile stimulus detection events. These are added directly to the acquired goals queue. Tactile stimulus events are primarily used for the generation of vocal responses.
The Goal Selector
The goal selector chooses target goals based on stimulus response directives. The grammar for the directives is constructed as a simple IF THEN statement:
IF I detect an object of type X (and priority weight Y is best) THEN call target goal function Z.
The process of goal selection starts by evaluating each active directive for a given personality. The known goals queue or the acquired goals queue is then tested to find a match for this directive's object type. If a match is found and the priority weight is the highest in the list, then the target goal function is called. This function can perform additional logic to determine whether this BRAIN_GOAL should be chosen as a target. For example, if the AI is already within the target distance of a BRAIN_GOAL's position, an alternate goal (i.e. direction) could be chosen. Once a target goal is selected, the position, direction and posture goals can be assigned. Unconditional directives do not require a matching object type to be called; these are used for default behavior in the absence of known goals.
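Building on the struct sketches above, one selection pass might look like the following; the BRAIN layout and the FindMatchingGoal() helper that scans the goal queues are hypothetical.

typedef struct BRAIN {
    const DIRECTIVE *directives;     /* the personality's directive table */
    int              directiveCount;
    /* ... known and acquired goal queues, target goals ... */
} BRAIN;

BRAIN_GOAL *FindMatchingGoal(BRAIN *brain, GoalType type);

void SelectGoal(BRAIN *brain, unsigned int nowMs)
{
    const DIRECTIVE *bestDir = NULL;
    BRAIN_GOAL *bestGoal = NULL;
    float bestWeight = 0.0f;
    int i;

    for (i = 0; i < brain->directiveCount; i++) {
        const DIRECTIVE *dir = &brain->directives[i];
        BRAIN_GOAL *goal = NULL;
        float weight = dir->weight;

        if (dir->goalType != UNCONDITIONAL_GOAL) {
            goal = FindMatchingGoal(brain, dir->goalType);
            if (goal == NULL)
                continue;                     /* no stimulus of this type */
            /* Linear priority decay by goal age, in seconds. */
            weight -= dir->decayRate
                    * (float)(nowMs - goal->detectionTime) / 1000.0f;
        }
        if (bestDir == NULL || weight > bestWeight) {
            bestDir = dir;
            bestGoal = goal;
            bestWeight = weight;
        }
    }
    if (bestDir != NULL)
        bestDir->response(brain, bestGoal);   /* assigns the target goals */
}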
The priority weight for a directive can decay at a linear rate based on the age of a known goal (current time minus time of detection). For example, if an AI last saw an enemy 20 seconds ago and the directive has a decay rate of 0.1 units per second, the priority decay is -2. This decay allows AIs to lose interest in known goals that haven't been observed for a while.
The goal selector can assign the three target goals (direction, position and posture) orthogonally or in a coupled fashion. In addition to these target goals, the goal selector can also select an inventory item and directly activate audio responses. When a direction goal is assigned, the action at target field can be set. For example, the stationary attack directive sets the action at target field to IO_FIRE. When the direction servo gets within the pointing tolerance threshold, the action is taken (i.e. the gun is fired). When a position goal is selected, an inner and an outer radius are set by the directive: the outer radius specifies the distance threshold for the goal selector to acquire, and the inner radius is the distance threshold that the position goal servo uses for completion. The inner and outer radius thresholds differ by a small buffer distance (~250 millimeters) to prevent oscillations at the boundary. When a position goal is acquired, the action at target can be evoked. For example, the Demolish Target directive sets the action at target field to IO_USE_INVENTORY. This directive also selects the satchel explosive from the inventory. Some directives can set the posture goal; for example, the StationarySniperAttack and HitTheDirt directives both set the posture goal to prone.
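The gap between the two radii is simple hysteresis. Under one reading of the thresholds, and using the BRAIN_GOAL fields sketched above, the two tests might be:

/* The selector re-acquires a position goal only outside the outer radius;
   the servo completes it inside the inner radius. The ~250 mm gap keeps
   the AI from oscillating at the boundary. */
int SelectorWantsPositionGoal(float distMm, const BRAIN_GOAL *goal)
{
    return distMm > goal->outerRadius;    /* still too far away */
}

int ServoHasAcquired(float distMm, const BRAIN_GOAL *goal)
{
    return distMm < goal->innerRadius;    /* close enough: done */
}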
The Navigator
Once a position goal has been selected, the navigator must find a path to get there. The navigator first determines if the target can be acquired directly (i.e. can I walk straight to it?). My initial implementation of this test used a ray cast from the current location to the target location. If the ray was blocked, then the target was not directly accessible. The ray cast method has two problems:
An intervening drop off or obstacle might not block the ray, and
the ray might be blocked by smoothly curving slopes that can actually be walked over.
My final solution for obstacle detection uses a step-wise walk-through. Each step (~500 millimeters) along the path to the target is tested for obstacles and drop offs. This method produces reliable obstacle detection and is a good basis for navigation through a world composed of triangles.
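A sketch of the step-wise test follows, with the obstacle and drop off queries left as hypothetical callbacks.

#include <math.h>

#define STEP_MM 500.0f   /* ~500 millimeter stride, per the text */

int PathIsWalkable(const float from[3], const float to[3],
                   int (*BlockedAt)(const float pos[3]),
                   int (*DropOffAt)(const float pos[3]))
{
    float d[3], dist, t, pos[3];
    int steps, i;

    d[0] = to[0] - from[0];
    d[1] = to[1] - from[1];
    d[2] = to[2] - from[2];
    dist = (float)sqrt(d[0]*d[0] + d[1]*d[1] + d[2]*d[2]);
    steps = (int)(dist / STEP_MM);

    for (i = 1; i <= steps; i++) {
        t = (float)i * STEP_MM / dist;
        pos[0] = from[0] + d[0] * t;
        pos[1] = from[1] + d[1] * t;
        pos[2] = from[2] + d[2] * t;
        if (BlockedAt(pos) || DropOffAt(pos))
            return 0;      /* obstacle or drop off on this step */
    }
    return 1;              /* clear path: walk straight to it */
}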
If a position goal is not blocked by the world, the position goal servo goes directly to the target. Otherwise a path finding algorithm is used to find an alternate route to get to the target position. The path finding algorithm that is used in SpecOps II is based on Navigation Helper Nodes that are placed in the world by the game designers. These nodes are placed at the junctions of doors, hallways, stairs and boundary points of obstacles. There are typically a few hundred Navigation Helper Nodes per level.
The first step in the path finding process is to update the known goals queue with all Navigation Helper Nodes that are not blocked by the world. Because the step-wise walk-through obstacle test is fairly expensive, it is distributed over a number of frame intervals. Once the known goals queue has been updated with all valid Navigation Helper Nodes, the next position goal can be selected. This selection is based on when the Navigation Helper Node was last visited and how close it is to the target position. When a Navigation Helper Node is acquired by the position goal servo, it is updated in the acquired goals queue with the time of arrival. Selecting only Navigation Helper Nodes that have not been visited, or that have the oldest time of arrival, ensures that the path finder will exhaustively scan all nodes until the target can be reached directly. When two Navigation Helper Nodes have the same age status, the one closer to the target position is selected.
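The node selection rule might be sketched as follows; NAV_NODE and its fields are my assumptions.

typedef struct NAV_NODE {
    float        position[3];
    int          visited;        /* acquired at least once this search */
    unsigned int arrivalTime;    /* ms timestamp of the last visit     */
} NAV_NODE;

int SelectNextNode(const NAV_NODE *nodes, int count, const float target[3],
                   float (*DistanceTo)(const float a[3], const float b[3]))
{
    int i, best = -1;

    for (i = 0; i < count; i++) {
        if (best < 0) { best = i; continue; }
        /* A never-visited node beats a visited one. */
        if (nodes[i].visited != nodes[best].visited) {
            if (!nodes[i].visited)
                best = i;
            continue;
        }
        /* Otherwise the oldest time of arrival wins. */
        if (nodes[i].arrivalTime != nodes[best].arrivalTime) {
            if (nodes[i].arrivalTime < nodes[best].arrivalTime)
                best = i;
            continue;
        }
        /* Same age status: pick the node closer to the target. */
        if (DistanceTo(nodes[i].position, target)
          < DistanceTo(nodes[best].position, target))
            best = i;
    }
    return best;   /* index of the next position goal, or -1 if none */
}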
Direction and Position Goal Servos
The direction and position goal servos take an X, Y, Z position as their goal. This position is transformed into local coordinates by translation and rotation. The direction servo drives the local X component to 0 by applying the appropriate yaw velocity. The local Y component is driven to 0 by applying the appropriate pitch velocity. When the magnitude of the local X, Y coordinates falls below the target threshold, the goal is "acquired". The position goal servo is nested within a direction servo. When the direction servo is pointing at the goal to within the desired tolerance, the AI approaches the target using the movement mode (e.g. IO_FORWARD, IO_FORWARD_SLOW) set by the directive. Once the distance to the position goal falls below the inner radius, the goal is "acquired", actions at goal can be evoked and the acquired goals queue is updated. The acquired goals queue is used as a form of feedback loop to tell the goal selector when certain goals are completed. This allows the goal selector to step through a sequence of actions (in effect, a state machine).
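A sketch of the yaw half of the direction servo follows; the +Z-forward convention, rotation sign and 2 degree tolerance are assumptions.

#include <math.h>

#define PI_F 3.14159265f

/* Transform the goal into body-local coordinates (translate, then
   rotate), and steer the local X error toward zero with a clamped
   yaw velocity. Returns degrees per second. */
float DirectionServoYaw(const float bodyPos[3], float bodyYawDeg,
                        const float goalPos[3], float maxYawDegPerSec,
                        int *acquired)
{
    float dx = goalPos[0] - bodyPos[0];
    float dz = goalPos[2] - bodyPos[2];
    float yaw = bodyYawDeg * PI_F / 180.0f;
    float localX = dx * (float)cos(yaw) - dz * (float)sin(yaw);
    float localZ = dx * (float)sin(yaw) + dz * (float)cos(yaw);
    float errorDeg = (float)atan2(localX, localZ) * 180.0f / PI_F;

    *acquired = (float)fabs(errorDeg) < 2.0f;  /* pointing tolerance */

    /* Proportional drive toward zero, clamped to the yaw rate limit. */
    if (errorDeg >  maxYawDegPerSec) errorDeg =  maxYawDegPerSec;
    if (errorDeg < -maxYawDegPerSec) errorDeg = -maxYawDegPerSec;
    return errorDeg;
}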
Brain/Body Interface
Most actions are communicated to the body through a 128 bit virtual keyboard called the action flags. These flags correspond directly to keys the player can press to control his avatar. Each action has an enumerated bit mask (e.g. IO_FIRE, IO_FORWARD, IO_POSTURE_UP, IO_USE_INVENTORY, etc.). These action flags are then encoded into animation states. Because the body is articulated, rotation is controlled by separate scalar fields for body yaw velocity, chest yaw angle, bicep pitch angle and head yaw/pitch angle. These allow for partially orthogonal direction goals (e.g. the head and gun can track an enemy while the body is pointing at a different position goal).
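One plausible layout for a 128 bit flag set is four 32 bit words; treating the IO_* values as bit indices rather than masks is my simplification.

typedef struct ACTION_FLAGS {
    unsigned int bits[4];            /* 4 x 32 = 128 virtual keys */
} ACTION_FLAGS;

#define ACTION_SET(f, n)   ((f)->bits[(n) >> 5] |=  (1u << ((n) & 31)))
#define ACTION_CLEAR(f, n) ((f)->bits[(n) >> 5] &= ~(1u << ((n) & 31)))
#define ACTION_TEST(f, n)  (((f)->bits[(n) >> 5] >> ((n) & 31)) & 1u)

enum { IO_FIRE, IO_FORWARD, IO_POSTURE_UP, IO_USE_INVENTORY /* ... */ };

With this arrangement, the brain sets IO_FIRE exactly as if the player had pressed the fire key, and the animation system reads the flags without caring who set them.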
Commands
Because of their modular nature, directives can be given to an AI by a commander at runtime. Each brain has a special slot for a commander directive and a commander goal. This allows the commander to tell one of his buddies to attack an enemy that only the commander can see. Commands can be given to a whole squad or to an individual. Note that it is also very easy to create directives for commander AIs to issue commands to their teammates. The following is a list of commander directives used in SpecOps II:
TypeDirective CommanderDirectiveFormation  = { TEAMMATE_GOAL,          GoBackToFormation,   BaseWeight,   NoDecay };
TypeDirective CommanderDirectiveHitTheDirt = { POSTURE_GOAL,           HitTheDirt,          BaseWeight+1, NoDecay };
TypeDirective CommanderDirectiveAttack     = { SEEN_ENEMY_GOAL,        ApproachAttackEnemy, BaseWeight,   NoDecay };
TypeDirective CommanderDirectiveDefend     = { FIXED_POSITION_GOAL,    DefendPosition,      BaseWeight,   NoDecay };
TypeDirective CommanderDirectiveDemolish   = { DEMOLISH_POSITION_GOAL, DemolishPosition,    BaseWeight,   NoDecay };
Future Improvements
Because this brain model is almost entirely data driven, it would be fairly easy to have it learn from experience. For example, the priority weights for each directive could be modified as a response to victories or defeats. Alternatively, an instructor could punish (reduce directive priority weight) or reward (increase directive priority weight) responses to in-game events. The real problem with teaching an AI during game play is the extremely short life span (10-60 seconds). However, each personality could have a persistent communal brain, which could learn over the course of many lives. In my opinion, the real value of dynamic learning in game AI is not to make a stronger opponent, but to make a continuously changing opponent. It is easy to make an unbeatable AI opponent; the real goal is to create AIs that have distinctive personalities, and these personalities should evolve over time.
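As a sketch, the instructor's reward or punishment could be a small nudge to a directive's priority weight; the learning rate and clamping bounds here are pure assumptions.

#define BaseWeight 10.0f               /* placeholder value; assumption */

void ReinforceDirective(DIRECTIVE *dir, int rewarded)
{
    const float rate = 0.1f;           /* learning rate: assumption */
    dir->weight += rewarded ? rate : -rate;
    /* Clamp so a few lessons cannot invert the whole personality. */
    if (dir->weight > BaseWeight + 4.0f) dir->weight = BaseWeight + 4.0f;
    if (dir->weight < BaseWeight - 4.0f) dir->weight = BaseWeight - 4.0f;
}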