I am a robotics researcher and hobbyist game developer. My specialization is in high dimensional motion planning for robot arms. Motion planning is a highly relevant topic for game developers, as it’s needed every time you want to move an AI controlled character from one place to another.
In the course of learning game development, I’ve come across many tutorials which discuss motion planning (usually called "Path Finding" in game development literature), but most of these do not go into great detail about what motion planning is all about from a theoretical perspective. As far as I can tell, most games rarely use any motion planning beyond these big three algorithms: A* Grid Search, Visibility Graphs, and Flow Fields. There is a whole world of theoretical motion planning research beyond these three, and some of it could be useful to game developers.
In this article I wanted to discuss the common techniques of motion planning in context, and explain some of the advantages and disadvantages of each. I also want to present some of the basic techniques beyond those commonly used in video games, and hopefully shed light on how these might be useful in game development.
Moving from Point A to Point B: Agents and Goals
Let’s start by assuming we have an AI-controlled character in a game, such as an enemy in a shooter, or a unit in a strategy game. We will call this character the Agent. The Agent is at a particular place in the world, called its “configuration.” In two dimensions, the configuration of the Agent can be represented by just two numbers (typically, x and y). The Agent wants to get to some other place in the world, which we will call the Goal.
Let’s assume for now that the Agent is a circle, and that he lives in a flat, two dimensional world. Assume the Agent can move in any direction he pleases. If there is nothing but empty space between the Agent and his Goal, the Agent can just move in a straight line from his current configuration to the Goal. When he arrives at the Goal, he stops. We will call this the “Walk To Algorithm.”
Algorithm: WALK TO
If not at goal:
Move toward goal.
If the world is completely empty, this is perfectly fine. It may seem obvious that we need to move in a straight line from the start configuration to the Goal configuration, but consider that the agent could have moved in a totally different way to get from its current configuration to the Goal. He could have looped around before going to the Goal, or perhaps walked fifty miles away from the Goal before finally coming back and landing on the Goal.
So why do we think the straight line is obvious? Well, because in some sense, it’s the “best” way. If we assume the Agent can only move at a fixed speed in any direction, the straight line is the shortest and fastest way for him to get from one point to another. In this sense, the Walk To Algorithm is Optimal. We will discuss optimality a lot in this article. Basically, when we say an algorithm is optimal, we are saying that it “does the best it can” given a certain definition of “best.” If you want to get from point A to point B as fast as possible, and there’s nothing in the way, you can’t beat a straight line.
And indeed, in many games (I would say, even most games), the straight-line Walk To Algorithm will work perfectly fine for you. If all you have are little 2D enemies that live in hallways or rooms, and who don’t need to search through mazes or follow orders from the player, you’ll never need anything more complicated than this.
But what if there is something in the way?
In this case, the thing in the way (which is called an Obstacle) blocks the path of the agent to the Goal. Since the agent can no longer move towards the goal, he simply stops. What has happened here?
While the Walk To Algorithm is Optimal, it is not Complete. A “Complete” algorithm is able to solve all solvable problems in its domain within finite time. In our case, the domain is circles moving around on a plane with obstacles. Clearly, just going straight toward the Goal blindly will not work for all problems in this domain.
In order to solve all of the problems in this domain, and solve them optimally, we will need to do something a little more sophisticated.
Reactive Planners: The Bug Algorithm
What would you do in the presence of an obstacle? When you need to get to some goal (like say, the front door) and something is in the way (like say, a chair), you simply walk around it and continue on toward your goal. This sort of behavior is called “Obstacle Avoidance,” and the idea can be used to solve our problem of getting the Agent to his goal.
One of the simplest obstacle avoidance algorithms is called the Bug Algorithm. It works like this:
Draw a straight line from the Agent to the Goal. Call this the M-Line.
The Agent takes a single step along the M-Line until one of the following conditions is met:
If the agent reaches the goal, stop.
If the agent touches an obstacle, follow the obstacle clockwise until we reach the M-Line again, and also if we are closer to the goal than we were initially upon reaching the M-line. When we reach the M-Line, continue from step 2.
Step 2.2 needs some explanation. Basically, the Agent remembers all of the times its encountered the M-line for a given obstacle, and only leaves the obstacle if it is on the M-line, and is closer to the goal than any time before. This is to avoid the Agent getting into an infinite loop.
In the figure above, the dotted line is the M-Line, and the orange arrows represent the steps our Agent takes before reaching the Goal. This is called the Agent’s Trajectory. Amazingly, the Bug Algorithm solves many problems of this type very easily. Try drawing a bunch of crazy shapes, and you will see how well it works! But does the Bug Algorithm work for every problem? And why does it work?
The simple Bug Algorithm has very many interesting features; and demonstrates many of the key concepts we’ll keep coming back to in this article. These concepts are: Trajectory, Policy, Heuristic, Completeness, and Optimality.
We already mentioned the concept of the Trajectory, which is just a sequence of configurations or movements that the Agent takes along its path. The Bug Algorithm could be considered a Trajectory Planner, which is an algorithm which takes in a start configuration and a goal configuration, and returns a Trajectory from the start to the goal. (This is also sometimes called a "path planner" in game development literature).
But we should also mention the concept of a Policy. If instead of interpreting the Bug Algorithm as returning a full Trajectory from the start and goal, we interpret it as giving a set of instructions to Agent which allow it to get to the Goal from any location, just using its own configuration as a guide, then the Bug Algorithm could be said to be a Control Policy. A control policy is just a set of rules which, given a state of the Agent, determines what action the Agent should take next.
In the Bug Algorithm, the M-Line is a strong indicator of where the Agent should go next. Why did we choose the M-line to follow, and not, say, the line from the current Agent position to the goal? The answer is that the M-line is just a Heuristic. A heuristic is just a general rule we use to inform the next step of an algorithm that imparts some human knowledge on the algorithm. Choosing a heuristic is often an important part of developing such an algorithm. Choosing the wrong heuristic can result in the algorithm failing to work altogether.
Believe it or not (and I did not believe it at first), the Bug Algorithm solves all solvable problems in the domain of two dimensional circles moving from a start to a goal position. That means, no matter where you put the Agent or the Goal, if the Agent can reach the Goal, the Bug Algorithm will allow him to do so. The proof of this fact is beyond the scope of this article, but you can read it if you wish. If an Algorithm can solve all problems in its domain for which there is a solution, it is called a Complete Algorithm.
Even though the Bug Algorithm can solve all problems of this type, the trajectories it produces are not always the best possible trajectories the Agent could have taken to get to the goal. Indeed, the Bug Algorithm will cause the Agent to do some pretty stupid things from time to time (just try making an obstacle which is tiny in the clockwise direction, and billions of miles long in the other direction, and you will see my point. If we define “best” as meaning “taking the shortest time,” then the Bug Algorithm is in no way Optimal.
Optimal Planning: Visibility Graph
So what if we wanted to solve the problem of moving around circles among obstacles in an Optimal (shortest) way? To do this, we can exploit geometry to figure out what an optimal algorithm might look like for this problem. First, note the following facts:
- In two dimensional Euclidean planes, the shortest trajectory between two points is a straight line.
- The length of a sequence of straight line trajectories is the sum of the lengths of each line.
- Since a polygon is made up of straight lines (or edges), the shortest path which completely circumnavigates a convex polygon contains all of the polygon’s edges.
- To circumnavigate a non-convex polygon, the shortest path contains all of the edges of the polygons’ convex hull.
- A point exactly on the edge of a polygon collides with that polygon.
The definitions of “convexity,” “non-convexity,” and “convex hull” are not really important in understanding this article, but these definitions are all we need to know to construct an optimal trajectory planner. Let’s make the following assumptions:
- Our Agent is a two dimensional point that is infinitely small.
- All obstacles in the world can be approximated by closed polygons.
- Our Agent can move in any direction along a straight line.
Then, we can find an Optimal plan from the Agent to his goal by constructing what is called the Visibility Graph of the world. Besides the geometry, we also need to know a little bit about Graphs. Graph theory is not the topic of this article, but basically Graphs are just groups of abstract objects (called Nodes), linked to other Nodes by abstract objects called Edges. We can use the power of Graphs to describe the world the Agent lives in. We’ll say a Node is any place an agent can be without colliding with something. An Edge is any Trajectory between nodes that the agent can travel along.
With those ideas in mind, we define the Visibility Graph Planner in the following way:
Algorithm: VISIBILITY GRAPH PLANNER
Create an empty Graph called G.
Add the Agent as a vertex in G.
Add the Goal as a vertex in G.
Place all of the vertices of all of the polygons in the world on a list.
Connect all of the polygon’s vertices in G according to their edges.
For each vertex V on the list:
Try to connect V to the Agent and the Goal. If the straight line between them does not intersect any polygon, add this line as an edge in G.
For every other vertex in every other polygon, try to connect it to V using a straight line. If the line does not intersect any polygon, add the line as an edge in G.
Now, run a Graph Planner (like Dijkstra’s or A*) on the Visibility graph to find the shortest path from the Agent to the Goal.
The visibility graph looks like this: (dotted lines)
And the resulting path looks like this:
Because of our assumptions that the agent is a perfectly small point, and that it can travel in straight lines from any location to any other, we are left with a trajectory that perfectly passes through corners, edges, and free space to exactly minimize the amount of distance the Agent has to travel to the goal.
But what if our Agent isn’t a point? Notice how the orange path takes the Agent between the trapezoid and the rectangle. The Agent clearly can’t fit through that space, so we’ve got to do something about this problem.
One solution is to assume that the Agent is a disk rather than a point. If this is the case, we can simply inflate all of the obstacles by the radius of the Agent, and then plan as if the Agent were a point in this new, inflated world.
In this case, the Agent instead takes the path around the left of the rectangle rather than the right, since it simply can’t fit through the space between the trapezoid and the rectangle. This “inflating” operation is actually incredibly important. In essence, we’ve changed the world so that our assumption that the Agent is a point is true.
This operation is called computing the Configuration Space of the world. The Configuration Space (or C-Space) is a coordinate system whereby our Agent’s configuration is represented by a single point. An obstacle in the Work Space is transformed into an obstacle in the Configuration Space by first putting the Agent at the specified configuration and then testing the agent for collision. If the agent collides with an obstacle in the Work Space at that configuration, then there is also an obstacle in the Configuration Space. If we have a non-circular agent, we can use what is called the Minkowski Sum to inflate obstacles.
This might sound tedious for two dimensional cases, but that is really what we’re doing when we “inflate” the obstacles. The new “inflated” space is actually the Configuration Space. In higher dimensional spaces, things will not be quite so easy!
The Visibility Graph Planner is pretty cool. It’s both Optimal, and Complete, and it does not rely on any Heuristic. Another major advantage of the Visibility Graph Planner is that it is reasonably Multi-Query. A Multi-Query planner is one which can re-use old computations when going to new goals. In the case of the Visibility Graph, we only need to rebuild the edges of the graph from the Agent and the Goal whenever the Goal or Agent is unexpectedly moved. These advantages make the planner extremely attractive for game programmers.
Indeed, many modern games use Visibility Graphs for planning. A popular variant of the Visibility Graph idea is sometimes called a Nav(igation) Mesh. Nav Meshes sometimes use Visbility Graphs as a base (along with other heuristics for distance to enemies, items, and so on). Nav Meshes can then be modified by a designer or programmer. Each Nav Mesh is then stored as a file and used for a particular level.
Here, I have a screenshot of a Nav Mesh used in a modern video game: Overlord
For all their advantages, Visibility Graph methods have significant disadvantages. For one, it can be a fairly costly operation to compute the visibility graph. As the number of obstacles increases, the size of the Visibility Graph increases dramatically. The cost of computing the visibility graph increases as the square of the number of vertices. Additionally, if any obstacles change, it’s likely that the entire Visibility Graph needs to be recomputed in order to remain optimal.
Game programmers have invented all sorts of tricks to speed up graph construction by using Heuristics on which nodes to attempt to connect via an edge. Additionally, game designers often give hints to the Visibility Graph algorithm to determine which nodes to connect beforehand. All of these tricks tend to make Visibility Graphs no longer optimal, and sometimes involve tedious programmer intervention.
However, the biggest problem with Visibility Graphs (in my opinion) is that they simply don’t generalize well. They solve one problem, and solve it incredibly well: planning in the 2D plane with polygon obstacles. What if you’re not on the 2D plane? What if your agent can’t be easily represented by a circle? What if you don’t know what the obstacles are, or they can’t be represented by polygons? Well, then you could not easily use Visibility Graphs to solve your problem.
Luckily, there are other methods which have a more “fire and forget” feel to them, and which address some of these issues.
Lattice Grid Search: A* and Variants
The reason Visibility Graphs work is that they exploit the optimizing qualities of abstract graph search, along with the fundamental rules of Euclidean geometry. The Euclidean geometry part gives you the “fundamentally right” graph to search. The abstract search itself takes care of the optimization part.
So what if we abandoned the Euclidean geometry altogether and just did the entire problem using abstract graph search? That way, we’d be able to cut out the middle man of geometry, and solve many more kinds of problems.
The problem is, there are many different approaches we could take to “massage” our problem into one of graph search. What are our nodes? What exactly are our edges? What does it mean for one node to be connected to another? The way we answer these questions will dramatically alter the performance of our algorithm.
Even if a graph search algorithm gives you the “optimal” answer, it is only giving you the “optimal” answer in terms of its own internal structure. This does not imply that the “optimal” answer according to the graph search algorithm is the answer that you want.
With that in mind, let’s go ahead and define the graph in the most common way: the Grid Lattice. Here’s how it works:
Algorithm: DISCRETIZE SPACE
Assume the Configuration Space has some fixed size in all its dimensions.
Discretize each dimension so that it has a fixed number of cells.
For each cell whose center is inside an obstacle in the Configuration Space, mark it Impassable.
Likewise, for each cell whose center is outside an obstacle, mark it Passable.
Each Passable cell is now a Node.
Each Node connects to all its “adjacent” Passable neighbors in the graph.
“Adjacent” is another definition we have to worry about. Two of the most common definitions are “Any cell which shares at least one corner with the current cell” (this is called “Euclidean Adjacency”) or “Any cell which shares at least one face with the current cell” (this is called “Manhattan Adjaceny”). Depending on which one you choose, the answer the graph search algorithm will return is going to be entirely different.
Here’s what the Discretization step looks like in our toy example:
If we search this graph according to the Euclidean Adjacency metric, the path we get will look something like this:
There are many different Graph search algorithms you could use to solve the lattice grid search problem, but by far the most common one is called A*. A* is a relative of Dijkstra’s algorithm which searches nodes forward from the starting node using a Heuristic. A* is extremely well explored in a number of other articles and tutorials, so I am not going to explain it here.
A* and other lattice grid graph search methods are probably the most common planning algorithms in games, and the discretized lattice grid is probably the most common way of structuring A*. For many games, this search method is ideal, since so many games use either tiles or voxels to represent the world anyway.
Graph search methods over a discretized grid (including A*) are Optimal according to their internal structure. They are also Resolution Complete. This is a weaker form of completeness which states that as the fineness of the lattice grid approaches infinity (ie, as the cells get infinitely small), the algorithm solves more and more of all solvable problems.
In this particular example, the lattice grid search method is Single-Query rather than multi-query, since the graph search cannot usually re-use old information when generalizing to new goal and start configurations. The discretization step, however, can naturally be re-used.
Another key advantage (and in my opinion the main advantage) of graph-based methods is that they are purely abstract. That means additional costs (such as safety, smoothness, nearness to desired objects, etc.) can automatically be considered and optimized. Additionally, totally different problems can use the same underlying abstract code. In DwarfCorp, we use the same A* planner for motion planning, and Symbolic Action Planning (which is about representing actions Agents can take as an abstract graph).
However, Lattice Grid graph search has at least one serious disadvantage: It is incredibly memory intensive. If every node of the grid is built naively from the start, you will very quickly run out of memory as the size of the world increases. A large part of this problem is the fact that the lattice grid has to store vast amounts of free space in its grid. Optimization techniques exist to alleviate this problem, but fundamentally, as the size of the world increases, and the dimension of the problem increases, the amount of memory in the lattice grid blows up enormously. This makes is unsuitable for many more complicated problems.
Another serious disadvantage is that the graph search itself can take quite a long time. If you have many thousands of agents moving around, or many obstacles changing positions, lattice grid searching may not work for you. In DwarfCorp, we have several threads dedicated to planning. It eats up a lot of our CPU time. It might do so for you too!
Control Policies: Potential Functions and Flow Fields
Another way of approaching the motion planning problem is to stop thinking about it in terms of Trajectory Planning, and start thinking about it in terms of Control Polices. Remember when we said that the Bug Algorithm could be thought of as both a Trajectory Planner and a Control Policy? In that paragraph, we defined a Policy as a set of rules which take in a Configuration, and return an Action (or a “Control Input”).
The Bug Algorithm can be thought of as a simple Control Policy which just tells the Agent to move toward the Goal until hitting an obstacle, and then to follow the obstacle clockwise. The Agent could literally follow these rules as it moves along its path to the goal.
The key advantages of Control Policies is that they generally don’t depend on the Agent having anything more than local knowledge of the world, and that they are generally extremely fast to compute. Many thousands (or millions) of Agents can all follow the same control policy in parallel quite easily.
So can we think of any control policies which work better than the Bug Algorithm? One useful one to look at is called the “Potential Field Policy.” It works by simulating the Agent as a charged particle which is attracted to the Goal, and repelled by obstacles:
Algorithm: POTENTIAL FIELD POLICY
Define real number constants a and b
Find the vector in the configuration space from the Agent to the Goal and call it g.
Find the nearest obstacle point in the configuration space, call it o.
Find the vector from to the Agent and call it r.
The control input u = a * g^ + B * r^ where v^ refers to the normalized vector v.
Move the agent according to the control input.
This policy requires some knowledge of Linear Algebra. Basically, the control input is the weighted sum of two terms: the Attractive term, and the Repulsive term. The choice of the weights on each term dramatically influence the resulting trajectory that the Agent takes.
Using the Potential Field Policy, you can get your agents to move in incredibly realistic and smooth ways. Additional terms (such as nearness to a desired object, or distance from an enemy) can be added in easily.
Since the Potential Field Policy can be computed extremely quickly in parallel, thousands upon thousands of agents can be controlled in this way very efficiently. Sometimes, this control policy is computed offline and stored in a grid for each level. It can then be modified by a designer as he or she sees fit (then, it is usually called a Flow Field).
Some games (especially strategy games) use this to great effect. I’ve attached an image of flow fields being used in the strategy game Supreme Commander 2 to get units to naturally avoid each other and stay in formation:
Of course, flow fields and potential field functions have serious disadvantages. First, they are in no way Optimal. Agents will be blown about by the flow field without any regard to how long they are taking to get to the goal. Secondly (and in my opinion most importantly) they are in no way Complete. What if the control input cancelled out to zero before the agent reached the goal? In that case, we would say that the agent has reached a “Local Minimum”. You might think that such cases are rare, but in fact they are quite easy to construct. Just put a big U-shaped obstacle in front of the Agent.
Finally, flow fields and other control policies are notoriously difficult to tune. How do we choose the weights on each term? Ultimately, these need to be hand-tuned to get good results.
Typically, designers will have to go in and hand-tune flow fields to avoid local minima. These problems limit the potential usefulness of flow fields in the general case. Nevertheless, they are useful in many scenarios.
The Configuration Space and the Curse of Dimensionality
So now we’ve discussed the three most popular motion planning algorithms in video games: lattice grids, visibility graphs, and flow fields. These algorithms are extremely useful, very easy to implement, and quite well studied. For 2D problems with circular Agents moving around in all directions in the plane, they work beautifully. Luckily, that covers almost all problems in video games, and the rest is faked by clever tricks and a lot of hard work on the designer’s part. It is no wonder that they are used so extensively in the game industry.
So what is it that researchers in motion planning have been doing for the past few decades? The answer is: everything else. What if your Agent isn’t a circle? What if he doesn’t live on the plane? What if he can’t move in every direction? What if the goals, or the obstacles, are constantly changing? Well, then the answers aren’t so simple.
Let’s start with a very simple case that seems like it should be easy to solve using the same methods we just described, but which in fact can’t be solved by those methods without seriously modifying them.
What if instead of being a circle, our Agent is a rectangle? What if the rotation of the Agent matters? Here’s a picture of what I’m describing:
In the above case, the Agent (in red) is sort of like a shopping cart that can move in any direction. We want to move the agent so that it gets exactly on top of the goal, and is rotated in the correct direction.
Here, we could presumably use the Bug Algorithm, but we’d