# Using Particle Swarm Optimization for Offline Training in a Racing Game

Etienne de Villiers discusses how he used the Particle Swarm Optimization algorithm to arrive at good values for the physics and AI parameters of his racing game.

Etienne de Villiers

December 13, 2005

Introduction

Many contemporary games use a data-driven approach to control various aspects of a game. In a racing game, for example, an external file may be used to specify the values for various physics parameters of a vehicle, or the behavioral parameters of an AI opponent. Deciding on values for these parameters can be a difficult task, since a change made to one parameter value might affect several aspects of behavior. Although it is sometimes possible to calculate the ideal values mathematically, in most cases a trial and error approach needs to be used.

In this article, I will discuss how we have used the Particle Swarm Optimization (PSO) algorithm to arrive at good values for both our physics and AI parameters. The first part of this article provides a quick overview of function optimization. This is followed by a discussion of the PSO algorithm, after which I will show how we have used PSO within our racing game.

Function Optimization

Algorithms such as PSO and Genetic Algorithms (GA) are generally used on functions that take several input parameters and return a single output value. The algorithm aims to find the input values that will generate the maximum (or minimum) output value, also referred to as the global optimum, within the valid range of input values. The number of parameters is referred to as the dimensionality of the function and the range in which valid input values can be found is referred to as the search space.

For example, consider the following one-dimensional function:

f(x) = sin(x) + sin(x/2)

Figure 1 shows a plot of the output values for all input values in the range 0.0 to 12.6. From this graph it is clear that the global optimum is somewhere around x = 11, at one of the low turning points of the function. At about x = 5.2 there is another low turning point. This is commonly referred to as a local optimum: an optimum within a sub-range of the global range.

Figure 1: The graph of the function, sin(x) + sin(x/2), shows clearly the global and local minima.

Craig Reynolds's Flocking Model

Many of us are familiar with the name Craig Reynolds, whose work became the basis for many simulated swarms in games. Reynolds was intrigued by the behavior of swarms, such as a flock of birds or school of fish, which seem to move in a synchronized manner without any central control. He studied their behavior and came up with a model of their movement, from which he created a graphical simulation of bird-like objects flying around (he called them “boids,” from bird-oids). His model consisted of four steering behaviors: separation to make the boids avoid colliding with one another, alignment to make a boid steer in the common direction of his neighbors, cohesion to make a boid steer towards the common position of his neighbors, and avoidance to make a boid steer away from other obstacles [Rabin 2002].

Particle Swarm Optimization

In 1995, James Kennedy and Russell Eberhart applied Craig Reynolds's model to the problem of finding optima in a search space, which can be compared to a flock of birds looking for a food source, and created the PSO algorithm. They kept the alignment and cohesion behaviors of the Reynolds model, but did away with the behaviors of separation and avoidance.

In PSO, a collection of particles (called a “swarm”) move around in search space looking for the best solution to an optimization problem. All particles have their own velocity that drives the direction they move in. This velocity is affected by both the position of the particle with the best fitness and each particle's own best fitness. Fitness refers too how well a particle performs: in a flock of birds this might be how close a bird is to a food source, in an optimization algorithm this refers to the proximity of the particle to an optima.

Each particle's location is given by the parameters of the given optimization problem, and a particle moves around in search space by adapting and changing these parameter values. At each time step, the particle's fitness is measured by observing the parameter values (location) of the particle. A particle keeps track of the best position it has reached so far (called the personal best position), and is also aware of the position of the overall best particle at a certain time step (called the globally best position). At each time step the particle tries to adapt its velocity (i.e. speed and direction) to move closer to both the globally best position and the personal best position, in order to try and improve its fitness.

Figure 2: This screenshot shows how particles “swarm” towards the global optimum, as indicated by the cross.

The PSO Algorithm

The standard PSO algorithm can be summarized as follows [Engelbrecht 2002]:

1. Set d equal to the dimension of the fitness function.

2. Create n particles, p0 to pn, each with a position vector, x, of dimension d.

3. Assign random position values to each particle

4. For each particle, pi:

1. Evaluate fitness (pass position vector values into fitness function and assign return value as fitness)

2. If fitness better than personal best fitness, then assign current position as personal best position (xpbesti)

3. If fitness better than global best fitness, then assign current position as global best position (xgbest)

5. For each particle, pi:

1. Update velocity, vi:

vi (t) = w v i (t-1) + l (xpbesti - xi (t)) + g (xgbest - xi (t))

1.

1. Update position, xi:

xi (t) = xi (t-1) + vi (t)

1. If criteria met, end simulation, else repeat from step 4.

The criteria for ending the simulation is usually based on whether the global best fitness is sufficient, or whether the simulation has run for the maximum amount of epochs (iterations). It is also important to note that in step 3, the personal best fitness of each particle, and the global best fitness need to be initialized to very poor values. At this stage, each particle's velocity is also usually set to zero, although they may be initialized to random values.

The number of particles, n, inertia weight, w, and the local and global component variables, l and g, are all system parameters of the PSO algorithm. These are discussed next.

System Parameters

There is no hard and fast rule as to how many particles should be used to solve a specific problem. A large number of particles allows the algorithm to explore the search space faster, however, the fitness function needs to be evaluated for each particle, so the number of particles will have a huge impact on the speed at which the simulation will run. Generally speaking, as the complexity of the search space increases, so should the number of particles.

The inertia weight, w, in the velocity vector update equation, is a scaling variable that controls the influence of the previous velocity when calculating the new velocity. Inertia weight values larger than one will typically cause the particle to accelerate and explore larger regions of the search space, while smaller values will cause the particle to gradually slow down and do a finer search of a region [Van den Bergh 2002]. Many algorithms tend to decrease the inertia weight over time, allowing particles to initially roam a larger area in search of optima, and then to gradually do finer searches [Engelbrecht 2002].

An early addition to the basic PSO algorithm was to place an upper limit on the velocity of a particle to prevent particles from moving too rapidly through search space. Clerc and Kennedy later proved that multiplying the velocity vector with a so-called constriction coefficient made velocity clamping unnecessary [Van den Bergh 2002]. The constriction coefficient is a factor of the local and global component variables, for which the sum of the two has to be larger than four for the rule to apply.

The local and global component variables, l and g, control the influence of the personal best and global best positions respectively. They are defined as l = r1 c1 and g = r2 c2, where r1 and r2 are random values between 0 and 1. c1 and c2 are acceleration constants that are usually set to values close to 1, and it has been shown that the following equation must be satisfied to ensure convergence:

w > ½ (c1 + c2) – 1

### Neighborhoods

The standard algorithm as described above is also called the global best (gbest) algorithm, because the globally best particle is followed by all the particles. The local best (lbest) algorithm, however, follows more closely the original model created by Reynolds. It differs from the gbest algorithm in that the whole swarm is divided into a number of neighborhoods, where each particle is only aware of the particles in its neighborhood. There is no global best particle, but rather a local best particle for each neighborhoods. Each particle is therefore only influenced by its neighbors, and not the whole swarm. The gbest algorithm can be seen as the lbest algorithm with one neighborhood consisting of all the particles.

The size of neighborhoods plays an important role in the lbest algorithm. It has been shown that smaller neighborhoods result in slower convergence, but generally lead to better results, since a larger part of the search space is explored and particles are less likely to be trapped in local optima [Engelbrecht 2002].

A few adaptations to the lbest algorithm have been suggested [Van den Bergh 2002]. Suganthan based his neighborhood assignment on physical proximity rather than the index of a particle. Particles that are close enough to one another are said to be in the same neighborhood, and this is continually evaluated so that neighborhoods change over time. Typically, smaller neighborhoods are formed initially, with fewer and larger neighborhoods forming as the algorithm progresses. This allows particles to search larger areas initially and helps to avoid premature convergence (i.e. a particle getting stuck in a local optima instead of a global optima). Kennedy proposed a social stereotyping method where particles follow centroids of neighborhoods rather than individual particle positions. This originated from the idea that people in a group would rather follow the collected beliefs of the group than the beliefs of one individual in the group.

Many other variations and improvements have been made to the basic PSO algorithm over the last decade, but these are beyond the scope of this article. More information can be found in the references listed at the end of the article.

Training AI Racing Parameters

Like most other racing games, our game consisted of a player competing against several computer opponents racing around a track. Since the AI opponents drove the same vehicle as the player, their competitiveness was based purely on the decisions they made and the way they handled their vehicles. The familiar technique of a racing line was used to portray to the AI what the layout of the track is and what a generally good line would be to drive along. Using the racing line as a guide, the AI still had to make decisions on when and how much to brake for a corner, how much to correct for over- or under-steering, etc. These decisions were driven by parameter values as specified in an external data file for each AI. To arrive at good values for these parameters, we used the PSO algorithm to train them.

Figure 3 : The racing line is shown here floating above the track.

The PSO algorithm was used to train 10 different parameter values simultaneously, some of which were mentioned above. The simulation was set up with 20 particles, each represented by an AI with randomized parameter values. All the AI drove the same vehicle, and inter-object collisions were disabled so that AI cannot affect one another. For each epoch of the simulation, the AI started at the exact same position and raced around the track and the fitness of an AI was the total time taken to complete a lap. Since some AI drove so terribly that they never completed a lap, a limit was set on the amount of time an AI could take to complete a lap. The simulation was set to run for a maximum of 100 epochs.

Because it was quite a high-dimensional search space, the algorithm would sometimes settle in local minima and yield poor results even after the maximum epochs have elapsed. Other times, however, good results were obtained within the first few epochs of the simulation. We ran quite a few simulations and chose the best results out of these.

Driving around the track myself I achieved a lap time of about 70 seconds. In one of the PSO simulations, the initial best time (fitness) achieved was somewhere around 80 seconds, with the average lap time of all the particles resulting in a very poor 115 seconds. After 100 epochs however, the best lap time was about 63 seconds, and the average lap time round about 65 seconds. Using the parameters of the best particle and putting them back into the game, I found it near impossible to beat the AI.

Typically, in a racing game, it is desirable to have AI opponents of various difficulties. Since each particle represents a set of parameter values, particles with varying fitness can be chosen and used within the game to achieve this.

Since the fitness evaluation of the simulation depended on AI completing a whole lap, simulations were quite lengthy. To speed the simulation up, rendering and frame rate clamping was disabled to try and get the simulation to run as fast as possible without negatively affecting physics and AI computations.

Training Physics Parameters

Once we had trained the AI parameters for our game using the PSO algorithm, we realized that it would also be a useful technique to train the physics parameters for our vehicles. Specifically, we needed to obtain spring values such that the car's wheels would both settle at the right height, and the car's physics would remain stable.

The PSO algorithm was used to train the spring values of each of the vehicle's four wheels. The simulation was set up with 16 particles, each represented by a vehicle with randomized parameter values. Once again inter-object collisions were disabled. For each epoch of the simulation, all the vehicles were moved to the same position a few meters above the ground and then dropped. After a few seconds, the fitness of each of the particles was measured based on how close the car's wheels were to their ideal positions and how quickly the car had settled down.

Generally, the physics training led to good results and the trained physics parameters provided a car with stable physics that could then be driven around and allow us to do minor manual tweaks.

Conclusion

The Particle Swarm Optimization algorithm has been in existence for the past decade, but is still relatively unknown in the gaming community. It is an easy algorithm to implement and can be used to solve the same problems that Genetic Algorithms are generally used for. We used the PSO algorithm to train both physics and AI parameters for our racing game and found that it gave good results that needed little or no amount of manual tweaking.

If you would like to find out more about the PSO algorithm, you can visit the Computational Intelligence Research Group's website.

References

[Engelbrecht 2002] Andries P. Engelbrecht, Computational Intelligence: An Introduction, Wiley, 2002.

[Rabin 2002] Steve Rabin, AI Game Programming Wisdom, Charles River Media, 2002.

[Van den Bergh 2002] Frans van den Bergh, An Analysis of Particle Swarm Optimizers, PhD Thesis, Department of Computer Science, University of Pretoria, 2002.

_____________________________________________________