Despite being conceived in the late 1950s, Neural Networks have only recently taken the world by storm. We’ve seen them used in everything from medicine to business and trading. We’ve even taught Neural Networks to play games such as Go, Chess, Pong and Breakout, and even something as complex as StarCraft. You’d think that something so adaptable would have taken over the games industry, especially as we’re already seeing these AIs go toe to toe with the greatest human players. However, a number of barriers are preventing widespread adoption of Neural Networks in the commercial game industry, and I intend to tackle many of them in this article. Over the past 6 weeks I’ve been evaluating a method for leveraging the power of Neural Networks in video game AI without sacrificing performance, adaptability, scalability or certainty. I’ll delve into this method a little later, but first let’s explore the issues with Neural Networks.
Current Methods and Their Issues
How do we use Neural Networks? Currently the answer is quite simple: at one end you have your observations and at the other end you have your possible actions. What do these actions look like? For the most part, the industry seems to have settled on the most atomic actions possible: move left/right, forward/backward, jump, shoot, accelerate/decelerate. I’m going to call this an Atomic Action Neural Network (AANN). In a way this is viable, and in many cases we’ve seen it work. You’ve probably seen Neural Networks playing video games, from simple prototypes running on someone’s laptop to supercomputers that can play StarCraft, but herein lies the first issue. (In all fairness, the StarCraft AI was designed to use the same inputs and outputs as a human, whereas when we design bots for our games we’re permitted to cheat somewhat.) We can easily train Neural Networks to tackle simple tasks without much of a hit to performance; however, once a task gets too complex, more and more compute power is needed. Because an AANN only ever outputs a single atomic action, it has to be evaluated again on the next frame (or at least a few frames later), for every agent, for the entire game. This is highly demanding. Further issues arise when you consider the wide range of situations an AI might find itself in. Could you comfortably say that your AANN would thrive in any level layout with any number of opponents? When you consider all the variables that can change over the course of one game, training an AANN to handle all of them can seem infeasible without an array of god-like GPUs.
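To make the cost concrete, here is a minimal sketch of an AANN game loop. The “network” is a single random linear layer standing in for a trained model, and the action names, the 4-element observation and the `apply_action` hook are all hypothetical. The point is the call pattern: at 60 frames per second, one agent needs 3,600 forward passes per minute of play, and that figure multiplies with every agent in the level.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical atomic action set; a real game would use its own input mapping.
ATOMIC_ACTIONS = ["left", "right", "forward", "backward", "jump", "shoot"]

def aann_policy(observation: Sequence[float], weights: List[List[float]]) -> str:
    """Stand-in for a trained AANN: one linear layer, then argmax over atomic actions."""
    scores = [sum(w * o for w, o in zip(row, observation)) for row in weights]
    return ATOMIC_ACTIONS[scores.index(max(scores))]

def run_aann(frames: int, observe: Callable[[int], Sequence[float]],
             weights: List[List[float]]) -> int:
    """Game loop that queries the network every frame; returns the number of inferences."""
    inferences = 0
    for frame in range(frames):
        action = aann_policy(observe(frame), weights)  # one forward pass per frame
        inferences += 1
        # apply_action(agent, action) would go here in a real engine (hypothetical hook).
    return inferences

rng = random.Random(0)
weights = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in ATOMIC_ACTIONS]
print(run_aann(60 * 60, lambda frame: [0.1, 0.5, -0.2, 1.0], weights))  # 3600
```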
Both of these issues are compounded by the addition of multiple AIs in one level. It took life on planet Earth millions of years to start working together (something some of us still struggle with), so it’d be asking a lot of an AANN to do the same. Relying on an AANN to stumble upon the exact combination of left, right, up, down, etc. needed to exhibit some form of emergent, intelligent behaviour is only a little crazy (I say a little because some very clever people have gone and done it). Try to do the same thing on an average computer with multiple agents, however, and it becomes a very crazy idea.
The final issue I’d like to talk about is certainty, which comes in two flavours. The first is what I like to call Neural Network Jitter, and you probably intuitively know what I’m talking about: ask an AANN to navigate somewhere and you’ll probably notice that the agent appears to jitter, even if all it needed to do was walk in a perfectly straight line. The second follows on from the first: you can never quite be sure what a Neural Network is going to do. It could decide that the best thing to do is walk around in circles, or it could go in the complete opposite direction and become nigh unbeatable. Once you hit that “train” button, all bets are off.
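As a toy illustration of jitter, the sketch below assumes one plausible source of it: a policy that samples an atomic action every frame, so even when “forward” dominates, occasional sidesteps slip through. The function names and the 0.9 probability are made up for illustration. A scripted “walk straight” behaviour, by contrast, never deviates.

```python
import random
from typing import List

def sampled_lateral_path(frames: int, p_forward: float, seed: int = 0) -> List[int]:
    """Lateral offset per frame for an agent that samples an atomic action every frame.

    With probability p_forward it steps straight ahead; otherwise it sidesteps left or right.
    """
    rng = random.Random(seed)
    offset, path = 0, []
    for _ in range(frames):
        if rng.random() >= p_forward:
            offset += rng.choice((-1, 1))
        path.append(offset)
    return path

def scripted_lateral_path(frames: int) -> List[int]:
    """A hand-written 'walk straight' behaviour never deviates sideways."""
    return [0] * frames

def jitter(path: List[int]) -> int:
    """Counts the frames on which the lateral offset changed."""
    return sum(1 for a, b in zip(path, path[1:]) if a != b)

print(jitter(scripted_lateral_path(600)))                 # 0
print(jitter(sampled_lateral_path(600, p_forward=0.9)))   # nonzero: the sampled agent wobbles
```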
So what’s the solution to all this? I hear one snarky fellow in the back whisper “don’t use Neural Networks”, problem solved. In my opinion, though, I’ve seen them accomplish too much to simply leave them alone. The fact of the matter is we don’t need a Neural Network to figure out how to walk in a straight line or how to aim at a player. Scrap the AANN when it’s unnecessary. What I’m proposing is that we do some of the legwork ourselves and let the Network concentrate on what’s important.
Breaking Away From Real-time
So what is this technique, this method, this life saver (not quite) that I’m championing? In short, it’s planning and/or state control; in long, it can be whatever you want it to be. One idea is to train your Network to choose or control behaviours to execute instead of choosing atomic actions. These behaviours can serve any purpose (within reason, and their outcome must be scorable). I’ve decided to call this a Behaviour Control Network (BCN). Now simply combine your BCN with an algorithm that can execute your behaviours and voilà, you’re using Neural Networks for a fraction of the performance cost. In addition, if the BCN generates plans, those plans may be reusable, eliminating the need to run your Network again. BCNs can also be more versatile depending on your framework. Let’s say your Network decides it’s going to execute behaviour a, followed by behaviour b, followed by behaviour g: the plan (a, b, g) can be viable in as many situations as you want it to be. Now let’s put meaning to this plan. What if the plan (a, b, g) was the plan (get to cover, shoot, run away) in some sort of shooter? You can dictate how this plan is executed, ensuring that the agent finds a piece of cover, shoots, then runs away. Furthermore, if it turns out that a behaviour can’t be executed, it can simply be skipped. In this particular scenario you’ve dramatically increased the level of certainty, and with a sizable set of behaviours you may still allow the Neural Network to create its own emergent strategies. This is one example of how something like this could be used; later on I’ll dive into the ways I’ve used it, but first let’s look at it from a systemic point of view.
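Here is a minimal sketch of the BCN idea applied to the (get to cover, shoot, run away) example. The network is a stub (`stub_network`) that always returns that plan; a real BCN would be a trained model. Caching plans by a coarse “situation” key is my own assumption about how plan reuse might work, and the behaviour table and world fields are hypothetical. The executor runs behaviours in order and skips any whose precondition fails.

```python
from typing import Callable, Dict, List, Tuple

World = dict
# Hypothetical behaviour library: name -> (precondition, action). In a real game the
# actions would call the cover-finding, pathfinding and aiming systems; here they log.
BEHAVIOURS: Dict[str, Tuple[Callable[[World], bool], Callable[[World], str]]] = {
    "get_to_cover": (lambda w: bool(w["cover_nearby"]), lambda w: "moved to cover"),
    "shoot": (lambda w: w["ammo"] > 0, lambda w: "fired at target"),
    "run_away": (lambda w: True, lambda w: "retreated"),
}

class BehaviourController:
    """Asks the (stubbed) network for a plan only when no reusable plan is cached."""

    def __init__(self, network: Callable[[Tuple], List[str]]):
        self.network = network
        self.cache: Dict[Tuple, List[str]] = {}
        self.network_calls = 0

    def plan_for(self, situation: Tuple) -> List[str]:
        if situation not in self.cache:
            self.network_calls += 1
            self.cache[situation] = self.network(situation)
        return self.cache[situation]

    def execute(self, plan: List[str], world: World) -> List[str]:
        """Runs behaviours in order, skipping any whose precondition fails."""
        log = []
        for name in plan:
            can_run, act = BEHAVIOURS[name]
            log.append(act(world) if can_run(world) else f"skipped {name}")
        return log

def stub_network(situation: Tuple) -> List[str]:
    # Placeholder for a trained BCN mapping a coarse situation to the plan (a, b, g).
    return ["get_to_cover", "shoot", "run_away"]

controller = BehaviourController(stub_network)
situation = ("enemy_visible",)
world = {"cover_nearby": False, "ammo": 12}
print(controller.execute(controller.plan_for(situation), world))
# ['skipped get_to_cover', 'fired at target', 'retreated']
controller.plan_for(situation)   # same situation: the cached plan is reused
print(controller.network_calls)  # 1
```

The behaviour code stays fully hand-written, so what each step actually does is as predictable as any scripted AI; only the choice and order of steps comes from the network.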
A Layered Approach
It’s unlikely that Neural Networks have the capability to replace all of our AI systems (for now), so they shouldn’t be treated as such, but rather as a subsystem interfacing with other subsystems. Let’s return to our example from earlier. The Network has just generated the plan (get to cover, shoot, run away). Our Network is oblivious to the locomotion system, the pathfinding system, the aiming system, the shooting system and the cover-finding system; it has simply selected behaviours that interface with those systems. You could even have a system above it that dictates when the Network is used. Neural Networks work best when they’ve been trained to perform a specific task; perhaps the Network that generated our plan was trained solely to deal with hostile threats. Perhaps an enemy character came into view, and our high-level system decided to enable this particular Neural Network. Our Neural Network then decides to trigger the behaviour get to cover, which I’d imagine uses some sort of cover-finding system alongside pathfinding and locomotion systems. Asking a single Neural Network to replace all of those systems is unlikely to work without the compute power to back it up.
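The sketch below shows what “get to cover” might look like as a thin layer over existing subsystems, assuming a grid world. The Network would only ever output the label; the cover finder (nearest cover point by Manhattan distance) and the pathfinder (breadth-first search) are hypothetical stand-ins for whatever your engine already provides.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]

def find_cover(agent: Cell, cover_points: List[Cell]) -> Optional[Cell]:
    """Cover-finding subsystem (hypothetical): nearest cover point by Manhattan distance."""
    if not cover_points:
        return None
    return min(cover_points, key=lambda c: abs(c[0] - agent[0]) + abs(c[1] - agent[1]))

def find_path(start: Cell, goal: Cell, walls: Set[Cell], size: int) -> Optional[List[Cell]]:
    """Pathfinding subsystem: breadth-first search on a size x size grid."""
    came_from: Dict[Cell, Optional[Cell]] = {start: None}
    frontier = deque([start])
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            in_bounds = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if in_bounds and nxt not in walls and nxt not in came_from:
                came_from[nxt] = cell
                frontier.append(nxt)
    return None

def get_to_cover(agent: Cell, cover_points: List[Cell], walls: Set[Cell],
                 size: int) -> Optional[List[Cell]]:
    """The behaviour the Network selects: it only glues existing subsystems together."""
    target = find_cover(agent, cover_points)
    if target is None:
        return None  # not executable right now; a plan executor can skip it
    return find_path(agent, target, walls, size)

print(get_to_cover((0, 0), [(2, 0), (4, 4)], walls={(1, 0)}, size=5))
# [(0, 0), (0, 1), (1, 1), (2, 1), (2, 0)]
```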
It’s up to you to decide which subsystems are handled by a Neural Network. Some systems may enable Networks, some Networks may enable systems, and some Networks may even enable other Networks. Say, hypothetically, you were required to build an AI for a game like Rocket League. One of the signature features of Rocket League is the ability to lift off from the ground and fly your car into the ball while it’s in mid-air (Aerialing), a task that requires many delicate manoeuvres to pull off: the sort of task perfectly suited to an AANN. One approach could be to let a single AANN control the car at all times, but as I’ve discussed earlier, this isn’t wise. A layered approach makes far more sense from a performance standpoint. Let’s say a high-level system has decided to Aerial the ball, so this system enables a Network, an AANN even, specifically designed and trained to Aerial balls. As you can imagine, this Network will be more performant than an “all purpose” Network, since it only runs while an Aerial is in progress and only has to master that one task. In this scenario we’re not even using a Neural Network to generate plans; we’re using some sort of planning system to enable a Network (how’s that for a plot twist). Regardless of which way you do it, the result is better performance and higher certainty.
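A minimal sketch of the Rocket League example, with every name and threshold hypothetical: a hand-written high-level rule decides when to enable a specialist Aerial network, and a scripted controller handles ordinary driving the rest of the time. The “network” here is a placeholder returning fixed controls; in practice it would be a small trained AANN.

```python
from typing import Callable, Dict

Obs = Dict[str, float]
Controls = Dict[str, float]

def drive_on_ground(obs: Obs) -> Controls:
    """Scripted fallback: steer towards the ball along the ground (placeholder logic)."""
    steer = max(-1.0, min(1.0, obs["ball_dx"] * 0.1))
    return {"throttle": 1.0, "steer": steer, "jump": 0.0, "boost": 0.0}

def aerial_network(obs: Obs) -> Controls:
    """Placeholder for a small AANN trained only to Aerial; a real one would be a trained model."""
    return {"throttle": 1.0, "steer": 0.0, "jump": 1.0, "boost": 1.0}

def should_aerial(obs: Obs, min_height: float = 3.0, max_distance: float = 20.0) -> bool:
    """High-level rule (hypothetical thresholds) that decides when to enable the network."""
    return obs["ball_height"] >= min_height and abs(obs["ball_dx"]) <= max_distance

def select_controller(obs: Obs) -> Callable[[Obs], Controls]:
    return aerial_network if should_aerial(obs) else drive_on_ground

print(select_controller({"ball_height": 6.0, "ball_dx": 8.0}).__name__)   # aerial_network
print(select_controller({"ball_height": 0.5, "ball_dx": 30.0}).__name__)  # drive_on_ground
```

The network is only evaluated while `should_aerial` holds, which is where the performance saving comes from.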
What Did I Do?
Over this period of research I focused on training Neural Networks to generate plans and control states; essentially, I was developing BCNs. I designed two wildly different projects with one simple goal: to evaluate the use of Neural Networks for real-time scenarios without actually running them in real time. The first experiment was a locomotion problem: using a Neural Network to control an agent that has to jump from one ledge to another (using a double jump).
Planning a Double Jump
I approached this problem in two ways. I started the way just about everyone would: plug in some handy observations such as the relative distance to the platforms, whether the agent is grounded, whether it still has its second jump, etc., have the Network produce the horizontal and jump inputs, and let it train. Textbook AANN. The agent didn’t perform very well; using reinforcement learning alone, it wasn’t able to figure out how to traverse the gap. I believed there was a better way of doing it. Instead of having the Network directly control my square, I used it to generate a series of waypoints indicating the points at which the agent had to jump. A basic locomotion script would then navigate the agent to those coordinates, jumping at the appropriate time. The agent did much better while using less CPU. In this instance the BCN method has several additional benefits. There’s no jitter, as a simple algorithm controls the square once the plan has been made. Less data is fed to the Network: the plan can be made with or without the presence of the square, and the square’s position doesn’t affect the outcome of the plan, so only the position of the goal ledge relative to the start ledge is required. This system could be used for 2D navigation and would be particularly useful in procedurally generated levels: plans can be generated at load time, so real-time performance is barely impacted.
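A minimal sketch of the waypoint version, under stated assumptions: `stub_plan_network` is a placeholder with made-up coefficients standing in for the trained planner, and the locomotion script is reduced to walking right at a constant speed and triggering a jump whenever it passes a waypoint (no physics). It shows the two properties described above: the planner only sees the relative ledge geometry, and the same plan works wherever the square starts.

```python
from typing import List, Tuple

def stub_plan_network(gap_dx: float, gap_dy: float) -> List[float]:
    """Placeholder for the trained planner (coefficients are made up, not learned).

    Input: position of the goal ledge relative to the start ledge's edge.
    Output: x-offsets from that edge at which to trigger the first and second jump.
    """
    first_jump = -0.5                          # take off just before the edge
    second_jump = 0.4 * gap_dx + 0.2 * gap_dy  # double jump part-way across the gap
    return [first_jump, second_jump]

def follow_plan(agent_x: float, edge_x: float, jump_offsets: List[float],
                speed: float = 0.25, max_frames: int = 400) -> List[Tuple[int, float]]:
    """Basic locomotion script: walk right at a constant speed and jump at each waypoint.

    Returns (frame, waypoint_x) for every jump that was triggered.
    """
    targets = [edge_x + offset for offset in jump_offsets]
    jumps = []
    for frame in range(max_frames):
        if not targets:
            break
        agent_x += speed
        if agent_x >= targets[0]:
            jumps.append((frame, round(targets.pop(0), 2)))
    return jumps

# The plan is computed once (e.g. at level load) from the ledge geometry alone...
plan = stub_plan_network(gap_dx=4.0, gap_dy=1.0)
# ...and reused no matter where the square starts on the ledge.
print(follow_plan(agent_x=0.0, edge_x=10.0, jump_offsets=plan))  # [(37, 9.5), (47, 11.8)]
```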
This is a simple application of a BCN. For my next project I set my sights on creating an AI that could construct a plan from a set of behaviours that would enable it to survive in a hostile world.
This project was far more complex: the AI would have to “gain an understanding” of the world it inhabits, as well as how it can use its various behaviours to survive. The setting is a typical but simplified survival game. The player needs food, water and sleep to survive, but a lack of these commodities isn’t the only threat in this world. Monsters come out at night and kill the player instantly upon detection; the player’s only hope is a fire they can build at their base to ward off the monsters. In order to hunt they’ll need arrows, and arrows need wood; a fire also requires wood. Water and energy (replenished by sleeping) can be found in abundance. To survive in an environment like this, the AI needs to know when it can do what. Hunting, getting water and harvesting trees at night will almost certainly result in death. It’s essential that our AI lights a fire before the monsters come out (unless it gets lucky). It can’t hunt without arrows, and it can’t make arrows without wood. The game was kept as simple as possible, allowing me to focus on the AI. Note that the complexity of the game isn’t too important in this case. What matters is that the AI knows it needs to get water; that doesn’t change no matter how many dimensions, mechanics or ways of getting water there are in the game. The how is handled by the GetWater state, which does change from game to game.
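The rules above can be sketched as a tiny world model that a BCN would pick behaviours against. Only the dependencies come from the description (arrows need wood, hunting needs arrows, a fire needs wood, outdoor work at night is deadly); the numeric quantities, the 0..10 scale and the observation layout are assumptions, and night-time outdoor work is made always fatal rather than “almost certainly”. The behaviour names mirror the GetWater-style states mentioned above.

```python
from dataclasses import dataclass, replace
from typing import List

BEHAVIOURS = ["GetWater", "ChopWood", "CraftArrows", "Hunt", "BuildFire", "Sleep"]
OUTDOOR = {"GetWater", "ChopWood", "Hunt"}  # fatal once the monsters are out (simplified)

@dataclass(frozen=True)
class Survivor:
    food: int = 5
    water: int = 5
    energy: int = 5
    wood: int = 0
    arrows: int = 0
    fire_lit: bool = False
    night: bool = False
    alive: bool = True

def observe(s: Survivor) -> List[float]:
    """Observation vector the BCN sees; the 0..10 scale is an assumption for this sketch."""
    return [s.food / 10, s.water / 10, s.energy / 10, s.wood / 10, s.arrows / 10,
            float(s.fire_lit), float(s.night)]

def apply(s: Survivor, behaviour: str) -> Survivor:
    """Outcome of one behaviour. How it is carried out (pathing, animation) is the state's job.

    Resource decay and the day/night clock are left out to keep the sketch short.
    """
    if s.night and behaviour in OUTDOOR:
        return replace(s, alive=False)
    if behaviour == "GetWater":
        return replace(s, water=10)
    if behaviour == "ChopWood":
        return replace(s, wood=s.wood + 3)
    if behaviour == "CraftArrows" and s.wood >= 1:
        return replace(s, wood=s.wood - 1, arrows=s.arrows + 3)
    if behaviour == "Hunt" and s.arrows >= 1:
        return replace(s, arrows=s.arrows - 1, food=10)
    if behaviour == "BuildFire" and s.wood >= 2:
        return replace(s, wood=s.wood - 2, fire_lit=True)
    if behaviour == "Sleep":
        return replace(s, energy=10)
    return s  # precondition not met (e.g. Hunt without arrows): nothing happens

s = Survivor()
print(apply(s, "Hunt") == s)  # True: can't hunt without arrows
for b in ["ChopWood", "CraftArrows", "Hunt"]:
    s = apply(s, b)
print(s.food, s.wood, s.arrows)  # 10 2 2
```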
Usually after about 6000 steps of training the AI would have a basic “understanding” of its world: it gets into a routine of gathering resources during the day, lighting fires in the evening and resting/crafting arrows during the night, only occasionally making some questionable decisions. As training continues, these questionable decisions are slowly ironed out, usually producing a nigh-perfect routine of activities that allows it to sustain itself. I even took away all of its resources mid-run to see if it would freak out, but it didn’t. In the morning it filled up on water, got some wood, then slept for the rest of the day to regain energy, before crafting some arrows and hunting the next day. It then just continued as normal. I imagine a system like this being good for wildlife in randomly or procedurally generated ecosystems. It could be cool to take a randomly generated galaxy, place trained AIs on each planet and just let fate determine the survivors. This leads nicely onto my next talking point.
The Stuff I Didn’t Get to Try
I was only given 6 weeks to research this topic, so time was tight, as you can imagine. I thought I’d dedicate a section to the ideas I didn’t have time to pull off, in the hope that some very capable folks might read this article and give them a go. If you do pull off anything related to what I’ve discussed, I’d love to hear about it: drop me an email (find it below) or post a comment and I’d be thrilled to take a look. I’m also planning on doing a part 2 next summer, and it would be great to feature the work of other developers.
1: Collaborative Neural Networks
I briefly touched on collaboration earlier, and there is room in the survival project for some form of collaborative AI. Training a Network that could survive in a group would make for an interesting project. A more complex ecosystem may be required to encourage collaboration; you could even try to train agents with the potential to turn against each other by stealing from and exploiting other agents. The higher level of abstraction allows for higher-level AI.
2: Planning Ahead
Now for something that’s less of a pipe dream. The architecture I used for the survival project (observe the current state, calculate the behaviour to perform right now) was meant to be part of a larger architecture I didn’t manage to complete. The second part of this architecture was another Neural Network trained to predict what the state will be as a result of choosing any particular behaviour. You could train this Network using backpropagation, allowing it to become pretty accurate with enough data (especially as the game itself can generate virtually unlimited amounts of it). Using this prediction Network, you could then feed the predicted state back into the BCN to choose the behaviour to execute after the current one. This process could be repeated any number of times to generate an entire plan of any given length.
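The planning loop itself is short. In this sketch both networks are hand-written stubs (`stub_policy` for the BCN, `stub_predictor` for the prediction Network), and the two-variable state and behaviour numbering are invented for illustration. The loop alternates “pick a behaviour” and “imagine the resulting state” for a fixed horizon, without simulating the game.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float]  # (water, wood), both scaled to 0..1 (assumed encoding)

def rollout_plan(state: State, policy: Callable[[State], int],
                 predictor: Callable[[State, int], State], horizon: int) -> List[int]:
    """Builds a plan by alternating the BCN (policy) with a learned next-state predictor."""
    plan = []
    for _ in range(horizon):
        behaviour = policy(state)
        plan.append(behaviour)
        state = predictor(state, behaviour)  # imagined outcome; the game is not simulated
    return plan

# Hypothetical stand-ins for the two trained networks. Behaviours: 0=GetWater, 1=ChopWood, 2=Rest.
def stub_policy(state: State) -> int:
    water, wood = state
    if water < 0.5:
        return 0
    if wood < 0.5:
        return 1
    return 2

def stub_predictor(state: State, behaviour: int) -> State:
    water, wood = state
    if behaviour == 0:
        water = 1.0
    elif behaviour == 1:
        wood = 1.0
    return (round(water - 0.4, 2), wood)  # water drains by 0.4 per behaviour

print(rollout_plan((0.2, 0.0), stub_policy, stub_predictor, horizon=4))  # [0, 1, 0, 2]
```

Prediction errors compound with each step, so in practice the horizon would likely need to stay short or the plan be regenerated periodically.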
To make this process more versatile, you could train multiple prediction Networks that predict the state based on different outcomes of a behaviour (e.g. if it succeeds the state will look like x, if it fails it’ll look like y, etc.). Follow this with some sort of k-nearest neighbour algorithm and you could build a tree of behaviours that better accounts for changes in the situation.
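One way this could fit together, sketched under assumptions: each predicted outcome (here just “success” and “failure”) spawns a branch, and at run time the agent follows the branch whose predicted state is closest to the state it actually observes, i.e. a 1-nearest-neighbour match using Euclidean distance. The predictors and policy are hypothetical stubs. Note that the tree grows as (number of outcomes)^depth, so depth has to stay small.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

State = Tuple[float, ...]

@dataclass
class PlanNode:
    state: State    # predicted state when this node is reached
    behaviour: int  # behaviour the BCN would choose in that state
    children: Dict[str, "PlanNode"] = field(default_factory=dict)

def build_tree(state: State, policy: Callable[[State], int],
               predictors: Dict[str, Callable[[State, int], State]], depth: int) -> PlanNode:
    """Expands one branch per predicted outcome (success, failure, ...) down to `depth` levels."""
    node = PlanNode(state, policy(state))
    if depth > 1:
        for outcome, predict in predictors.items():
            child_state = predict(state, node.behaviour)
            node.children[outcome] = build_tree(child_state, policy, predictors, depth - 1)
    return node

def next_node(node: PlanNode, observed: State) -> PlanNode:
    """1-nearest-neighbour step: follow the branch whose predicted state is closest to reality."""
    return min(node.children.values(), key=lambda child: math.dist(child.state, observed))

# Hypothetical stand-ins: one state variable (water), 0 = GetWater, 1 = some other behaviour.
def policy(s: State) -> int:
    return 0 if s[0] < 0.5 else 1

def predict_success(s: State, b: int) -> State:
    return (1.0,) if b == 0 else s

def predict_failure(s: State, b: int) -> State:
    return (max(0.0, s[0] - 0.1),)

tree = build_tree((0.2,), policy, {"success": predict_success, "failure": predict_failure}, depth=3)
print(next_node(tree, observed=(0.9,)).behaviour)  # 1: the GetWater step evidently succeeded
```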
3: Behaviour Analysis Powered Backpropagation
Backpropagation is one of the most effective training techniques; however, leveraging its power in the method discussed could be problematic, because it needs labelled examples of which behaviour to choose, and those don’t exist by default. I have a potential solution: a system (probably some sort of AI) that analyses human player behaviour and provides data for a BCN. Give this Behaviour Analysis System enough labelled training data to recognise in-game behaviours, and you could then use it to provide labelled training data for a BCN. If this were feasible, it could be an effective way of creating AIs that act like players.
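A rough sketch of that pipeline, with heavy assumptions: the player log is synthetic (the “player” heads for whichever resource is lowest), the Behaviour Analysis System is replaced by a lookup from destination to behaviour label, and a single-layer softmax model trained by gradient descent on cross-entropy stands in for a deeper BCN trained with backpropagation (it is the one-layer special case). All names are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

BEHAVIOURS = ["GetWater", "ChopWood", "Sleep"]
# Hypothetical mapping a Behaviour Analysis System might learn; here it is a lookup table.
DESTINATION_TO_BEHAVIOUR = {"lake": "GetWater", "forest": "ChopWood", "bed": "Sleep"}

def recognise(frame: Dict) -> str:
    """Placeholder Behaviour Analysis System: labels a recorded player frame."""
    return DESTINATION_TO_BEHAVIOUR[frame["destination"]]

def to_example(frame: Dict) -> Tuple[List[float], int]:
    """Turns a labelled frame into (BCN observation, behaviour index); 1.0 is a bias input."""
    obs = [frame["water"], frame["wood"], frame["energy"], 1.0]
    return obs, BEHAVIOURS.index(recognise(frame))

def train_softmax(data, n_features: int, n_classes: int, lr: float = 0.5, epochs: int = 100):
    """Single-layer softmax model trained by full-batch gradient descent on cross-entropy."""
    W = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_features for _ in range(n_classes)]
        for x, y in data:
            scores = [sum(w * xi for w, xi in zip(row, x)) for row in W]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            for k in range(n_classes):
                err = exps[k] / total - (1.0 if k == y else 0.0)  # dLoss/dscore_k
                for i in range(n_features):
                    grad[k][i] += err * x[i] / len(data)
        for k in range(n_classes):
            for i in range(n_features):
                W[k][i] -= lr * grad[k][i]
    return W

def predict(W, x: List[float]) -> int:
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return scores.index(max(scores))

# Synthetic "player log": the player heads for whatever resource is lowest.
rng = random.Random(0)
where = {"water": "lake", "wood": "forest", "energy": "bed"}
frames = []
for _ in range(300):
    needs = {"water": rng.random(), "wood": rng.random(), "energy": rng.random()}
    frames.append({**needs, "destination": where[min(needs, key=needs.get)]})

data = [to_example(f) for f in frames]
W = train_softmax(data, n_features=4, n_classes=len(BEHAVIOURS))
accuracy = sum(predict(W, x) == y for x, y in data) / len(data)
print(f"training accuracy: {accuracy:.2f}")
```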
As promising as I make this seem, Neural Networks do have their limitations. I’m not trying to give you all the answers; I’m simply showing you a window onto an alternative. Whenever you implement Neural Networks, always remember that you don’t have full control over the Network, and plan accordingly. As I’ve shown in my survival project, it is possible to train crucial behaviours, but there’s no guarantee that the Network can juggle multiple crucial behaviours. Sometimes it’s easier for you to step in and take the reins. Beware when trying to build complex AI behaviours: training multiple Networks to handle different situations is often far easier than training a single Network to handle all of them.
Next summer I shall return with part 2 of this project, where I’ll be taking these ideas even further and implementing them in more complex video game AI systems. If you make anything related to what I’ve discussed, I’d love to hear about it. Feel free to leave a comment or get in touch by email at [email protected]. Constructive criticism is welcome. Thank you for reading.