Threading - USE IT. How to stop wasting most available CPU power!

Almost every game published to date, from AAA titles to the smallest indie bash-together, is limited in performance in some way by CPU power. But what we mean by that is that they're all effectively single-threaded. This is a huge waste of CPU power - and potential.

James Hicks, Blogger

May 5, 2014

12 Min Read

On my machine, everything from Skyrim to Dwarf Fortress consumes what Windows tells me is 11-13% of available CPU power. On a 4-core machine with Hyper-Threading, Windows counts 8 logical cores, so 11-13% is roughly one logical core running flat out. In reality this means that somewhere around 25%, maybe a little less, of my CPU is actually used by each game, AND that everything is limited in its maximum performance - either framerate or actual game speed - by my CPU. Or rather, the 25% or so that developers are using.
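The arithmetic behind that reading can be sketched in a few lines. Python is used here purely for brevity, and the function name is made up for illustration.

```python
def single_thread_share(logical_cores: int) -> float:
    """Percentage of total CPU that one fully busy thread shows up as in a
    monitor that averages load across all logical cores."""
    if logical_cores < 1:
        raise ValueError("logical_cores must be at least 1")
    return 100.0 / logical_cores


# 4 physical cores with Hyper-Threading are reported as 8 logical cores.
print(single_thread_share(8))  # 12.5
# Counting physical cores only, the same thread occupies a quarter of the chip.
print(single_thread_share(4))  # 25.0
```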

Meanwhile, my GPU is almost always partially idle - its own maximum potential never reached because the games are stuck, constrained by a single core of processing power.

When this happens, my CPU is running at 4.3 GHz and is a reasonably modern Intel design. The problem is not my hardware.

It's the software. Part of the problem is game engines. They flag virtually everything as not thread-safe, and nobody seems to be looking into expanding their engine to overcome this.

But the other half of the problem is games developers themselves refusing, for some reason, either to work around the limitations in their engines, or to use the other cores available to them in new and exciting ways.

I'm going to explain how we do both, in as close to plain English as I can. But first, I want to get something out of the way. A lot of folks might be tempted, at this point, to say "But James, we DO use other threads". No, you don't. If your game is still bottlenecking on one core and one core only, I don't care what tiny little jobs like sound processing or flibbit gibbling you've palmed off to another thread - you're still basically a single-threaded application, sorry!

To the right of this picture is a gas giant that has a procedurally generated texture. Instead of bundling a few terabytes of textures that I can't afford artists to make into Ascent - The Space Game (http://www.thespacegame.com), we generate textures for every planet and moon in the game - hundreds of billions of them actually - and we do it all in other threads. As your ship gets closer to a planet, the thread kicks off again and again, making higher and higher resolution textures.

But there's a trick to this; Ascent is built on the Unity engine, which doesn't let you modify a texture in another thread. The trick is not rocket surgery - Unity lets you import an array of colours into a texture, and we just work on our array of colours in another thread.
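As a rough illustration of that trick, here is a minimal sketch in Python rather than Unity's C#. `FakeTexture`, `set_pixels` and `make_bands` are hypothetical stand-ins, not engine APIs; the only point is that the worker touches nothing but a plain array of colours, and the engine object is touched only from the thread that created it.

```python
import threading


class FakeTexture:
    """Hypothetical stand-in for an engine texture that may only be touched
    from the main thread."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.pixels = [(0, 0, 0)] * (width * height)
        self.owner = threading.get_ident()  # thread that created the texture

    def set_pixels(self, colours):
        # Mimic the engine restriction: refuse calls from any other thread.
        if threading.get_ident() != self.owner:
            raise RuntimeError("texture touched off the main thread")
        if len(colours) != self.width * self.height:
            raise ValueError("colour array has the wrong size")
        self.pixels = list(colours)


def make_bands(width, height, out):
    """Worker: fill a plain list with banded colours (a toy 'gas giant')."""
    for y in range(height):
        shade = 255 * y // max(height - 1, 1)
        for x in range(width):
            out[y * width + x] = (shade, shade // 2, 255 - shade)


width, height = 64, 32
colours = [None] * (width * height)   # plain data, no engine objects
worker = threading.Thread(target=make_bands, args=(width, height, colours))
worker.start()
worker.join()   # a game would check a flag each frame instead of blocking

texture = FakeTexture(width, height)  # created on the main thread
texture.set_pixels(colours)           # the only cost paid on the main thread
```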

This means that some of the time, Ascent will be using two, three, even all four cores at once if, for example, we approach a big planet with several moons. And it lets us do this without impacting framerates - in fact on most hardware our main thread is mostly idle and you either sit at max FPS, or you're GPU limited. You get a small impact when we load the array of colours into the texture back in the main thread, but I can't avoid that until Unity discovers threading.

I'll give you another example...

This is a picture of our terrain engine working away on a planet surface. Actually this picture's a little old; today's version of the terrain engine is smarter but you get the idea of what's going on. The player is in the bottom centre, almost at ground level, looking 'north' and we're looking at the scene from about 50 kilometers above. As is typical for a terrain engine, a lot of high speed mathematics is going on. But what's not typical is that absolutely none of it is impacting framerates (except where you run into GPU slowness because of all those triangles) - because we do all of our thinking in another thread.

Again, Unity doesn't let you change a mesh, such as our terrain sphere above, in another thread. It all has to be done in the main thread - but once again, we split the mesh into its component data structures (vertices, triangles, normals and tangents), do a whole bunch of stuff on these structures in our other thread, and periodically bring this data back and alter the mesh with it in the main thread.
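The same idea applied to mesh data might look like the sketch below, again in Python for brevity. `MeshData`, `height` and `displace` are illustrative names, the height function is a toy, and only vertices and triangles are shown (normals and tangents would travel the same way).

```python
import math
import threading
from dataclasses import dataclass, field


@dataclass
class MeshData:
    """Plain arrays copied out of the engine mesh (names are illustrative)."""
    vertices: list = field(default_factory=list)   # (x, y, z) tuples
    triangles: list = field(default_factory=list)  # flat vertex indices


def height(direction):
    """Toy procedural height: a small ripple based on the direction."""
    x, _, z = direction
    return 1.0 + 0.05 * math.sin(8 * x) * math.cos(8 * z)


def displace(source, result):
    """Worker: push each unit-sphere vertex out to its procedural height."""
    new_vertices = []
    for v in source.vertices:
        h = height(v)
        new_vertices.append((v[0] * h, v[1] * h, v[2] * h))
    result.triangles = list(source.triangles)
    result.vertices = new_vertices


source = MeshData(
    vertices=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    triangles=[0, 1, 2],
)
result = MeshData()
worker = threading.Thread(target=displace, args=(source, result))
worker.start()
worker.join()   # a game would pick the result up on a later frame instead

# Back on the main thread: this is where the engine mesh would be updated
# from result.vertices and result.triangles.
print(len(result.vertices), "vertices ready")
```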

There are two huge advantages to this - firstly, instead of only being able to do a bit of terrain-mangling each frame, for fear of clogging up the main thread and slowing down framerates, we can do, well, pretty much whatever we like, as fast as we like. As a result, the two main limitations on our terrain's detail are GPU power (too many triangles and the average GPU loses its lunch), and Unity's limit of 65k or so vertices per mesh. We could work around that second one, but there's nothing to be done for the first but wait a few years for everybody to upgrade, and/or limit the engine based on available GPU power (which we do).

The second advantage, and the real beauty of threading in my view, is that it gives us virtually limitless CPU power. We can think about a lot of triangles there, and think a LOT about each triangle, and not worry about the game's performance so long as we don't make too many of them. As a result, we have a terrain engine that can render something the size of Earth, onto a sphere, without a height map as source data, because it has the available CPU to procedurally generate its own height map on the fly. Better yet, with all that extra CPU, this terrain engine isn't just thinking about the shape of the landforms, but continually balancing between the planet's base texture and the level of detail in the terrain - sometimes the texture is more detailed than the terrain and sometimes it's the other way around. The engine has the CPU grunt to recognise this and communicate it to the GPU via shaders and their inputs. This is a big part of the recipe for how we can have hundreds of billions of crazily gargantuan planets in a game that's either 18 or 31 megabytes, depending on which platform you download it for.

And then there are some less conventional examples. One I keep hearing about is AI. Developers always seem to say their AI is limited by CPU power. Last time I checked though, it was 2014! Even mobile phones with fewer than two cores are now rare. The way I look at AI, there are two levels of it (three for an MMO but to simplify, I'll talk about two) - "live" AI which deals with actually moving, shooting, controlling the robot/person/tank/ship/aircraft/attack chicken/you name it, and then there's the "tactical" or "thinking" AI - you know, the part nobody codes because there's no CPU power available?

Well, imagine if there were whole cores sitting idle, just waiting to start thinking strategically about the whole situation facing your AI? And imagine if those cores could be tasked to think about each situation in great depth, without affecting framerates! Well, you're not imagining anything, that's exactly the situation!

Leaving server AI out for the time being, the way Ascent manages local client AI is exactly in this way. At any given moment, an NPC ship has a current task it's attempting to do, or rather, a task mode which is usually two things at once. It will always be moving, making itself a harder target, but its movements might be 100% evasive while its weapons or shields charge up or it flees for its life, or it could be attacking.

If it's attacking, which weapon is it trying to point at you right now? If evading, which side does it most want facing away from you? What direction does it want to be flying off in? When should it start turning back to shoot at you and where should it be aiming? At what point does it decide it's lost the battle and flee for its life? Should it have the gravity anchor on for fine-tuned control or to kill its sideways velocity, or off for maximum acceleration? Well, these questions are all tactical - they get answered by a thinking thread.

Every quarter second or so, the thinking thread talks to the main thread and potentially changes the ship's current activity mode, targets, the distance it wants to be from you, the weapons it's using or not using, you name it. Then it goes back to analysing the situation in any level of depth and detail we might want. What this means for the future is that we can keep loading intelligence into our local AI and never worry about framerates. This means something totally different on the server, but that's not today's discussion.
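A thinking thread of this kind could be sketched roughly as follows. This is Python for brevity and not Ascent's actual code: `Orders`, `Thinker`, the `hull` value and the decision rule are all invented for illustration. The thinker publishes each decision as one new immutable object, so the main thread only ever reads a complete set of orders.

```python
import threading
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Orders:
    mode: str               # "attack" or "evade"
    preferred_range: float


class Thinker:
    def __init__(self, interval=0.25):
        self.orders = Orders("evade", 500.0)  # read by the main thread
        self.hull = 1.0                       # written by the main thread
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _decide(self, hull):
        # Placeholder for arbitrarily deep tactical analysis.
        if hull < 0.3:
            return Orders("evade", 2000.0)
        return Orders("attack", 300.0)

    def _run(self):
        while not self._stop.is_set():
            # Publish a whole new object in one assignment so the main
            # thread never sees a half-updated decision.
            self.orders = self._decide(self.hull)
            self._stop.wait(self._interval)   # sleep until the next think

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


thinker = Thinker(interval=0.01)   # short interval just for the demo
thinker.start()
time.sleep(0.2)                    # main thread runs some frames
first = thinker.orders
thinker.hull = 0.1                 # the ship takes heavy damage
time.sleep(0.2)
second = thinker.orders
thinker.stop()
print(first.mode, "->", second.mode)
```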

How does this impact our AI? Well, again leaving the server side out for a moment, the fact is that in order to make Ascent's space battles even remotely playable for a human being, I have to severely hamper the AI's ships. That's right, instead of our AI cheating like everyone else's, I have to give the players all the advantages. The AI can move and turn at only a fraction of the speed you can - and in the higher end battles if you blink, you lose. I tried battling the AI on equal terms myself, and despite being the actual developer I found it completely impossible to beat. Its performance was literally perfect. It never misses, and it thinks and reacts to a changing situation before my brain even sees what's going on. As a result, fighting an NPC in Ascent is a lot more like fighting a human opponent in other games. It's more immersive, and more challenging - for all the right reasons.

So, to the nitty gritty - how hard is it to do Threading? Well, in my opinion, it's really easy if you keep it as simple as possible. It's harder than just doing everything in the main thread, but it's easier than trying to optimise a complex game after you've jammed everything into the main thread, so even in terms of net difficulty I'd say it's fairly neutral. One piece of advice I would give is to think outside the strict OO box when you're threading. Think about everything as data and instructions again for a moment, and it gets a lot more intuitive - probably because that's exactly how CPU cores think about everything. You're ripping off a layer of abstraction, basically, and that's good because messed up threading can be a real nightmare, and when you mess it up, you want that nightmare to be simple - and short.

The process I follow every time is as simple as:

1. Set up data structures for the main thread and the spawned thread to share. Include one bool which tells everybody who is messing with the data right now. When the spawned thread has done its thing, that bool is how the main thread knows it's safe to open its Christmas presents.
2. Create a thread and give it the code to use. Again, mostly functional code seems to make this easier and more intuitive, at least to me. I tend to use low priority for threads where I can, to prevent them from messing with the main thread and framerates, even if we get a lot of them going at once.
3. Set our bool to true.
4. Start the thread.
5. Main thread yields while that bool is set against it.
6. Spawned thread does heavy lifting (yield sometimes too; 99% of the time this won't slow you down at all because there's so much unused CPU grunt available and you'll be back right away. I know there's a school of thought that you shouldn't need to yield if your priorities are set accurately, which is a lovely theory).
7. Spawned thread finishes and sets the bool to false, thereafter completing or yielding until the bool is back (depending how you want to control it).
8. In the very next frame, the main thread sees the bool is false, and takes its exciting newly processed data.
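The steps above can be sketched as follows. This is a minimal illustration in Python, not the author's Unity code: `SharedJob`, `worker_busy` and `heavy_lifting` are made-up names, and thread priority is left out because Python's `threading` module does not expose it. Note also that CPython's global interpreter lock keeps these threads from running Python bytecode in parallel, so treat this as a sketch of the handoff, not a benchmark. In C#, Java or C++ the flag would need to be declared volatile or atomic so that each thread reliably sees the other's write.

```python
import threading
import time


class SharedJob:
    """Data shared by the two threads, plus the ownership flag."""

    def __init__(self, inputs):
        self.inputs = inputs
        self.results = []
        self.worker_busy = False  # True: only the spawned thread touches this


def heavy_lifting(job):
    # Runs in the spawned thread, which owns the data while the flag is True.
    squares = []
    for n in job.inputs:
        squares.append(n * n)
        time.sleep(0)             # occasional yield
    job.results = squares
    job.worker_busy = False       # last step: hand the data back


job = SharedJob(list(range(5)))
job.worker_busy = True            # set the flag BEFORE starting the thread
threading.Thread(target=heavy_lifting, args=(job,), daemon=True).start()

frames = 0
while job.worker_busy:            # main thread leaves the shared data alone
    frames += 1                   # ...and carries on with its own frames
    time.sleep(0.001)             # stand-in for one frame of other work

print(job.results)                # [0, 1, 4, 9, 16]
```

Setting the flag before the thread starts matters: if the worker set it itself, the main thread could look at the data in the gap before the worker got going.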

In essence, that's it. The annoyances come during debugging - typically nice error codes don't make their way back from your spawned thread, but there are ways around that. The main thing to make sure of is never to cross the streams... one thread at a time is working on our shared data structures. There are thread-safe structures you can break this rule with if you're so inclined, but I've yet to find a need. Our AI code's main thread reads from variables the thinky thread writes to all the time, and that's about as close as I get to breaking these rules.
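One possible way around the lost-error problem (not necessarily the one used in Ascent) is to wrap the worker's code so that any exception is stored in the shared data and reported by the main thread. The sketch below is Python, and `Job`, `run_guarded` and `broken_work` are hypothetical names.

```python
import threading
import traceback


class Job:
    def __init__(self):
        self.result = None
        self.error = None         # formatted traceback text, if any
        self.worker_busy = False


def run_guarded(job, work):
    """Run `work` in the spawned thread and record any failure on the job."""
    try:
        job.result = work()
    except Exception:
        job.error = traceback.format_exc()
    finally:
        job.worker_busy = False   # always hand the data back, even on failure


def broken_work():
    return 1 / 0                  # deliberate bug for the demo


job = Job()
job.worker_busy = True
worker = threading.Thread(target=run_guarded, args=(job, broken_work))
worker.start()
worker.join()

if job.error:
    # The main thread can now log the full traceback like any other error.
    print("worker failed:\n" + job.error)
```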

There are other ways to handle communications between the threads, but I've found this idiotically simple process to be the most foolproof and reliable. When threading, I like my bugs as simple as possible.

So how does this help Skyrim? I'm not sure because I have never attempted to profile what the game's using its main thread for. Presumably the terrain engine is part of it, so there's something. Presumably Skyrim's limited AI could also have been made more advanced if it had its own core (or a few).

Dwarf Fortress on the other hand could probably spawn ten threads, and spread the dwarves and other characters between them, dramatically speeding up its game framerates. Hmm now I feel like playing Dwarf Fortress, so it's probably time to wrap up.
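Purely as a thought experiment, spreading characters across ten threads could be sketched like this in Python. The entity dictionaries, `split` and `update_group` are invented for illustration, and the sketch assumes each update reads and writes only its own entity; characters that interact with each other would need more care than this.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Deal items into `parts` lists so no two workers share an entity."""
    return [items[i::parts] for i in range(parts)]


def update_group(group):
    # Each worker only touches the entities in its own group.
    for entity in group:
        entity["hunger"] += 1
    return len(group)


dwarves = [{"id": i, "hunger": 0} for i in range(25)]
groups = split(dwarves, 10)

with ThreadPoolExecutor(max_workers=10) as pool:
    updated = sum(pool.map(update_group, groups))

print(updated, "characters updated this tick")
```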

In conclusion, I look forward to seeing my CPU go over 13% while playing someone ELSE's game in the future. May this day come soon.
