[Splash Damage's Dean Calver explains why your coders shouldn't settle for office or gaming rigs, arguing for top-of-the-line and scarily expensive systems, in this #altdevblogaday-reprinted opinion piece.]
While I expect this is true of all disciplines, I'll talk about it purely from a coder perspective, mostly because, as a code lead, the efficiency of the team is something I've spent a lot of time thinking about over the years.
There are many aspects of team efficiency, but this article is going to focus on the easiest to implement, which is simply: "Open your wallets and buy the best. Hell no, buy better than the best." This makes sense on big-budget AAA projects, where spending money is often the easiest, fastest, and most readily available way of improving efficiency right now.
It's not necessarily the best solution, but it's one that can be implemented with almost no downtime and no resource cost except cash (and IT infrastructure). That's rare on large projects, where communication, acquiring staff, deadlines, and morale are all significant issues affecting efficiency, but also much harder ones to solve.
Often saving a bit of money on hardware is seen to make sense for the project (a penny saved, etc.), but in my humble opinion it's a false economy. In fact, I'd go as far as to say that the best hardware is more important than the millions you're likely to end up spending on AAA marketing, because marketing has to have something to sell, and the easiest way to make a better product is to make your most precious asset (your team) happier and more productive.
Some, of course, will say that with a decent hierarchical, incremental build system, building your game shouldn't require crazy hardware, as you shouldn't be compiling many files at once for a typical small update. To which I say: true, and also aerial porcine encounters do happen (they do, honest!).
It's not that you can't solve it in software; it's just that you won't. It's hard to maintain, it's expensive in man-power, and that cost goes up the more productive the team is. Even then, it only helps if you really aren't changing a major underlying system. The reality is that the spider web of file dependencies means most people will be compiling a significant number of files at once for a fair portion of their working day.
So, optimize and pay for the worst case (which is the only case I've ever encountered): your build complexity will grow exponentially as the project evolves. Now, don't get me wrong, there is much that can and should be done in software to reduce build times, but that doesn't change the fact that good hardware == faster in almost all cases. And faster == more stuff done and happier team members.
So, I’ve convinced you to throw money at the problem, awesome! So, then what should you buy?
This is where I'm likely to scare the managers reading this who until now have been smug, thinking, "We do that! We are awesome." Outside my job, I also (somewhere along the line, I'm not sure where) got involved with serious hardware: the stuff you don't buy, you build (or pay someone else to build); the stuff for which enterprises pay tens or hundreds of thousands of pounds or dollars per year in software support contracts.
Now, I'm not suggesting you spend that on support contracts, because to be honest your programmers will likely love fiddling with a decent system, but I do think you need to start thinking in that league with regards to hardware. Start with a rough budget of £10,000 or $15,000 per programmer every couple of years on hardware alone, and you're approaching what I call "good".
So What Should I Get For That Kind Of Cash?!
I suspect I've not got many readers left this far down the article, but here comes the more precise side of things. Ignoring build distribution for the moment, building any large game involves three main bottlenecks:
- Memory – how much stuff I can fit in
- Cores/HW threads – how many things can I build at the same time
- Disk – how fast can I get stuff into memory so my cores can work on it
Memory = normal usage + ~1 GiB per core + RAM Disk for temp files + cache for all source code, tools, the game, etc. at the same time!
So let's justify that:
- Normal usage is running the IDE, web browser, email client, etc. Currently I find 8 GiB is about right, as it's usually enough to run your PC build/tools/debugger whilst actually building.
- 1 GiB per core. My current rule of thumb is that to compile happily, the compiler likes about 1 GiB of workspace, particularly with large projects and heavy template use (like boost). Some compilers are much more frugal than others, but others can be crazy wasteful; 1 GiB per core gives you the headroom for the bad boys of the compiler world. You want 1 GiB per core because we are going to make sure that every bit of the platform is ready, so that no time is wasted: every core is going to be used at the same time.
- RAM Disk. Compilers are chains of operations that write out lots of temporary or rebuildable files. You know that, and the compiler knows that, but most filesystems don't have a way of saying "what I'm writing is temporary; if I lose it all, it's not really that important" (except via tmpfs-type systems, which is essentially what we are creating manually). No matter what disk architecture you go for, minimizing writes that don't need to be persisted will maximise its performance. A RAM Disk makes this easy: you direct all your .lib, .obj, .pdb, .exe, etc. files that can be rebuilt to your temporary RAM Disk. This means 99 percent of your writes are now free of the disk bottleneck and running at the fastest speed possible. When you reboot your system, or it crashes, the worst case is that you have to rebuild them all; however, most RAM disks have the option to write out to permanent storage on shutdown, so except for crashes it appears as a normal, very fast drive.
- Cache. The best place to read your files from is your disk cache. 99 percent of your source files won't change each compile, so they can sit in your system's file cache nicely, but you need enough to fit all the files your compile references plus all the other things you might be running in between. A common dev cycle is compile, debug, compile, debug, ad infinitum. That means you want enough cache not only for the development files but also for any non-streamed files on disk (depending on platform, files sent to the console may or may not go through the system cache, but it's best to assume they will), the debugger, and anything else that will fill your cache during the build/debug cycle.
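As a minimal sketch of the RAM Disk redirection idea, here is how a build-runner script might point compiler temporaries at an already-mounted RAM disk. The `RAMDISK_PATH` variable and its fallback are assumptions for illustration; the real mount point depends on your RAM disk software (a tmpfs mount on Linux, or an ImDisk-style driver on Windows):

```python
import os
import tempfile

# Hypothetical mount point for an already-created RAM disk; falls back
# to the normal temp dir so the sketch runs anywhere.
RAMDISK = os.environ.get("RAMDISK_PATH", tempfile.gettempdir())

# Point the standard temp-dir variables at the RAM disk so compilers and
# tools spawned from this process write their intermediates
# (.obj, .pdb, etc.) to RAM instead of persistent disk.
for var in ("TMPDIR", "TMP", "TEMP"):
    os.environ[var] = RAMDISK

tempfile.tempdir = None  # force tempfile to re-read the environment
print(tempfile.gettempdir())
```

Real build systems let you do the same thing more directly, e.g. by setting the intermediate output directory of each project to a path on the RAM disk.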
The key takeaway point is that if you don’t have enough memory to support your cores/hardware threads, you're wasting your CPU's time, and if you don’t have enough memory for cache/RAM Disk, you are forcing your disks to over-extend themselves, also slowing your development process down.
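Putting illustrative numbers into the memory formula above makes the totals concrete. All figures here are assumptions for a hypothetical 32-thread box, not measurements:

```python
# Back-of-envelope sizing for: normal usage + 1 GiB per core
# + RAM Disk + cache. All figures in GiB; workload numbers are
# illustrative assumptions.
normal_usage = 8    # IDE, browser, email, local build/debugger
hw_threads   = 32   # e.g. a 2P Xeon box
per_core     = 1    # compiler workspace headroom per HW thread
ramdisk      = 16   # temp .obj/.pdb/.lib output (assumed project size)
cache        = 8    # file cache for source, tools, game data (assumed)

total = normal_usage + hw_threads * per_core + ramdisk + cache
print(f"Recommended RAM: {total} GiB")  # → Recommended RAM: 64 GiB
```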
Core / HW Threads
Cores / HW Threads = number of files that can be built at once + 2
Practically you are going to be limited by how many cores and threads you can buy in a single system. The +2 is to maintain system responsiveness even under heavy load. In practice, it doesn’t matter as we generally have more files than cores.
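To sketch why core count maps almost directly onto build throughput, here is a hedged Python stand-in for a parallel build driver. `compile_file` is a dummy placeholder, not a real toolchain call:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def compile_file(path):
    # Stand-in for invoking the real compiler on one translation unit,
    # e.g. a subprocess call to your toolchain. Here it just names the
    # object file it would produce.
    return f"{path}.obj"

# A hypothetical project: far more translation units than cores, which
# is the normal case on a large game.
sources = [f"src/file{i}.cpp" for i in range(100)]

# One job per HW thread; the "+2" headroom in the formula is about
# keeping the OS responsive, not about adding extra compile jobs.
jobs = os.cpu_count() or 1

with ThreadPoolExecutor(max_workers=jobs) as pool:
    objects = list(pool.map(compile_file, sources))

print(len(objects))  # → 100
```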
Cores and hardware threads actually do your compiling and building, so the more you have, the more you can, in theory, get done at once. So, as long as you have the disk and memory to feed them well enough, you want the most you can buy. This is where you leave the common world of desktops: it's time to take a walk over to the server aisle of the shop.
The reason is 2P or not 2P (4P!). No, not a misquote of Shakespeare; the P stands for processors, because whilst you might get a single processor with up to 12 HW threads in x86 land, that's just not enough. Given the characteristics of iterative builds, as long as we have the RAM, we scale fairly well horizontally (with the number of cores). So the more processor sockets, the faster stuff gets done.
A current 2P Xeon system will get you 32 HW threads, and a 4P AMD system will give you 48 HW threads. That's a lot of processing power over a bog standard "workstation", and it really shows in building and compiling. It may seem expensive, but it makes development much more pleasant and efficient: if a build takes too long, coders will lose their place "in the zone". The faster they get back to the problem, the better.
The other point in favor of server hardware is reliability. As a general rule, server systems run for months and months without an issue. That often can't be said for "extreme" desktop platforms.
There are, however, a couple of issues:
- It's not representative of PC gamer rigs. If that's a problem, simply add another PC gamer box to your dev's desk.
- Noise. Servers are designed to run in data centers, where ear protectors are standard equipment in many cases; they're not ideal for your quiet office. There are two solutions: keep all the machines in a separate room and run keyboard/mouse/video/audio/USB extenders to each desk, or buy special quiet server cases to sit at the coder's desk. Of course, there are always fashionable ear protectors as well…
Disk = (low write speeds (yay for RAM DISK) + high read speed to fill the disk cache ) + fast boot/load + fast source control update (fast small file reads and writes)
- If you’ve set up the RAM sizes as indicated above, normal building won’t actually need the disk much, as it will be largely in RAM.
- We all like things to boot and load fast, and also for those first or full build situations we do want that to be relatively fast.
- Source control systems can be nasty for filesystems, often checking and updating gigabytes spread across hundreds of thousands of files. Lots of IOPS and out-of-order reads/writes are good here.
Two solutions here, one for each programmer and one for a small group.
- Each programmer is simple: buy a few PCI-E SSDs and RAID10 them together. This gives you your boot and local disk, fast enough to fill your RAM cache at breakneck speed. It shouldn't need to be too big, as most things will live in the next item.
- Amongst a small group of developers, connect a SAN via 10Gb Ethernet or InfiniBand to each machine. A SAN is a machine on the network whose only job is to provide very fast, very safe storage. Each SAN has its own complete system of caches and software to allow out-of-order, delayed writes (safely), so that the main limit is network throughput; hence the 10Gb networks. Also, using RAID replication and ERC technology means data can survive disk failures and other problems. There are stupidly expensive enterprise SANs which cost ouch amounts of cash, but luckily that's all smoke and mirrors to a large degree. Using a system like OpenIndiana, with some SSDs, a dozen spinning rusts (AKA traditional hard drives), and perhaps a DDRDrive accelerator, you can, for moderate amounts of cash, have speed and IOPS to spare, all sitting in a nice little server connected directly to a bunch of programmers' machines via a fast network.
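As a rough sanity check on why a 10Gb link is the sensible minimum, some back-of-envelope arithmetic. The working-set size here is an assumed figure, and real throughput will be lower once protocol overhead and disk layout are accounted for:

```python
# Raw 10Gb link throughput vs. time to pull a cold working set from
# the SAN into local file cache. Assumed figures, not measurements.
link_gbits = 10
link_bytes_per_sec = link_gbits * 1e9 / 8   # ≈ 1.25 GB/s raw

working_set_gb = 20                          # assumed source + tools size
seconds_to_fill_cache = working_set_gb * 1e9 / link_bytes_per_sec
print(f"{seconds_to_fill_cache:.0f} s")      # → 16 s
```

In other words, even a cold start refills the cache in seconds rather than minutes, after which the memory sizing above means the disk and network barely get touched.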
CFOs and bank managers will hate me, but I stick by my belief that your staff, your team, are the most important part of any project, and that means buying them the best equipment, so that they spend their time using their talent and not watching busy cursors and build output scrolling slowly up the screen.
There is more to hardware than what's here, from displays to build machines, project progress displays, video conferencing, etc. This has just been a focus on the smallest but most personal part of a realistic hardware capital expense for making worlds.
Today, buying a 64 GiB RAM, 48-core AMD workstation with multiple PCI-E SSDs per programmer is going to come as a shock, as we have got used to skimping on hardware, forcing the team to work with little better than gaming platforms. That's never been the way in other high-performance computing sectors, and we need to realize that's what we do: we make some of the most complex performance software in the world. Just be glad we don't need arrays of super-computer machines to build our worlds. Well, not yet…
It's worth noting this isn't just a spend, spend, spend article (well, it is, but… :p). I've spent years looking into this aspect of builds, and the basic approach here works for lower-expense systems, too. Coder systems need balance: lots of cores on their own won't cut it. RAM and disks are equally important, so even lower down the cost scale, don't just accept an office or gamer's rig. We have very different performance characteristics from normal machines.
[This piece was reprinted from #AltDevBlogADay, a shared blog initiative started by @mike_acton devoted to giving game developers of all disciplines a place to motivate each other to write regularly about their personal game development passions.]