[In this reprinted #altdevblogaday-opinion piece, Yager Development's lead programmer Andre Dittrich shares the practices he's picked up over the years in making his studio's builds as stable as possible.]
Build stability is always an important topic for us, but once a game has entered full production, the stability of the game and the tools becomes one of the most important concerns for the tech team. The simple reason is that this is when the number of people relying on the build is highest, and every hour those people spend waiting for a bugfix or a missing tool potentially means a lot of money wasted. So, keeping your build as stable as possible is important.
And now for the bad news: I do not have a "This Solves All Our Problems" recipe. Instead, I want to share some of the measures we have applied in our projects. If you have taken other measures to ensure build stability, please tell me. I am always interested in doing more.
Iteration Time Rules
Having a stable build is very important – yes, but you cannot ruin the iteration time for your team. There will always be that level designer who requests a small feature, a small change, or simply needs a critical bug fix really fast (usually yesterday) to finish the mission for the next milestone. You do not want him to wait a week for that change.
With 10, 20, or even more engineers working on your code base at the same time, chances are high that at any given moment at least one of them has added a bug that makes it impossible to release the next engine version to the team – at least if you do not take measures to help keep the build stable.
The problem of course is that the measures you take cannot add so much overhead that they become a reason for slow iteration times. So, everything you do needs to strike a balance between overhead and improved build stability.
Automated Build Systems
CIS – continuous integration server: you need this! It is bad enough when "real" bugs trouble your build – it is far worse when simple bugs destroy it. Ever come into the office in the morning to find that you cannot compile the game? A typo, a file that was not checked in, a bad merge? How many people lost how much time during that one morning? This is totally avoidable.
The main function of our CIS is to build the engine whenever somebody checks in a change. This makes sure that the engine and tools at least compile. We also run a few quick smoke tests that make sure you can at least start the engine.
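The core of such a setup can be sketched as a sequence of named build steps that stops at the first failure and reports each step's outcome. This is a minimal illustration, not Yager's actual CIS; the step names and the placeholder commands are invented:

```python
import subprocess

def run_pipeline(steps):
    """Run named build steps in order; stop at the first failure.

    `steps` is a list of (name, shell_command) pairs. Returns a report
    of (name, succeeded) tuples, so the team can see at a glance which
    step a check-in broke.
    """
    report = []
    for name, command in steps:
        result = subprocess.run(command, shell=True,
                                capture_output=True, text=True)
        ok = result.returncode == 0
        report.append((name, ok))
        if not ok:
            break  # no point running smoke tests on a build that failed
    return report

# Hypothetical pipeline: compile the engine, then a fast smoke test.
# "true" stands in for the real build and engine-launch commands.
steps = [
    ("compile", "true"),
    ("smoke_test", "true"),
]
```

A real server would trigger this on every check-in and notify the author of the offending change when a step fails.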
But you can do even more. During the day, the focus is on getting the engine built as fast as possible and running smoke tests. During the night, we can do a lot more. We run automated tests to gather statistics on memory usage and performance in test levels and game levels. These statistics are made available as graphs on an internal website, and they are an enormous help in recognizing and tracking down sudden jumps in performance or memory usage as well as gradual trends.
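Spotting those sudden jumps in the nightly numbers can itself be automated. A minimal sketch, assuming nightly (label, value) samples and a 10 percent threshold – both the threshold and the sample data are invented for illustration:

```python
def find_jumps(samples, threshold=0.10):
    """Flag nights where a metric (peak memory, frame time, ...)
    rose by more than `threshold` relative to the previous night.

    `samples` is a list of (label, value) pairs in chronological
    order; returns the labels of the offending nights.
    """
    jumps = []
    for (_, prev), (label, value) in zip(samples, samples[1:]):
        if prev > 0 and (value - prev) / prev > threshold:
            jumps.append(label)
    return jumps

# Hypothetical nightly peak-memory readings for one test level, in MB.
nightly = [("Mon", 412.0), ("Tue", 414.5), ("Wed", 468.0), ("Thu", 470.1)]
print(find_jumps(nightly))  # → ['Wed']
```

Feeding a report like this into the nightly e-mail or website means nobody has to eyeball every graph to notice that Wednesday's check-ins cost 50 MB.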
Together with good check-in comments (see below), this lets you catch such regressions before they actually become a problem, or at least recognize them quickly and efficiently (without TAs or programmers spending time finding out why MissionXY is not running any more).
Since I am talking about automated tests, I should talk about unit tests as well. I have some experience with them, though I have to admit that most of it is about how not to do it. We integrated a unit testing framework into the Unreal Engine at the UnrealScript and Kismet level pretty early in the production process. We used it mostly for the AI code, as this was mainly written by us and did not rely too much on middleware code (except pathfinding).
The main mistake we made was that we ended up actually writing integration tests, and maintaining those takes a lot of time. For some time, we even made it part of the process to have "unit tests" for every feature. At some point we were spending more and more time fixing tests that failed because of changes in other systems rather than because of bugs in the tested code – so we stopped.
For our next projects, I want to write actual unit tests for the critical parts of our code. Integration tests should be reserved for finished features that are unlikely to change much, and I guess that means saving them for later in production. If you have experience successfully applying either, I would like to hear about it.
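The distinction matters in practice: a true unit test exercises one piece of logic in complete isolation – no engine, no level, no middleware – so it only fails when the tested code itself changes. A hypothetical example in Python's `unittest` (the armor formula is invented for illustration):

```python
import unittest

def apply_armor(damage, armor):
    """Hypothetical combat helper: armor absorbs a flat amount,
    but a hit always deals at least 1 point of damage."""
    return max(1, damage - armor)

class ApplyArmorTest(unittest.TestCase):
    def test_armor_reduces_damage(self):
        self.assertEqual(apply_armor(10, 3), 7)

    def test_damage_never_drops_below_one(self):
        self.assertEqual(apply_armor(2, 50), 1)

# Run the suite in-process; a CIS would run this on every check-in.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyArmorTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Nothing here touches another system, which is exactly what was missing from the "unit tests" described above: those broke whenever neighboring systems changed, not when the tested code did.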
Peer Reviews
This is one of the best tools in our belt for improving build stability. It not only substantially improves build stability, it also fosters communication within the team and distributes knowledge (win – win – win).
The idea is pretty simple: whenever someone wants to check in a change, he needs to get that change reviewed by one of his colleagues. Of course, this only works if it is taken seriously. The goal of a review should be that the reviewer gains a good understanding of what the change is, and how and why it was done.
There are no dumb questions during a review. If you do not understand something while doing a review, ask. This applies especially to seniors or leads, who sometimes feel they should not ask dumb questions. If you think you need someone else's opinion, get it. You may and should criticize style and details. Ask for additional or improved comments if you think they might help. This is not only about making sure the change works; it is about sharing ideas and knowledge as well.
So, what do you get in the end? Reviews will easily spot obvious issues or problems with the approach to the issue at hand. They will rarely spot really intricate bugs or side effects. Still, they remove quite a number of bugs that would otherwise be found later by the automated systems, by QA, or even worse, by somebody trying to use a broken tool.
You also get people learning from each other, and people looking into parts of the system they would not usually see. At least two people know each change in detail, so people getting sick or leaving the company becomes less of an issue.
You get a culture of talking about your work and of making sure work is actually done before the check-in (it is pretty embarrassing when your reviewer discovers obvious flaws in a piece of code you considered worth checking in). People on your team talk, they develop a common language, and they come to understand each other's weaknesses and strengths.
A few things to keep in mind to make peer reviews work:
- it costs time – make sure everybody knows that this is time well spent, and factor it into your estimates
- every check-in is reviewed – a lot of mistakes are made in "easy" or "small" check-ins
- people should be available for a review – nothing is as annoying as not being able to check in just because nobody has time; you should have a damn good reason to refuse a review
- add the name of the reviewer to the check-in comment – reviews will be taken a lot more seriously that way, and if you are hunting a bug caused by a check-in, you know the two people who can help you
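The last point is easy to enforce mechanically, for example with a commit-message hook that rejects check-ins whose comment does not name a reviewer. The `Reviewed-by:` convention below is an assumption, not something from the article; use whatever tag your team agrees on:

```python
import re

# Hypothetical convention: every check-in comment must contain a line
# like "Reviewed-by: <name>".
REVIEWED_BY = re.compile(r"^Reviewed-by:\s*\S+", re.MULTILINE)

def check_commit_message(message):
    """Return True if the check-in comment names a reviewer.

    Meant to be called from a VCS hook so that unreviewed
    check-ins are rejected before they ever reach the build.
    """
    return bool(REVIEWED_BY.search(message))

good = "Fix crash in AI cover selection.\n\nReviewed-by: A. Colleague"
bad = "Fix crash in AI cover selection."
print(check_commit_message(good), check_commit_message(bad))  # True False
```

In Git this would live in a `commit-msg` hook; most other version control systems offer an equivalent pre-submit trigger.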