In the games industry we are seeing a steady increase in team size. Today, even a team of some ten programmers and thirty or more artists (level designers included) is hardly considered large. The artists, and especially the level designers, often need to check how their art works in-game to make sure it looks and plays as expected. This means that a large number of people depend on a stable version of the game to do their jobs efficiently.
With this many people involved, lost time obviously means a lot of lost money. If broken code makes it into the hands of the art team, the man-hours start ticking away rapidly.
The result is not only lost money but also stress for the programmers, who have to fix the problem quickly; sometimes the entire programming team more or less stops making progress as everyone hunts for the problem.
To compound the problem, there are also more far-reaching effects on the morale of the respective teams. If, for example, the artists have low confidence in the stability of the code (and justly so), they will be more likely to blame problems on the programmers instead of checking whether their own art is at fault, for instance through misnamed files (and, of course, when this turns out to be the case, the programmers get huffy in turn).
On the other side of the wall, the programmers may become too paranoid to check code in for fear of breaking the game, and end up hogging checked-out files or, if multiple checkouts are allowed, facing nasty merge problems. In all these cases, productivity is lost.
The evil effects above are not dreams born of a paranoid brain, but examples from a real-world project. In that project the protocol was to put everything under source control: code, art and executables. The programmers were to check in compiled binaries whenever they checked in source, and the artists picked up these binaries whenever they updated from source control. On the surface this sounds like a reasonable scheme: everything is in source control, so that ought to fix it.
Only it's not that simple. Since the programmers have to build the source before checking it in, they must prevent anyone else from checking source in while they are building. This means locking out all the other programmers, who then have to queue for their turn to lock everything. Building and verifying that it all works can take some time; waiting an entire day just to check in a few files has been reported. In the meantime, of course, the waiting programmer won't just sit idle; he'll work on something else. So when his turn finally comes, he may be in the middle of something that isn't ready to check in… and so on.
Meanwhile, the art department is subjected to whatever code was last checked in, and any mistake can spread in a matter of minutes.
Needless to say, these problems are usually at their worst at the most inconvenient time possible: near milestones.
Fortunately, this is not a necessary evil, and a little care and thought can eliminate most of the bad effects.
Insulation
By inserting a layer of soft foam between the programmers and the artists, the likelihood of anyone getting hurt can be greatly reduced. One type of foam I suggest using is also known as a QA department. Many readers will hopefully nod and think ‘Aye, useful they are, them QA teams!’; however, even large studios developing well-known titles have missed this vital institution, or do not use it for this purpose (or in any structured way at all). The great thing is that once you have people testing your products on a permanent basis, you have to find something for them to do, and here is one suggestion.
Carry out nightly builds and have QA check each build. If QA does not declare it stable, don't let the artists get their hands on it. It is then up to the respective leads to decide whether to try to patch the build or simply wait for the next nightly build; the specific requirements of the day will decide.
An added benefit is that we have an established process that can also produce release candidates and builds for other external purposes.
Integration of Quality Assurance into the production pipeline can lead to more stable builds.
The Process, step by step
We assume that both the art and the source code are under version control. Strictly speaking the art need not be, but it simplifies things if it is.
In this protocol QA receives a build (labeled with a version number) that contains executables for the required platforms as well as a snapshot of the art in suitable formats and packaging. At the same time, the art and code in version control should be labeled with the build number. The build is preferably made automatically overnight, so that the QA team can come in early, check the automatic test results and perform manual testing.
Next, QA decides whether the build is stable and, if so, puts a copy where the development team can get it. The release is then announced, and the art team should update as soon as possible (to avoid problems caused by people using old executables).
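The gate in these steps comes down to one rule: the art team runs the newest build that QA has cleared, not simply the newest build. A minimal C sketch of that rule follows, assuming a hypothetical record that pairs each build number with a QA verdict; the names `QaStatus`, `Build` and `latest_stable_build` are illustrative and do not come from any real tool.

```c
#include <stddef.h>

/* Hypothetical QA verdicts; the process only requires that QA either
   clears a build or does not. */
typedef enum { QA_PENDING, QA_STABLE, QA_REJECTED } QaStatus;

typedef struct {
    int build_number; /* label applied to both the code and the art */
    QaStatus status;  /* set by QA after testing the build */
} Build;

/* Returns the build number the art team should be running: the
   highest-numbered build QA has cleared, or -1 if none has been cleared.
   Rejected and still-untested builds are never handed out. */
int latest_stable_build(const Build *builds, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; ++i) {
        if (builds[i].status == QA_STABLE && builds[i].build_number > best) {
            best = builds[i].build_number;
        }
    }
    return best;
}
```

With a history of build 101 cleared, 102 rejected and 103 still pending, the artists stay on 101; once QA clears 103, it becomes the build to update to.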
The art team should continuously check their art into the version control system; they just need to make sure their art works with the executables from the latest build.
What of the programmers, then? They should use source control and get updates of the latest source code and data. Under this scheme they get no insulation from the artists' check-ins. However, there are several reasons why a buffer in this direction is less important. First, the programming team is usually smaller, so word spreads faster if something is wrong and the other programmers can avoid the broken data. Second, programmers are more skilled with version control tools, so the first programmer who runs into broken data can usually identify the offending file(s) and warn people, or pin the file to an earlier version. Finally, programmers are usually the ones who built the game's error-reporting systems, so they actually understand the error messages. All in all, this means a much lower impact on the programming team.
Insulating the Engine Room
If there is a separate team developing an in-house engine (or, equivalently, if an externally developed engine is used), the process can be extended to require that each engine build passes QA before the game programming team accepts it.
To do this, each engine build must be numbered, just like the game build, and the engine should define a compile-time flag equal to its build number. The game programmers then prepare all the code needed to use the new engine before accepting the build, but keep it conditionally compiled out until the engine build number increases. In other words, programmers working in an area where the engine will change must work ahead of the last stable engine, while ensuring that all new code is compiled out in the current version; see example 1 below.
#if ENGINE_VERSION > 1023
    /* …new code depending on new engine… */
#else
    /* …old code using old engine… */
#endif
When the whole lot is submitted to QA, they first test the game build using the old engine, then perform the same tests with the new version of the engine. Implemented correctly, this procedure exposes bugs that were introduced in the new engine. There is always some uncertainty, but at the very least it will be known that there are problems, and the programming team will avoid the new engine until it clears QA.
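A compilable sketch of the two builds QA makes, assuming the engine's version header can be stood in for by a single `#define`. The default of 1023 and the function name `uses_new_engine_code` are hypothetical; the conditional itself follows example 1.

```c
/* Stand-in for the engine's version header. A real engine build would
   ship its own number; the #ifndef guard lets this sketch be compiled
   against either engine version from the command line. */
#ifndef ENGINE_VERSION
#define ENGINE_VERSION 1023 /* hypothetical: last engine build cleared by QA */
#endif

/* Game-side code written ahead of the engine, as in example 1.
   Returns 1 if the path for the new engine was compiled in, 0 otherwise. */
int uses_new_engine_code(void)
{
#if ENGINE_VERSION > 1023
    /* ...calls that only exist in engine 1024 and later... */
    return 1;
#else
    /* ...equivalent code using the old engine... */
    return 0;
#endif
}
```

Compiling with `cc -c game.c` (the file name is hypothetical) gives the old-engine build, while `cc -DENGINE_VERSION=1024 -c game.c` gives the new-engine build. QA runs the same tests on both, so a failure that appears only in the second points at the new engine.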
Clearly, this procedure can be extended to any library the game uses, although most libraries stay the same over the course of a project.
Results
This simple procedure creates a much more stable working environment for the art team. Now the artists can have a nice fixed morning routine: check for a build update, get coffee, start working.
At the same time, it makes life simpler for the programmers, as the pressure not to break the build is lessened. This means that programmers can check code in frequently (without wasting time on superfluous, paranoid testing) and spend their time on what they are paid for. A common misconception is that removing this pressure will result in sloppier code; however, mistakes are still caught by the QA team, so the problem (if someone is frequently responsible) can be dealt with in a civilized way instead of by lynch mob. Peer pressure also keeps people wary of checking in broken code, since the rest of the programming team will usually let the person who broke it know what they think.
Another benefit of having a QA team is that the bug-reporting procedure becomes better defined. The artists assign bugs to the QA department, which in turn verifies each bug and assigns it to the relevant programmer. This is the natural procedure: since QA is responsible for clearing a build, it must be the first (and last) stop for any bug found in that build (which also allows QA to improve its testing procedures and filter out duplicates). Furthermore, since we have introduced the concept of builds, it becomes easy to record the version (i.e. the build number) in which a bug appeared and the one in which it will be fixed.
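Recording build numbers on each bug also tells QA which build should contain a fix, and so when to verify and close the report. A minimal sketch, assuming a hypothetical `BugRecord` in which 0 means that no fix has been scheduled yet; the field and function names are illustrative only.

```c
/* Hypothetical bug record: the build in which the bug was found and the
   build in which its fix is scheduled to land (0 = not yet scheduled). */
typedef struct {
    int id;
    int found_in_build;
    int fixed_in_build;
} BugRecord;

/* Returns 1 if QA should expect the fix to be present when testing
   `build`, i.e. a fix is scheduled and `build` is that build or later. */
int fix_expected(const BugRecord *bug, int build)
{
    return bug->fixed_in_build != 0 && build >= bug->fixed_in_build;
}
```

If the same bug is reported again against a build where `fix_expected` returns 1, the fix did not hold, and the report goes back to the programmer rather than being filtered out as a duplicate.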
While we have introduced a stabilizing layer into the system, we have also added a delay that slows down turnaround (note that this does not increase development time; it introduces latency). However, cases where an artist really needs the bleeding-edge code are relatively rare. In such cases the artist has to work closely with a programmer anyway (the one making the specific changes needed), and that programmer can supply an intermediate build. For the most part, content will be both forward and backward compatible with the code and no problems occur; all the artist needs to know is that when QA makes a fresh build available, it is time to upgrade. Since we are dealing with large teams, introducing a (potential) slowdown for a relatively unusual case while making everyone else's lives simpler is an overall win.
In most production environments it is also common to pack the data into a big archive file, and creating it can take considerable time. This process addresses that problem too: everyone gets a fresh archive with the build each morning (or however often builds are made), so nobody has to create one themselves. During the day the archive can usually be patched with updated content, and the next morning it is back to a fresh copy. Most people will never have to build an entire archive themselves, which again is a win in production time.
Further Developments
The process can also be developed further, for instance by specifying more exactly how programmers and artists working ahead of the build should behave. Perhaps insulating the programmers from the artists is really worthwhile?
Since the programmers (usually) control the tools, they can create insulation for themselves by building a system that automatically checks the art assets for required properties. A detailed description of such a system is outside the scope of this article.
Summary
In closing, I want to point out that this is by no means the be-all and end-all of build iteration schemes. It is a simple yet robust procedure, presented in the hope that it will be of use. The process has been applied successfully to large projects (in which the author participated) in more or less exactly the form presented here.
Note that a QA team need not be large: there is an immeasurable difference between zero and one person, whereas between one and two there is only a measly 100 percent. Properly equipped with automated testing tools, a QA department can easily be a one-man institution.
For smaller projects the benefits are not as large, but the process still gives more control over the development environment, and having it in place makes it easier to scale up the team later.