Quality Quality Assurance: A Methodology for Wide-Spectrum Game Testing
[In this Gamasutra article, Nintendo and Microsoft Game Studios veteran Wilson talks about the value of diverse video game testing, suggesting a formula to make sure that your game debuts with the fewest bugs possible.]
The need for wide-spectrum testing
In the process of software
development, there are constant pitfalls and perils to avoid, for
application developers and game developers alike. Software testing, one of the
most resource-consuming stages of the development cycle, has more than
its fair share of these problems, and it tends to carry an unfortunate
stigma as well.
Some developers consider it
a short process to find glaring errors, a necessary evil or a secondary
concern. Consumers consider it to be the stage in which any problem they have
with the product should have been found (often rightly so). Testers themselves
often consider the testing stage to be rushed or insufficient once it ends.
The biggest misconception about software testing is that any one method of
testing is better than another. There is both an art and a science to software
testing, and neither of them should be ignored.
Neither testing a strict set of
conditions nor performing seemingly random tests is enough by
itself, no matter how extensive the process becomes; both the art and the
science are needed to find as many bugs as possible and leave the
software functional and polished.
The differences between ad-hoc testing and test cases
Ad-Hoc Testing
Ad-hoc testing, also known as free testing, is a style of testing that
definitely falls into the artistic side of the spectrum. This form of testing
is most often used in game testing, but it can also be found in consumer-use
panels and focus groups, where individuals are brought in to try new software
with very broad goals that they are directed to complete.
As a style, it is very fluid
and often seems random; a game tester may progress along half of a level as the
developer intended only to attempt to jump through a crack in the environment,
causing their character to leave the bounds of the environment and become
stuck, unable to return to the normal flow of play.
When testing a piece of
presentation software, a tester may attempt to loop through the entire
presentation rapidly multiple times, not allowing time for the images or videos
to load, which may stress the available memory to the point that the software
stops responding and locks up.
The Good
While these seem like random things to try during the testing process, they are
things that may occur in the real world. This is where ad-hoc testing becomes
an art: finding things that the end-user may attempt that the developers haven't
planned for.
This may seem simple enough, but the amount of creativity
necessary for this form of testing can sometimes seem staggering.
Entering a
dungeon in a game, letting it populate halfway, then deciding to go back to
town and save the game quickly may lock up the game -- but if a player forgets
to save their game before entering a dungeon, it could definitely happen.
Plans can certainly be made
to test these sorts of situations, but there's no effective way to plan for
them all. This is where ad-hoc testing is the most useful: testing situations
that may otherwise occur after release that weren't planned for during
development. The number of unusual actions good testers will try when left to
their own devices can be surprising, and they will often find a fair number of
problems that can then be corrected before release.
The Bad
While this sounds great, and
it often finds some major issues that would otherwise have gone unnoticed,
there are a few problems with this method. It's almost impossible to cover all
of a piece of software's functionality in this way; there's often too much
space to cover to allow a testing team to perform free testing without any
focus.
Also, testers are likely to
gravitate toward the areas they prefer, which
leaves other areas of the software with less coverage. To diminish this
problem, many advocates for this style of testing temper their test plans by
assigning ad-hoc testers to specific portions of the software, but this still
isn't enough to compensate for the lack of disciplined testing.
Test Cases
This is the scientific side of testing. Developers and test leads will produce
a list of tests to be performed based on the functions in the product, which
functions interact with other functions, what different parameters there are
for each function, etc.
Test cases are the counterpoint to ad-hoc testing;
where ad-hoc testing seems random, test cases are strict and disciplined. They
are used to go over a function, which can be as simple as moving between cells
in a spreadsheet or as complex as casting an intricate spell at a group of
enemies, from every point of view and with each command style that the writer
of the test cases can think of.
The Good
Test cases deliver where
ad-hoc testing often falls short: they ensure that the most common actions that
will be performed are tested in a large variety of ways in every area of the
software.
This alone is a boon to the testing process. An ad-hoc tester may take the
color palette for granted, yet the software may have
an incorrect variable call in just one format style, resulting in the desired
blue becoming green. A test case to check the color implementation for each
color in each format would easily catch such problems.
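As a rough illustration of such a test case, the sketch below checks every color against every format style. The `render_color` function, the color names, and the format names are all hypothetical stand-ins invented for this example, with a bug planted on purpose in one format; a real title would call into its own rendering code instead.

```python
from itertools import product

# Hypothetical reference palette: the RGB value each color should produce.
EXPECTED = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
}

# Hypothetical format styles that a color can be applied to.
FORMATS = ["heading", "body", "caption"]


def render_color(color, fmt):
    """Stand-in for the software under test (an assumption, not a real API).

    A bug is planted deliberately: blue comes out green in the caption format.
    """
    if color == "blue" and fmt == "caption":
        return EXPECTED["green"]
    return EXPECTED[color]


def run_color_cases():
    """Check each color in each format and return the failing (color, format) pairs."""
    failures = []
    for color, fmt in product(EXPECTED, FORMATS):
        if render_color(color, fmt) != EXPECTED[color]:
            failures.append((color, fmt))
    return failures


if __name__ == "__main__":
    for color, fmt in run_color_cases():
        print(f"FAIL: {color} rendered incorrectly in the {fmt} format")
```

Enumerating the full grid is what makes the single bad combination visible; a tester who only tried blue in the heading format would never see it.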
The Bad
The amount of coverage a title receives through test cases is dependent upon
the people writing the cases. This coverage can be very extensive, especially
when the test cases are written by people with years of experience and an
in-depth knowledge of the functions that need to be tested, but nobody can account
for everything that an end-user may attempt.
There are just too many random
variables to be considered for test cases to cover every possible occurrence.
It's also important to note that some testers may be easily bored by such
strict testing protocols, which in rare cases could result in the test cases
not being completed properly.
Tools of the trade
White-Box Testing vs. Black-Box Testing
These two methods form the second axis of the testing grid. The amount of information
available to testers about the underlying workings of the software is largely
dependent upon which testing method they use.
In black-box testing, there is
little or no information about the internal workings of the software; all a
tester sees is what the end-user would see. Playing a beta build of a game on a
test console is a good example of this.
White-box testing, on the other hand, uses internal
functions or a second software suite (usually a debugger) to track what's going
on as the software runs, relaying information back to the tester or to a test
log as necessary.
Good examples of this testing method would be testing a
software build in a programming platform's development environment or using
automation software to test minor variances of an action repeatedly with a
debugger running in the background.
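The difference can be sketched in a few lines. Everything here is an assumption made for illustration: `apply_damage` is a toy game function, and its optional `trace` hook stands in for whatever debugger or internal logging a real project would use.

```python
def apply_damage(health, damage, armor, trace=None):
    """Toy game function (hypothetical). `trace` is an optional white-box hook."""
    mitigated = max(damage - armor, 0)
    if trace is not None:
        # Relay internal values to whoever is listening (a tester or a test log).
        trace({"damage": damage, "armor": armor, "mitigated": mitigated})
    return max(health - mitigated, 0)


def black_box_check():
    """Black-box: only the result an end-user could observe is examined."""
    return apply_damage(100, 30, 10) == 80


def white_box_check():
    """White-box: the internal values are captured in a test log and checked too."""
    log = []
    result = apply_damage(100, 30, 10, trace=log.append)
    return result == 80 and log == [{"damage": 30, "armor": 10, "mitigated": 20}]


if __name__ == "__main__":
    print("black-box passed:", black_box_check())
    print("white-box passed:", white_box_check())
```

Note that the white-box path executes extra code inside the function being tested, so the software under observation is no longer running exactly as it would for an end-user.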
While it seems like everyone would want to use white-box testing, the fact that
an extra piece of software is running in the background and intercepting data
as it flows through the software being tested is an important consideration.
This interception can interfere with the normal working of the software, which
sometimes causes problems that normally wouldn't occur, or may even prevent
problems that normally would occur. As such, it's important to keep a balance
between white-box and black-box testing to ensure that the software in question
receives thorough testing.
Testers
These people specialize in breaking software. They each have their own
preferred ways of doing this, but most of them are competent with a variety of
testing methods. The amount of access they have to the resources necessary to perform
adequate testing, however, is often determined by their level of association
with a developer.
Third-Party Testers
The lowest rung, so to speak. These testers are often the backbone of the
testing process. Third-party testers are often contracted on a
project-to-project basis. They usually have limited access to special testing equipment,
though testers who show special talent with certain methods will frequently be
given access to extra tools. Third-party testers are often used for black-box
testing.
Hiring contract testers in large groups can often result in
a mixed bag, but even testers who have little or no technical knowledge can
usually perform both ad-hoc testing and test cases with positive results, given
the right training.
Second-Party Testers
Testers who work in the testing group of a subsidiary or secondary company under
a larger company, second-party testers can be either contract or fully
employed. Due to their closer relationship to the developers, they often gain
access to more advanced tools. This often results in a stronger focus on test
cases and white-box testing. Most second-party testers are at least moderately
experienced in the testing process.
First-Party Testers
Testers that often work or communicate directly with the developers, first-party
testers are usually full-time employees of the company they work for, though
skilled contractors may occasionally be used in this capacity. They have the
most access to testing tools, and they will often manage groups of testers in
their tasks. Most first-party testers are also very familiar with the testing
process and various development cycles.
The Tester-Developer Dynamic
This dynamic is why the levels of association described above are important. All too
often the developer and the testing group they work with will find themselves
at odds with each other on various issues, which often creates friction between
the teams.
The further removed one group is from the other, the more
likely this problem is to arise.
Lack of tools or resources is often the greatest complaint
among third-party testers, but disagreements on how to handle bugs are very
common at all levels. Cost-effectiveness, time constraints, and feasibility
are all objects of contention when bugs are accepted or written off. It should
go without saying, but all parties involved should remember that they're all
working toward the same goal: a polished, functional product.
To this end, developers should understand that many testers
fall within the core demographic of their end-users, and sometimes the opinions
of testers are worth heeding.
Likewise, testers should understand that
developers have a greater understanding of where the project stands and will
make decisions based on information that the testing team may not have access
to. These situations can also be mitigated by sharing information between the
groups, as well as sharing useful tools with each other.
Managing teams
The Need for the Separation of Tasks
Nobody likes to have their toes stepped on. When you've assigned a task to someone, be
it a single person or a group, it's important to ensure that they have the
chance to finish that task.
The reasons for this are twofold: first, it's good
for the morale of the person or group that had the task assigned to them, as it
shows a level of trust in them; second, it helps to reduce the chance of
redundant effort.
Having more than one set of eyes on a test is always good,
but it's usually best to wait for all the other tests to be completed first.
The goal of separating tasks is to get a full battery of tests completed by the
combined efforts of the entire team.
To that end, the tasks need to be appropriately separated
and assigned to the teams that will be able to complete them most effectively.
For instance, if a game requires gameplay testing, a text check, and
certification testing, then there should be three teams, each assigned the
group of tasks necessary for one of those tests.
Once a team completes the tasks it has been assigned,
it should double-check the work already completed by other teams (with
the exception of certification testing, which often has to be performed by
people with special qualifications; those tests should be double-checked by
another member of the certification team). This is one of the best ways to
ensure that at least two people have seen each part of the software being
tested.
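One way to picture this double-checking rule is as a simple rotation. The sketch below is only an illustration under stated assumptions: the team names come from the example above, while the function `assign_second_checks` and its `restricted` parameter are hypothetical.

```python
def assign_second_checks(teams, restricted=("certification",)):
    """Map each team to the team that double-checks its work.

    Teams without special qualifications check each other in a rotation.
    Restricted teams (an assumption: named in `restricted`) are mapped to
    themselves, meaning another member of the same team does the re-check.
    """
    open_teams = [team for team in teams if team not in restricted]
    checks = {}
    for i, team in enumerate(open_teams):
        # The next team in the rotation reviews this team's work. With only
        # one open team, it ends up reviewing itself with a different member.
        checks[team] = open_teams[(i + 1) % len(open_teams)]
    for team in teams:
        if team in restricted:
            checks[team] = team
    return checks


if __name__ == "__main__":
    plan = assign_second_checks(["gameplay", "text", "certification"])
    for team, checker in plan.items():
        print(f"{team} work is double-checked by: {checker}")
```

With the three teams from the example, gameplay and text testing check each other, and certification work stays within the certification team.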
Team Hierarchies and Communication
For every project, there should be one person in charge of coordinating the various
teams. Similarly, there should be one person who coordinates each team and
reports to the project leader. This forms a chain of command that can easily be
followed when problems within the various teams arise.
This is important for
the situations in which one team needs help from one of the other teams; with
this sort of chain of command, the leader from one team can talk to both the project
leader and the leader of the team they would like help from.
If the team in question isn't too busy, this isn't a
problem, and the team lead can agree to help. However, in those stressful
crunch times near the end of a project cycle, it's possible that time is too
tight to be able to help freely.
In situations like these, it will be up to the
project leader to determine whether the need is great enough to require the help of
the other team.
This is why there also needs to be good communication
between the teams; when one team is ahead of schedule, teams that
need extra help know where to go first. It also helps the
entire team know how they are progressing, which allows the team leads
and the project leader to better manage the overall effort.
Weekly (or even daily) reports can be very effective in this effort, but even a simple
verbal communication between leads on a regular basis can help immensely. A
little bit of communication can go a long way to help keep things on track and
running smoothly.
Matching the Right Testers to the Right Teams
Needless to say, some people are more competent at certain tasks than others. An effort
should be made to match people to the tasks in which they show the highest
proficiency. It can take time to figure out where a person's talents lie, but the effort
is almost always worthwhile.
A person who is good at completing games should be
assigned to gameplay testing, specifically game completion if possible. In this way,
that person's talents will help to complete tasks related to checking the
overall playability as well as the endings of a game.
Likewise, if someone is good with language and grammar, they
should be used for checking the text in the game to make sure it reads as it
should. Someone who is talented at noticing problems with graphics should check
graphics and animations, and so on. In this way, a team can play to their
strengths, and the tasks assigned to them can be accomplished more efficiently.
Weaving it all together
So, now that the general concepts have been put forward, how
does it all come together? The first goal would be to set up an appropriate
distribution of testing techniques.
A good starting point would be to assign 40% of the testers to test
cases and 30% to ad-hoc testing, with the final 30% alternating between the two until their
strengths are determined.
The easiest way to deal with the third group is to assign them any low-priority
test cases, as these will be the least detrimental if few members of the group
are suited to such tasks. In each of those groups, half should be assigned to
white-box testing and half to black-box testing.
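For a concrete headcount, the starting distribution can be worked out as in the sketch below. The function names, the group labels, and the rounding choices are assumptions made for this example: leftover seats go to the groups with the largest remainders, and when a group has an odd number of testers the extra one is placed on black-box testing.

```python
def split_testers(total, shares):
    """Split a headcount by whole-number percentage shares.

    Uses largest-remainder rounding so the group sizes always add up to
    `total`. `shares` maps a group name to its percentage.
    """
    if sum(shares.values()) != 100:
        raise ValueError("shares must add up to 100")
    # divmod keeps the arithmetic in integers: (whole testers, remainder).
    parts = {name: divmod(total * pct, 100) for name, pct in shares.items()}
    counts = {name: whole for name, (whole, _) in parts.items()}
    leftover = total - sum(counts.values())
    # Hand the leftover seats to the groups with the largest remainders.
    by_remainder = sorted(parts, key=lambda name: parts[name][1], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


def initial_plan(total):
    """Starting distribution: 40/30/30, each group split between white-box and black-box."""
    groups = split_testers(
        total, {"test_cases": 40, "ad_hoc": 30, "alternating": 30}
    )
    plan = {}
    for name, count in groups.items():
        white = count // 2
        # Assumption: an odd tester out goes to black-box testing.
        plan[name] = {"white_box": white, "black_box": count - white}
    return plan


if __name__ == "__main__":
    for group, split in initial_plan(20).items():
        print(group, split)
```

For a team of 20, this gives 8 testers on test cases and 6 in each of the other two groups, with each group divided evenly between white-box and black-box work.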
Over the first week or two, it should be possible to determine which testers
are best suited to which tasks. After this has been determined, the testers
should be separated into the appropriate teams for the necessary testing tasks.
At this point, testing should be redistributed to about 60% test cases and 40%
ad-hoc testing, as test cases tend to be more time-consuming, which usually
translates to more manpower. Once the teams are formed, test plans should be written
for each team explaining how they should go about their tasks and at what pace.
It's important to note that the first week or two is recommended for determining
the strengths of unknown testers. If a tester is already known to be competent
at certain tasks, it's easy enough to start them in the appropriate team.
Otherwise, it's more valuable in the long run to figure out where a tester
would be best placed before giving them their final assignment.
Finally, a few words of advice specifically for developers. Documentation for
various mechanics and functions can always be helpful. The more information
that a testing team receives, the better they'll be able to test the software.
Also, if there's supposed to be any spoken text in a game, try to get the text
edited first. In this way, there will be fewer errors in the audio, and the
text will not have to be changed to match those errors.
Lastly, ask for the opinions
of your testers every once in a while; they may have some opinions that would
enhance your software in a way you hadn't considered before. We all want our
projects to succeed, and with a little teamwork, we can ship a title with as
few errors and as much polish as possible.