Game Testing: Exploring the Test Space

In this article I will try to give a list of different types of test that I think should at least be considered before development of a new game starts. This will not be a complete, all-comprehensive list, but my hope is that it will be a good start.

Johan Hoberg, Blogger

August 21, 2014

12 Min Read

Introduction

So you are about to plan for how to test the game you are developing, or preferably – will start to develop. Where do you start? Maybe you have some understanding of what it is that you are going to develop. Requirements, or stakeholders that are telling you what should be in the game, and how it should work to some degree. But often this only covers a small portion of the complete test space. I use “test space” to denote the complete set of tests you would have to run to test absolutely everything.

You know that you will never cover the entire test space with tests, since this would not be very cost effective. But you probably have some idea that just testing the core functionality of the game works will not be enough. It will always be a priority question just which tests you will actually perform, but how do you think of all the tests that you would possibly need in the first place?

In this article I will try to give a list of different types of test that I think should at least be considered before development of a new game starts. I will use ISO 25010 [1] as a base, and add thoughts from James Bach [2] and James Whittaker [3] who have developed different approaches for exploratory testing, which will help to think about other ways to explore the test space. This will not be a complete, all-comprehensive list, but my hope is that it will be a good start.

ISO 25010

ISO 25010 is a quality model for systems and software. It contains eight categories of software quality, which can be good to use as a starting point when designing tests.

Functional Suitability

This category covers a large test space, and it is easy to completely miss large parts of it if you do not take them time to think through your approach.

The degree to which the product provides functions that meet stated and implied needs when the product is used under specified conditions [1]. A user action should lead to some kind of result. When I press the “New Game” button something should happen. When a press the mouse button in game it has some effect depending on where the cursor is located. When a user presses “Connect to Social Media Platform” the game should connect to said social medial platform. And so on. A large chunk of your tests will probably end up in this category, and these are most likely the first ones you think of.

Also included in this category, and equally obvious, would be artificial intelligence. Is the AI taking appropriate actions based on specified conditions? Same goes for audio – are the correct audio files played based on specified conditions? Verifying game mechanics are included here as well. Same thing with physics engines.

Another big thing, depending on the game under test, would be verifying that the game world meets stated or implied needs. Is everything in the right place? A rock, a vendor, monsters, NPCs, caves, castles, houses, lakes, mountains etc.

Multiplayer functionality is also part of functional suitability.

There are obviously many different types of tests in this category, and they vary wildly between different genres and games. Trying to create a complete overview map of all functional suitability tests for different genres is a whole article in itself. And even then a specific game could deviate from the genre and require other unique tests.

How to come up with tests for all these different areas is not an easy task. In a previous article I mentioned using Systematic Inventive Thinking as one way think creatively around what to test, by starting with a normal user behavior and applying SIT. [4]

Reliability

So apart from the functional suitability tests, another category of tests is reliability tests. The degree to which a system or component performs specified functions under specified conditions for a specified period of time[1].

Fault tolerance is one aspect. What happens when something goes wrong? Will the game crash, or will the game handle the fault in a suitable way?

Recovery is another. What happens when the game actually crashes? What happens with user data? Progression, items, scores, etc. Preferably the game should never crash, but if it does, it should have minimal impact on the user.

Duration and maturity tests is also included in the reliability category. What happens if you leave the game on for 24 hours? A week? Memory leaks could typically be found by this kind of tests. How does the game run after 12 hours of intensive playing? Any performance degrades? Any other degrades?

In this category I would also include random automated tests that click through the UI, and tests that drop automated bots into a game world and let them traverse the world randomly until they pass the walls of the game world – this could for example let you find places where characters models could get stuck in the world.

Stress testing is also part of reliability testing. In it’s simplest form it can include tests like pressing a button many times instead of one. It can also include thousands of users logging into a server at the same time, or hundreds of players fighting in PvP against each other, or participating in a public non-instanced quest together. Another example is a large amount of players performing microtransactions simultaneously. Under the reliability umbrella we are looking for if the game can handle these situations, and will not crash, log you out or something similar, not the performance under these conditions, which is covered by the Performance Efficiency category below.

Operability

The degree to which the product has attributes that enable it to be understood, learned, used and attractive to the user, when used under specified conditions [1]. So what does this actually mean?

Easy of use, learability, attractiveness. Is the game easy to start playing? Can the user understand the game mechanics? Is the user interface understandable? Is the tutorial good?

But this could also include fun factor testing when it comes to game. Is the game actually fun to play? Realism testing could also be included in this category, which includes the believability of the physics engine and many other things. Is the game realistic and immersive to the player? Balancing could also be included here as it can correlate to if the game is fun and understandable.

Performance Efficiency

The performance relative to the amount of resources used under stated conditions [1]. This can include response time, loading time, and different kinds of time behavior. It can also include different types of resource utilization, such as RAM, GPU AND CPU usage. FPS would be connected to that resource utilization.

This also includes performance tests under stressful conditions, such as relative FPS when fighting alongside 10 people in PvP versus 200 people.

Many of these tests also impact how attractive the game is to the user, which we talked about in the operability section.

Security

The degree of protection of information and data so that unauthorized persons or systems cannot read or modify them and authorized persons or systems are not denied access to them [1]. In a single player game without any microtransactions, security is important. But this importance explodes when testing an MMO, MOBA or similar game. Confidentiality, integrity, accountability, authenticity [1]. Security testing requires very specific competence compared to other types of testing, but is essential to modern games. How to battle botting and gold sellers can be included in this category.

Compatibility

The degree to which two or more systems or components can exchange information and/or perform their required functions while sharing the same hardware or software environment [1].

Interoperability is part of this. Support for different game pads, gaming mouses, keyboards, Oculus and Morpheus, and similar gaming paraphernalia. One software system – the game – and one or more hardware systems.

Co-existence between programs. Having Spotify running in the background. Running Team Speak, Ventrilo or other VoIP clients. Two or more different software system co-existing.

Maintainability

The degree of effectiveness and efficiency with which the product can be modified [1]. Many activities included in this category are development activities. But securing testability [5] is definitely something of interest to the game tester.

You could include testing for modification [6] support in this category.

Transferability

The degree to which a system or component can be effectively and efficiently transferred from one hardware, software or other operational or usage environment to another [1].

Here we can test on different hardware and OS. Mobile games need to be tested on iOS, Android and Kindle for example. Then there are different manufacturers of hardware. Different OS versions. Different versions of the hardware from the same manufacturer. For PC there is an unlimited number of configurations. Here the key would be to have enough business information to be able to prioritize different hardware and OS configurations, framed by minimum and recommended hardware requirements.

Heuristics by James Bach [7]

James Bach, one of the most famous software testing experts, has created a list of general test techniques that are simple and universal enough to be applicable in many different contexts, and can also be applied to game testing. This is another way of looking at the test space compared to the categories that I have previously presented. Attacking the problem from a different angle.

Function Testing - Test what it can do
Domain Testing - Look for any data processed by the product.
Stress Testing - Overwhelm the product
Flow Testing - Do one thing after another
Scenario Testing - Test to a compelling story
Claims Testing - Verify every claim
User Testing - Involve the users
Risk Testing - Imagine a problem, then look for it
Automatic Checking - Check a million different facts

To better understand this, look into the reference and read the document.

Testing Tours by James Whittaker [8]

James Whittaker is a professor and software testing evangelist at Microsoft, and has previously worked at Google as well. During his time on Google he developed a way of performing exploratory testing [9] which he called Testing Tours. Testing Tours can also be applied to explore the testing space, but takes yet another different approach in doing so than the two previous approaches.

“Suppose you are visiting a large city like London, England, for the very first time. It’s a big, busy, confusing place for new tourists, with lots of things to see and do. Indeed, even the richest, most time-unconstrained tourist would have a hard time seeing everything a city like London has to offer. The same can be said of well-equipped testers trying to explore complex software; all the funding in the world won’t guarantee completeness.” [8]

So James Whittaker created a number of “Testing Tours” that you could perform to explore the software under test, based on this tourism metaphor.

The Guidebook Tour – Follow the user manual’s advice just like the wary traveler, by never deviating from its lead
The Money Tour – Run through sales demos to make sure everything that is used for sales purposes works
The Landmark Tour – Choose a set of features, decide on an ordering for them, and then explore the application going from feature to feature until you’ve explored all of them in your list.
The Intellectual Tour – this tour takes on the approach of asking the software hard questions. How do we make the software work as hard as possible?
The FedEx Tour – During this tour, a tester must concentrate on this data. Try to identify inputs that are stored and “follow” them around the software.
The Garbage Collector’s Tour – This is like a methodical spot check. We can decide to spot check the interface where we go screen by screen, dialog by dialog (favoring, like the garbage collector, the shortest route), and not stopping to test in detail, but checking the obvious things.
The Bad-Neighborhood Tour – Software also has bad neighborhoods—those sections of the code populated by bugs. This tour is about running tests in those sections of the code.
The Museum Tour – During this tour, testers should identify older code and executable artifacts and ensure they receive a fair share of testing attention.
The Back Alley Tour – If your organization tracks feature usage, this tour will direct you to test the ones at the bottom of the list. If your organization tracks code coverage, this tour implores you to find ways to test the code yet to be covered.
The All-Nighter Tour – Exploratory testers on theAll-Nighter tour will keep their application running without closing it.
The Supermodel Tour – During the Supermodel tour, the focus is not on functionality or real interaction. It’s only on the interface.
The Couch Potato Tour – A Coach Potato tour means doing as little actual work as possible. This means accepting all default values (values prepopulated by the application), leaving input fields blank, filling in as little form data as possible, never clicking on an advertisement, paging through screens without clicking any buttons or entering any data, and so forth.
The Obsessive-Compulsive Tour – Perform the same action over and over. Repeat, redo, copy, paste, borrow, and then do all that some more.

To better understand this, look into the reference and read the document.

Conclusion

This has been my attempt to give a good base for continued exploration of the test space to help new testers or developers think of all the necessary tests they need to run for their game.

This is by no means the only way to tackle this problem, but I usually combine these three approaches when I think about a testing problem.

Unfortunately this is just the first step. The next step is to prioritize the different tests, since running them all is not feasible. But that is a topic for another article.

Johan Hoberg