Anyone who has played WoW can attest to the problem that the diversity and customization of avatar abilities can create a balancing nightmare. Players rant over gear, buffs, class and spec imbalances. Some may be right, some may be wrong. It's hard to separate real player ability from stat imbalances. And yet, so many games beyond MMOs have started offering customization for their online experience. Consider Call of Duty: World at War, where avatar perks can make you the envy of the other players. How does one keep that all balanced?
The tried and true method of balancing often used game object stats (e.g. weapon, character, spell, ability and buff stats) to build tables in a spreadsheet or database. The various possible combinations of the stats can then be calculated and graphed to show approximate lethality, lifespan, damage over time, damage at range, etc. In theory, the designers can compare the results side by side to see which stats could use some tuning.
There would still be some playtesting, but there would be enough objective data to make informed decisions. In fact, it's often because there are real numbers to compare against that bugs in the game code get found. (I recall a particularly nasty bug where the player's individual difficulty setting from the campaign was impacting his weapon damage in multiplayer.)
When using game data to graph, there was always some hand-waving approximation going on for things like accuracy at different ranges, player skill level and defender posture. Designers would come up with some formulae to factor in probable game situations.
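As a rough illustration of the spreadsheet approach, here is a minimal Python sketch that turns a small stat table into damage per second, time to kill and expected damage at range. The weapon names, the stat values and the linear accuracy falloff are all made up for the example; the falloff is exactly the kind of hand-waved formula described above, not a model taken from any real game.

```python
import math
from dataclasses import dataclass


@dataclass
class Weapon:
    name: str
    damage: float         # damage per hit (hypothetical value)
    shots_per_sec: float  # rate of fire
    base_accuracy: float  # assumed hit chance at point-blank range, 0..1
    max_range: float      # assumed range at which hit chance reaches zero


def dps(w: Weapon) -> float:
    """Raw damage per second, assuming every shot hits."""
    return w.damage * w.shots_per_sec


def time_to_kill(w: Weapon, target_hp: float) -> float:
    """Seconds to kill if every shot hits; the first shot lands at t = 0."""
    shots = math.ceil(target_hp / w.damage)
    return (shots - 1) / w.shots_per_sec


def hit_chance(w: Weapon, range_m: float) -> float:
    """Assumed linear falloff from base_accuracy down to zero at max_range."""
    falloff = 1.0 - range_m / w.max_range
    return w.base_accuracy * min(1.0, max(0.0, falloff))


def expected_dps(w: Weapon, range_m: float) -> float:
    """DPS scaled by the approximate chance to hit at this range."""
    return dps(w) * hit_chance(w, range_m)


WEAPONS = [
    Weapon("rifle", damage=30, shots_per_sec=5, base_accuracy=0.8, max_range=100),
    Weapon("shotgun", damage=80, shots_per_sec=1, base_accuracy=0.9, max_range=20),
]

if __name__ == "__main__":
    # One row per weapon: raw DPS, time to kill 100 HP, expected DPS by range.
    for w in WEAPONS:
        by_range = [round(expected_dps(w, r), 1) for r in (0, 10, 50)]
        print(w.name, dps(w), time_to_kill(w, 100), by_range)
```

Rows like these are what would go into the side-by-side graphs. The weak point is `hit_chance`: it is a guess, which is the limitation the rest of this post is about.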
Yet nowadays the games are so complex and the abilities and player options so varied that there are way too many situations to account for. That's why it's becoming extremely important to collect data from game sessions, from actual game situations, to test for balance.
Getting the Session Data
As I can personally attest, the request for session data often gets shelved as production runs late and squeezes the post-alpha polishing phase. During this phase, programmers are more concerned with performance, memory footprint, stability and various other optimization tasks. You'll just have to convince them that your primary polishing concern is game balancing and you need the right tools for the job.
It need not be that hard or complex. Start with some log files that record deaths, weapon accuracy, kills and a periodic heartbeat of player run-time stats like health, ammo, equipment, etc.
Be sure to include situational data that will help you identify whether certain player choices (locations, styles, perks, abilities) are too weak or too powerful. You don't typically graph this by player but by the specific weapon, ability or choice. Ideally you have enough data from a wide variety of players who have been encouraged to explore all the options so that player skill is factored out of the equation.
For example, it's not enough to record that player A killed player B with weapon X; you also need the range and other factors like buffs, player posture and location.
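To make that concrete, here is a small Python sketch of what one kill entry might look like as a line in a log file, plus a summary grouped by weapon rather than by player. The field names, the JSON-lines format and the sample values are assumptions for illustration; your game's log format will differ.

```python
import json
from collections import defaultdict


def kill_event(version, session, time_s, killer, victim, weapon,
               range_m, killer_buffs, victim_posture, location):
    """Serialize one kill, with its situation, as a single JSON log line."""
    return json.dumps({
        "event": "kill",
        "version": version,      # game version, so builds are not mixed
        "session": session,
        "time_s": time_s,
        "killer": killer,
        "victim": victim,
        "weapon": weapon,
        "range_m": range_m,
        "killer_buffs": killer_buffs,
        "victim_posture": victim_posture,
        "location": location,    # [x, y, z] in level coordinates
    })


def kills_by_weapon(log_lines):
    """Group kills by weapon, not by player: kill count and average range."""
    ranges = defaultdict(list)
    for line in log_lines:
        entry = json.loads(line)
        if entry.get("event") == "kill":
            ranges[entry["weapon"]].append(entry["range_m"])
    return {
        weapon: {"kills": len(r), "avg_range_m": sum(r) / len(r)}
        for weapon, r in ranges.items()
    }


# Hypothetical sample entries from one session.
LOG = [
    kill_event("0.9.1", "s42", 61.5, "A", "B", "sniper", 80.0, ["haste"], "prone", [10, 4, 0]),
    kill_event("0.9.1", "s42", 75.0, "C", "A", "sniper", 60.0, [], "standing", [12, 9, 0]),
    kill_event("0.9.1", "s42", 90.2, "B", "C", "shotgun", 5.0, [], "crouched", [3, 3, 0]),
]

if __name__ == "__main__":
    print(kills_by_weapon(LOG))
```

Because every entry carries the situation with it, the same log can later be regrouped by posture, buff or location without asking the playtesters anything.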
Ideally the game could store these results up on a server automatically without a user having to extract and send it. Just be sure it's sorted by game version and session. The easier it is to collect the data and the more playtesters you have that can supply it, the better your data is going to be.
Graphing and Mapping the Data
Once you have the log files, you can wrap a tool around them to graph the data. I've found that Excel and Access usually do a bang-up job of processing the log files (using VBA code to sniff for values, or XML importing), displaying the data and graphing it. You can even post the results up on your project website or Wiki for all the playtesters and developers to see.
Because you are mainly concerned with meeting player expectations, you should always have a target in mind. The target is a goal value that you'll be comparing your game session data to. This goal need not be fair. You may want some players with higher levels or certain perks to get better results, but it's still important to have a target for how much better. Target values should be graphed against the real session data.
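Here is a minimal Python sketch of comparing session results against targets. The kill/death ratios, the weapon names and the 10% tolerance are invented for the example; note that the sniper target is deliberately higher than the others, since the goal need not be fair.

```python
def compare_to_target(observed, targets, tolerance=0.10):
    """Return {name: (relative_deviation, verdict)} for each target.

    Targets must be non-zero, since the deviation is relative to the target.
    """
    report = {}
    for name, target in targets.items():
        if name not in observed:
            report[name] = (None, "no data")
            continue
        deviation = (observed[name] - target) / target
        if deviation > tolerance:
            verdict = "above target"
        elif deviation < -tolerance:
            verdict = "below target"
        else:
            verdict = "on target"
        report[name] = (round(deviation, 3), verdict)
    return report


# Hypothetical target and observed kill/death ratios per weapon.
TARGET_KD = {"rifle": 1.0, "sniper": 1.5, "shotgun": 0.9, "pistol": 0.6}
OBSERVED_KD = {"rifle": 1.05, "sniper": 2.1, "shotgun": 0.6}

if __name__ == "__main__":
    for name, (deviation, verdict) in compare_to_target(OBSERVED_KD, TARGET_KD).items():
        print(name, deviation, verdict)
```

A "no data" verdict is worth graphing too: it usually means nobody is picking that option, which is its own balance signal.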
If you or a programmer has time, the data and tool could include location information which is then mapped onto a game level so you can see where as well as when. 2D topdowns usually suffice, but if you can get the data represented in 3D or in the game engine or editor, that's even better. This mapping also helps identify patterns of movement and which areas or pick-ups are being overlooked. You might see some undesirable clustering of activity or too many deaths in one area or too many kills from a specific perch. This is perfect objective feedback for level designers.
Choosing what Data to Graph
But what data do you graph? When comparing a mage to a warrior or a sniper to a brawler, it may seem like apples to oranges, but there should always be some statistic they have in common - usually mapped and tracked on game results screens and leaderboards. If this is a multiplayer game session, it's kills, deaths, accuracy, assists, heals, points scored, etc. A campaign might have many of the same stats, but they have less value if, for example, there is no way for a player to progress without killing all the NPCs.
A campaign should graph data collected over time and location by using a timer and region triggers to create log entries. These would show deaths (to judge difficulty at certain locations), ammo (to judge availability of pick-ups) and hitpoints/armor (to judge both difficulty and availability of health pick-ups or similar opportunities to recover). Armed with data like this, you can see very quickly both the various skill levels of your playtesters and level difficulty.
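A short Python sketch of that kind of campaign summary follows. It assumes the timer writes "heartbeat" entries and that each entry is tagged with the region the player is in; the entry layout, region names and numbers are hypothetical.

```python
from collections import defaultdict


def region_summary(entries):
    """Per region: death count and average health seen in heartbeat entries."""
    deaths = defaultdict(int)
    health = defaultdict(list)
    for e in entries:
        if e["event"] == "death":
            deaths[e["region"]] += 1
        elif e["event"] == "heartbeat":
            health[e["region"]].append(e["health"])
    regions = sorted(set(deaths) | set(health))
    return {
        r: {
            "deaths": deaths[r],
            "avg_health": sum(health[r]) / len(health[r]) if health[r] else None,
        }
        for r in regions
    }


# Hypothetical log entries from one tester's campaign run.
ENTRIES = [
    {"event": "heartbeat", "time_s": 30, "region": "bridge", "health": 90, "ammo": 40},
    {"event": "heartbeat", "time_s": 60, "region": "bridge", "health": 50, "ammo": 12},
    {"event": "death", "time_s": 71, "region": "bridge"},
    {"event": "death", "time_s": 140, "region": "bridge"},
    {"event": "heartbeat", "time_s": 200, "region": "courtyard", "health": 100, "ammo": 60},
]

if __name__ == "__main__":
    for region, stats in region_summary(ENTRIES).items():
        print(region, stats)
```

The same grouping works for ammo: a region where the average ammo count keeps dropping toward zero probably needs another pick-up.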
More importantly, with enough data from a large number of testers, you can factor out skill and look at the specific situations to judge whether they defy the desired goal.
Don't Stop There - The Analysis Begins
The graphs will reveal imbalances. They will raise questions. It's important that the session data include specific information about the situation for the designers to investigate and identify the problem. Going and talking to the playtester should be a last resort - while valuable it reveals a problem with the lack of data and is not practical for large-scale playtesting and balancing.
Some of you may already have access to data and tools like this. But if you don't, you'll find that balancing is so much easier and less controversial with such objective data at your disposal. This may all seem like a lot of work to set up, but it will save time and create a better balanced game in the long run.
Sorry for the lengthy blog. It was too short for an article.
- Tim Ryan