Big Data, Big Problems: A Mathematician’s Take on the Current State of Game Analytics
After performing an extensive analysis of the games industry's use of data science, mathematician and middleware developer Tom Matcham gives some insight into what studios can do to improve their game analytics.
90% Of The Way There?
A phrase I come across quite frequently with regard to Game Analytics is that ‘the simple stuff will get you 90% of the way there’. Whenever I hear or read this phrase, my immediate thought is: ‘Do you personally know that? Have you personally gone as far as modern research in machine learning and statistics can take you and concluded that the additional insight into your data provided by such tools was only worth 10% of your resulting report, and that bar charts, histograms and heatmaps told you 90% of what your company’s stakeholders needed to know about your dataset? Or are you presuming this to be true from what you’ve read from other people? Furthermore, what do you mean by “the simple stuff”?’
I’m not claiming that this statement isn’t true in many cases: I’ve definitely seen games where there really wouldn’t be much point in performing a logistic regression, for example. But given that the games industry has some of the best data sources available, I’m regularly surprised at the contrast between how game companies treat data and how other industries do. What I’m trying to say is that, in my experience of talking with developers and producers, Applied Game Analytics as a whole is not done well.
Now, I completely understand that there are many constraints on the quality of an analytics investigation: time and money are extremely scarce, yet developers still want insight into their data. The problem is that a bad report is a very dangerous thing. Sample bias, misuse of data mining tools, misinterpretation of results and many other factors can lead to report conclusions that actually harm the game design and production process. It’s very rarely anyone’s fault; doing ‘proper’ data science is difficult, and it’s incredibly important to strike the right balance of statistics and computer science whilst performing an analysis.
Recurring Problems
Having investigated Game Analytics quite extensively, I’ve found four recurring areas that are frequently overlooked when game data is analysed:
Data Cleaning
Gamers are highly variable creatures and regularly act in bizarre ways. As such, if you don’t clean your data to remove outliers, the statistics of the gamers that you’re actually trying to design for are likely to become distorted. Make sure you think very carefully about who and what you’re interested in: getting good quality data for your particular problem is essential to reaching a meaningful conclusion.
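To make the distortion concrete, here is a minimal sketch of one common cleaning rule, Tukey's 1.5 × IQR fence, using only Python's standard library (3.8+). The function name `remove_outliers_iqr` and the session-length figures are invented for illustration; this is one convention among many, not a recommendation for every dataset.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical session lengths in minutes; the last player left the game
# running overnight.
session_minutes = [12, 15, 9, 14, 11, 13, 10, 16, 12, 900]
cleaned = remove_outliers_iqr(session_minutes)

print("mean before cleaning:", statistics.fmean(session_minutes))
print("mean after cleaning: ", statistics.fmean(cleaned))
print("median before cleaning:", statistics.median(session_minutes))
```

A single extreme session is enough to drag the mean far away from what a typical player does, while the median barely moves. Whether that session should be dropped, capped or studied separately depends on who you are designing for.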
What Probability Distribution Is Appropriate for My Data?
Far too frequently, statistics in game analytics are computed on the assumption that the data is normally distributed. If your data isn’t normally distributed and you perform statistical tests that rely on this assumption, you’re going to get duff results that could lead to bad design decisions and ultimately a flop of a game. Think carefully about the assumptions you’re making about your data: can these assumptions be tested?
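The normality assumption, at least, can be tested. As a rough sketch, the Jarque-Bera statistic combines sample skewness and excess kurtosis, and under normality it is approximately chi-squared with two degrees of freedom, so an approximate p-value is simply exp(-JB/2). The function below is my own standard-library illustration and the spend data is simulated; the approximation is only trustworthy for fairly large samples, and a dedicated statistics package would be the sensible choice in practice.

```python
import math
import random


def jarque_bera(values):
    """Return (JB statistic, approximate p-value) for a normality check."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # central moments
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # Survival function of a chi-squared distribution with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Simulated (not real) per-player spend: heavily right-skewed, as spend
# data often is.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
spend = [rng.expovariate(1 / 5.0) for _ in range(2000)]

jb, p = jarque_bera(spend)
print(f"JB = {jb:.1f}, approximate p = {p:.3g}")
```

A very small p-value here says the data is unlikely to have come from a normal distribution, which is a cue to reach for a transformation, a different distribution, or a non-parametric test.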
Over-reliance on Data Visualisation
It’s understandable to want to visualise data, especially when game development is such a visual process. Furthermore, data visualisation is an absolutely key component of the analytics process. However, if all the reporting you’re performing can be boiled down to a pretty picture, then it’s likely that you’re missing out on a lot of potential insights into your dataset. In statistics, box plots, histograms and the like are all part of Exploratory Data Analysis, which is usually performed by a statistician to get a ‘feel’ for the data they’re working with before they do the proper work. It’s quite likely that your dataset contains more insight than can be represented by graphs alone.
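As one hedged illustration of the step after the picture: suppose two box plots suggest that players on one build play longer sessions than players on another. A permutation test asks how often a gap at least that large would appear if the build labels were shuffled at random. The names `permutation_test`, `build_a` and `build_b` and the numbers are made up for this sketch, and it makes no distributional assumption beyond exchangeability of the players.

```python
import random
import statistics


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(
            statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
        )
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical session lengths (minutes) for two builds of a level.
build_a = [14, 18, 11, 20, 16, 15, 19, 13, 17, 21]
build_b = [12, 15, 10, 17, 13, 14, 16, 11, 12, 18]

print("estimated p-value:", permutation_test(build_a, build_b))
```

The chart tells you the builds look different; the test gives you a number for how surprised you should be, which is much harder to argue with in a design meeting.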
Behavioural Modelling
Note: my opinion on this subject is biased as I have a personal interest in behavioural modelling.
In game analytics academia, it’s frequently stated that it’s very difficult to infer the motivations of a user. Whilst this is true in many contexts, if you’re willing and able to create a model of the player’s behaviour, it’s likely that you can do a fairly decent job of understanding why certain events took place in a playthrough. Obviously, having such motivational data would be extremely beneficial for designers and moneymen alike, yet it’s a largely unexplored area in game analytics in both academia and production. If a developer had the resources, modelling and analysing the behaviour of a player would go a long way towards explaining other game behaviours.
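What a behavioural model looks like depends entirely on the game, so treat the following as a deliberately simple stand-in rather than the kind of model I have in mind for production: a first-order Markov chain that estimates, from logged sessions, the probability of each action given the previous one. The event names and sessions are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next event | current event) from ordered event logs."""
    counts = defaultdict(Counter)
    for events in sessions:
        # Pair each event with the one that immediately follows it.
        for current, following in zip(events, events[1:]):
            counts[current][following] += 1
    return {
        state: {
            nxt: count / sum(followers.values())
            for nxt, count in followers.items()
        }
        for state, followers in counts.items()
    }


# Hypothetical event logs, one list per play session.
sessions = [
    ["start", "fight", "die", "quit"],
    ["start", "fight", "win", "shop", "fight", "die", "retry"],
    ["start", "shop", "fight", "die", "quit"],
]

probs = transition_probabilities(sessions)
for nxt, probability in sorted(probs["die"].items()):
    print(f"P({nxt} | die) = {probability:.2f}")
```

Even a toy model like this turns a pile of events into a question a designer can act on, such as how often a death is followed by quitting rather than retrying. Richer models condition on more context than just the previous event.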
Closing Remarks
Although this article may give you the impression that I'm not impressed with the use of game analytics in practice, that's not the case. On the contrary, I believe that the systems that many companies have set up to collect and analyse data are state of the art. However, I do believe that more can be done to get the most out of the datasets that studios collect. Understanding what our users want is the essence of the game development problem: better analytics across the industry will make solving that problem a little bit easier.