tl;dr Write down your goal, collect as much data as possible to measure if you’re reaching that, and make that data queryable. Then, do some basic counting and group-bys while asking questions about correlations of game events to your goal metric – and you’ll be headed in the right direction. Data science is as much about articulating your goal clearly, determining how to measure it, capturing data, and getting it prepared (80% of the work) as it is doing fancy analyses and machine learning (20%).
Hey, I’m Adam – a developer at Gyroscope. We’re currently working on a Unity plugin. At almost every conference or meetup I go to, there’s some talk about data science.
First things first: What is data science?
“Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.” (Josh Wills, @josh_wills, May 3, 2012)
Much has been written trying to define what data science is and what differentiates it from other roles and skills. For the sake of this post, I think it's best to consider it as a way of thinking: measuring outcomes focused on an objective/goal, being scientific, being experimental, and using data to drive decision-making. Along with this way of thinking comes a common set of practices (e.g., logging, A/B testing) and tools (e.g., hypothesis testing, machine learning).
Why use data science?
You can use data science to measure how close you are to success and what you can change to get closer! The definition of success is up to you; it can range from increased monetization (for free-to-play games) to increased playtime (for premium games).
(Image: Candy Crush by King Games)
Candy Crush and League of Legends are great examples of free-to-play games that use data science. In the premium space, Diablo 3 and Overwatch come to mind.
It can seem odd to consider games, often characterized as artistic endeavours, as something you can measure and tune -- more like a machine. Data science is no replacement for creativity and design. It is a complement to game design that can support those efforts and fill in gaps that design cannot. I think of it as a force multiplier: if the core of a game isn’t there, data science won’t help.
Who should use it?
Data science is applicable throughout the cycle of your game and is used by both indie developers and large studios. At release, you can use data science to understand how you’re acquiring users and where to focus. During game growth, you can determine what keeps players engaged, what various groups of players like and dislike about your game and tune it to match their preferences. This game cycle is relevant no matter what size you are or whether you’re a publisher or studio.
The most common way folks think about data science is that it is something free-to-play / monetization-focused games use. And that makes sense: Revenue is the measure of success. There’s even a slight stigma associated with such practices in the community. However, it is definitely applicable to premium games and, in my opinion, is not a dirty word :) Even premium games want their players to enjoy the game and keep playing (and possibly purchase DLC later).
The first, and most important, part of data science is to measure whether you're achieving your goal – often referred to as a Key Performance Indicator (KPI). To know what you want to measure, you must have a clear goal in mind. Next, you identify a way to measure the degree to which you’re reaching that goal. For example, your GOAL may be to have the most popular game of all time. One KPI may be Number of Installs. Another could be Number of Concurrent Players.
Bear in mind that a KPI is a measurable, best-available approximation of whether you’re reaching your goal. Careful thought must be put into choosing one: maximizing a bad KPI means you won’t reach the goal you intended to reach.
Below are a few common KPIs:
Cost of User Acquisition: The average cost to obtain an install for a single player. This combines all marketing/advertising spend and is often broken down by channel and acquisition strategy.
Lifetime Value: The average revenue for a single player over their lifetime. There are two formulations, empirical and predicted. In the empirical formulation, you look at historical data over a certain time interval since install (e.g., 6 months) to measure the average 6-month lifetime value. In the predicted formulation, you extrapolate from empirical LTV or use fancy ML to predict LTV *before* you actually get the revenue. I suggest holding off on predicted LTV until you have a good grasp of historical. It's easy to be misled.
Conversion Rate: The percentage of X who do Y (a generic measure). Common conversion rates are sign-ups-to-install and installs-to-in-app-purchase. They can also be thresholds: number-of-installs-to-playing-5-multiplayer-matches. Say, in the threshold example, you have a 25% conversion rate (which is great!). That means that 25% of the players who installed the game played 5 multiplayer matches.
Churn Rate: The percentage of users who have churned (i.e., left the game) X days after install. 7-day, 30-day, and 90-day churn are commonly used. You can define “have churned” in a variety of ways, but the definition is often centered on the number of days since last activity. For instance, you could consider a user churned if they haven’t opened the game in the last 7 days.
Active Users: The number of users active over the past X days (can be recalculated every day, every month, etc.). Here, X is typically 1, 7, or 30, giving daily, weekly, or monthly active users. You might also stratify your results by resurrected users (i.e., those that you marked as “churned” but who later returned to your game).
Conversion Rate on DLC/Expansions: See the “Conversion Rate” definition above.
Playtime: The number of hours played (potentially divided by total time passed since install).
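To make a few of these definitions concrete, here is a minimal sketch that computes a threshold conversion rate, a churn rate, and active users from a raw event log. The log format, the event names, and the function names are all made up for illustration; your own pipeline will look different.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical event log: (player_id, event_name, day). Names are illustrative.
EVENTS = (
    [("p1", "install", date(2024, 1, 1))]
    + [("p1", "match_played", date(2024, 1, d)) for d in range(2, 7)]  # 5 matches
    + [
        ("p1", "game_open", date(2024, 1, 29)),
        ("p2", "install", date(2024, 1, 1)),
        ("p2", "match_played", date(2024, 1, 2)),
        ("p2", "match_played", date(2024, 1, 3)),
        ("p3", "install", date(2024, 1, 5)),
        ("p3", "game_open", date(2024, 1, 30)),
        ("p4", "install", date(2024, 1, 10)),
    ]
)


def conversion_rate(events, from_event, to_event, threshold=1):
    """Share of players with `from_event` who logged `to_event` at least `threshold` times."""
    starters = {pid for pid, name, _ in events if name == from_event}
    counts = Counter(pid for pid, name, _ in events if name == to_event)
    if not starters:
        return 0.0
    return sum(1 for pid in starters if counts[pid] >= threshold) / len(starters)


def active_players(events, as_of, window_days):
    """Players with any event in the `window_days` days ending on `as_of`."""
    cutoff = as_of - timedelta(days=window_days)
    return {pid for pid, _, day in events if cutoff < day <= as_of}


def churn_rate(events, as_of, inactive_days=7):
    """Share of installed players with no activity in the last `inactive_days` days."""
    installed = {pid for pid, name, day in events if name == "install" and day <= as_of}
    if not installed:
        return 0.0
    churned = installed - active_players(events, as_of, inactive_days)
    return len(churned) / len(installed)


today = date(2024, 1, 31)
print(conversion_rate(EVENTS, "install", "match_played", threshold=5))  # 0.25
print(churn_rate(EVENTS, today, inactive_days=7))                       # 0.5
print(len(active_players(EVENTS, today, window_days=30)))               # 4
```

Note that this sketch measures churn as of a given date rather than X days after each player's install; pick whichever definition matches the question you are asking and stick with it.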
The funnel and beyond
Once you’ve decided on the top-level KPIs, you’ll want to calculate them. In short, you must record as many points in the player journey (each one an Event) as possible, from marketing to install and beyond. The paragraphs below describe the areas to focus on and mention some tools to help you get there.
The first point in the funnel you want to focus on is acquisition. You’ll want to understand how your players installed the game and how much you spent to get that install. For instance, you may pay for Facebook ads that linked to your game. When an install happens that was initiated by a FB ad, you want to record that install event and tie it to the FB ad spend. In the premium world, you’ll compare those marketing costs to the revenue you make from the game purchase. In the free-to-play world, you’ll compare those costs to the player’s lifetime value. If you make less money than you spend … something needs to change.
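As a rough sketch of that comparison, the snippet below ties ad spend to installs by channel and compares cost per install with revenue per install. The channel names, numbers, and function name are invented for illustration, and it assumes you already know which channel each install came from.

```python
from collections import defaultdict

# Hypothetical inputs; every name and number here is made up.
AD_SPEND = {"facebook": 30.0, "organic": 0.0}                  # total spend per channel
INSTALLS = [("p1", "facebook"), ("p2", "facebook"),
            ("p3", "organic"), ("p4", "facebook")]             # (player_id, channel)
REVENUE = [("p1", 20.0), ("p2", 4.0), ("p3", 5.0)]             # (player_id, amount)


def channel_report(ad_spend, installs, revenue):
    """Per channel: installs, cost per install, revenue per install, and profitability."""
    channel_of = dict(installs)
    install_count = defaultdict(int)
    for _, channel in installs:
        install_count[channel] += 1

    revenue_total = defaultdict(float)
    for player_id, amount in revenue:
        revenue_total[channel_of[player_id]] += amount

    report = {}
    for channel, n in install_count.items():
        cost_per_install = ad_spend.get(channel, 0.0) / n
        revenue_per_install = revenue_total[channel] / n
        report[channel] = {
            "installs": n,
            "cost_per_install": cost_per_install,
            "revenue_per_install": revenue_per_install,
            "profitable": revenue_per_install > cost_per_install,
        }
    return report


for channel, row in channel_report(AD_SPEND, INSTALLS, REVENUE).items():
    print(channel, row)
```

In this toy data the Facebook channel costs 10.00 per install and earns 8.00 per install, so it is the one where something needs to change.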
To understand how your game itself is performing, you also should log gameplay events. These can include lifecycle events like game opens, game upgrades, playtime, game crashes, etc. There are also game specific events like item collection, level completion, enemy kills, levelling, experience gains, jumps, attacks, etc., that you’ll likely want to capture. The set of events strongly depends on the game, but you’ll get a feel for what is important to game progress and enjoyment.
More importantly, you should record conversion and monetization events. For a free-to-play game, these can include ad-presentation (and the result), in-app-purchase prompting, in-app-purchases (including the item and value), push notifications, and so on. It is essential to record when any monetary transaction occurs, or some other notable event like logging in or signing up.
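If you end up rolling your own logging, an event can be as simple as a small record serialized to one JSON line. This is only a sketch: the field names (player_id, name, properties, and so on) and the example event are assumptions, not a standard, and most analytics tools define their own schema.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class GameEvent:
    """One point in the player journey. Field names are illustrative only."""
    player_id: str
    name: str                                         # e.g. "install", "level_complete"
    properties: dict = field(default_factory=dict)    # event-specific details
    timestamp: float = field(default_factory=time.time)             # seconds since epoch
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))  # for de-duplication


def to_json_line(event):
    """Serialize an event as a single JSON line, ready to append to a log or send to a server."""
    return json.dumps(asdict(event), sort_keys=True)


# A monetization event records both the item and its value.
purchase = GameEvent(
    player_id="p1",
    name="in_app_purchase",
    properties={"item": "gem_pack_small", "value_usd": 4.99},
)
print(to_json_line(purchase))
```

Keeping the event-specific details in a free-form properties field means you can add new gameplay events later without changing the record itself.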
Every game is unique, and so I leave it to you to pick the right tool for you. Here are a few recommendations:
Surprise! Everything I’ve talked about up to now is 80% of data science -- that is, setting your goal and getting data collected, cleaned, and set up for analysis. So, here’s the last 20%: analysis. There’s no single solution, tool, or technique for improving your KPIs. Generally, you want to ask questions and form hypotheses in relation to your KPI. For instance, you may ask: “Do players who reach level 10 have a markedly higher LTV?” or “Players from which acquisition channel are most likely to buy DLC?” Having been so involved in the game’s development, you’ll have a sense of which questions to ask. Depending on the tools you choose, they’ll either support queries to answer these questions, or you’ll need to load the events into a database and write some old-fashioned SQL queries.
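Here is what the “old-fashioned SQL” route might look like for the level-10 question, using Python’s built-in sqlite3 module and an in-memory database. The table layout, event names, and numbers are invented for this sketch.

```python
import sqlite3

# Hypothetical events table; schema and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (player_id TEXT, name TEXT, value REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("p1", "level_reached", 10), ("p1", "purchase", 12.0),
        ("p2", "level_reached", 4),  ("p2", "purchase", 2.0),
        ("p3", "level_reached", 12), ("p3", "purchase", 8.0),
        ("p4", "level_reached", 3),
    ],
)

QUERY = """
WITH per_player AS (
    SELECT player_id,
           MAX(CASE WHEN name = 'level_reached' THEN value END) AS max_level,
           COALESCE(SUM(CASE WHEN name = 'purchase' THEN value END), 0) AS revenue
    FROM events
    GROUP BY player_id
)
SELECT max_level >= 10 AS reached_10,
       COUNT(*)        AS players,
       AVG(revenue)    AS avg_revenue
FROM per_player
GROUP BY reached_10
ORDER BY reached_10
"""


def revenue_by_level_10(connection):
    """Average revenue per player, split by whether the player reached level 10."""
    return connection.execute(QUERY).fetchall()


for reached_10, players, avg_revenue in revenue_by_level_10(conn):
    print(reached_10, players, avg_revenue)
```

This is just counting and a group-by. A gap between the two groups tells you the two things are correlated, not that reaching level 10 causes players to spend more.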
Often, the answers to these questions vary based on segments and temporal cohorts. For instance, users who arrived from a FB ad may be more likely to monetize. Therefore, you may show more monetization events to that segment of players. Alternatively, you may find that players who complete the tutorial seem to play longer. If you improve the tutorial, it may be easier to retain more players!
When you make changes based on what your analysis shows, you’ll want to monitor their effects across temporal cohorts. For example, you can group by each install week or by game release version and see how KPIs differ. A portion of that variance may be due to changes you made. If you want to truly test whether your change affected a KPI, you’ll want to use A/B testing. In A/B testing, you choose a random subset of the players to see a change (let’s call that subset A) and the remaining players (let’s call that subset B) do not. You then compare the KPI of interest between the two groups (you should probably use a t-test [see below]). It is important that you only change one thing and assign groups randomly -- that allows you to establish causality. Knowing causality allows you to act more aggressively, tactically, and confidently as you make further changes.
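A bare-bones sketch of the two halves of an A/B test follows: assigning players to groups and comparing a KPI between them. Two assumptions to flag loudly: the assignment hashes the player ID as a stand-in for random assignment (so a player stays in the same group across sessions), and the comparison is Welch’s version of the t-test, which does not assume equal variances. The experiment name and playtime numbers are made up.

```python
import hashlib
import math
from statistics import mean, variance


def assign_group(player_id, experiment="new_tutorial"):
    """Deterministic pseudo-random split: the same player always lands in the same group."""
    digest = hashlib.sha256(f"{experiment}:{player_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    va = variance(a) / len(a)   # squared standard error of group A's mean
    vb = variance(b) / len(b)   # squared standard error of group B's mean
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical KPI: hours played per player in each group.
playtime_a = [5, 6, 7, 8, 9]   # saw the change
playtime_b = [3, 4, 5, 6, 7]   # did not

t, df = welch_t(playtime_a, playtime_b)
print(assign_group("p1"), t, df)
```

Turning t and df into a p-value needs the t distribution, which the standard library does not provide; if you have SciPy installed, scipy.stats.ttest_ind(playtime_a, playtime_b, equal_var=False) does the whole comparison in one call.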
As you become more sophisticated, and the low-hanging fruit starts to disappear, you might want to consider machine learning. It will allow you to predict a host of things about your players that you can act on. For example, you might predict that a user is about to churn and offer them rewards to stay. Or, you might predict what level of difficulty the user will enjoy most on the next level and set it accordingly. The options are endless. Let’s get the fundamentals down for now and dig into that in the next post.
I’m Adam Fletcher, CEO at Gyroscope, a company I recently co-founded with Jonathan Mortensen. We’re lifelong gamers and game developers. (Current favorites include the X-COM series and Rocket League.) With the rise of ML and AI, we saw an opportunity to bring that technology to games. Gyroscope allows you to create an adaptive, personalized game experience for players via a state-of-the-art A.I. director. Seamlessly tailor gameplay and monetization mechanics to each user – maximizing LTV while minimizing experimentation and instrumentation – with a Unity plugin that takes 5 min to install.
We’re entering Beta soon; you can sign up at getgyroscope.com. We’re also happy to give some direction as you begin your journey into data science. (PM us or comment here.)
t-test: There’s a lot of subtlety to doing statistical tests. I could write articles on that alone. If you want more info, ask a question in the thread or PM me and I can guide you through your first one.