There has been quite a lot of hype around big data (BG), with every vendor adding it to their marketing slogans and tag lines. Having been in data and consumer marketing for over 20 years, I was quite dismayed to watch the promise of big data follow the Gartner hype cycle... This series of blog posts aims to cut through the hype and clarify the issues surrounding big data, and more importantly to apply them to the digital games industry.
In addition to the experiences of my company Sonamine (www.sonamine.com), I will lean heavily on an excellent book, "Big Data: A Revolution That Will Transform How We Live, Work, and Think". I won't be touching on the technology behind big data; I'll leave that to the vendors.
What does big data look like?
Rather than providing a definition, it is more useful to simply describe some of its characteristics.
- Large volumes - this is the best understood aspect of big data. Log files of all kinds from all apps on all devices track all types of events. Game event logs, in-game camera telemetry and location data all combine to create lots of records.
- Messy and incomplete - Any of the devices generating data can break down, errors creep in, and not everything is 100% accurate (though it is probably at least 50% accurate). Events tracked through 3rd party services can be lost due to internet blips and outages. Users can switch devices, leading to orphaned data.
- Different unrelated sources - you can now combine web visit data with game telemetry, and add demographic data from government and private sources, such as the US Census and Acxiom.
Leveraging big data for business value
In their book, Mayer-Schonberger and Cukier make the point that "at its core, big data is about predictions" (p.11). Examples given include the usual ones such as Google predicting which webpages are most relevant to your search and Amazon predicting which items you are most likely to buy. The data used for these predictions include
- all website information (large)
- your search history (incomplete and messy)
- your recent Gmail messages (unrelated areas)
- all purchases on Amazon (large)
- your location data (unrelated areas)
- and so on... i.e. big data-ish
In Sonamine's experience with game developers, being able to predict which free user is ready to convert or which user is ready to make an additional purchase provides tremendous insight for the marketing team. Using the user predictions for marketing campaigns results in much higher conversion rates without spamming the entire user base. The data used for these predictions could include
- user game play telemetry (large)
- user browsing behavior on websites (messy, incomplete and large)
- location related attributes (unrelated)
- device attributes (messy, incomplete)
- and so on... i.e. big data-ish
Learning when not to ask why
When we use Google or browse the Amazon catalog, we don't really care why certain web pages are more relevant or why Amazon recommends certain products. But there is an interesting story recounted in the book...
Amazon first started by adding product reviews to drive sales, assuming that better reviews would drive more sales. They had a large editorial team writing these reviews. Then Greg Linden came up with an alternative algorithmic approach, which was rigorously compared with the product reviews. These A/B tests showed that the results were "not even close". In the end the algorithm worked so much better that it drives up to 30% of Amazon's sales today.
It's important to recognize that the algorithm did not provide any explanation for why a user would buy a certain product. Amazon was one of the first companies to realize that although trying to come up with reasons "why" was interesting, and knowing "why" would be pleasant, it was unimportant for stimulating sales. Incidentally, they shut down the editorial group. If the Amazon team had insisted on explaining the reasons why a user bought a specific product, then they might not have implemented Greg Linden's approach, sacrificing 30% of their current revenue.
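To make the idea of "rigorously compared" a bit more concrete, here is a minimal sketch of how two variants could be compared on conversion rate alone, with no attempt to explain why one wins. This is a standard two-proportion z-test, not a description of what Amazon actually ran; the function name and the visitor and conversion counts are made up purely for illustration.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variant A and variant B.

    Returns (z, p_value) for a two-sided two-proportion z-test.
    All inputs are plain counts: conversions and visitors per variant.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each variant needs at least one visitor")
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that both variants perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variation in outcomes, test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical counts: editorial reviews (A) vs. algorithmic picks (B).
z, p_value = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The test only says whether B converts better than A by more than chance would explain. It says nothing about why, which is exactly the point of the story.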
A consumer friendly example is in order. There was a company called Farecast, which took historical prices of flights and used them to predict whether prices would increase or decrease. If the flight price was predicted to drop, consumers could wait for a bit. The predictive models were incredibly useful and helped consumers save money. There are many reasons why airlines change their prices; none of those reasons was available to, or included in, the predictive model. All the predictive model used was historical prices. Farecast was quickly purchased by Microsoft and integrated into the Bing flight search. Other travel sites such as Kayak now have similar predictive price capabilities.
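As a toy illustration of predicting from price history alone, the sketch below compares a short recent average with a longer one and suggests waiting when the recent average is lower. This is a deliberately simple heuristic of my own choosing, not Farecast's actual model; the function name, window sizes and sample fares are all hypothetical.

```python
from statistics import mean


def predict_price_direction(prices, short_window=3, long_window=7):
    """Suggest "wait" if the fare looks to be falling, otherwise "buy".

    Hypothetical heuristic: compare the average of the most recent
    `short_window` prices with the average of the last `long_window` prices.
    Nothing but the historical prices is used; no reasons, no explanations.
    """
    if len(prices) < long_window:
        raise ValueError("need at least long_window historical prices")
    recent = mean(prices[-short_window:])
    baseline = mean(prices[-long_window:])
    # A recent average below the longer-run average suggests a falling fare.
    return "wait" if recent < baseline else "buy"


# Made-up daily fares for two routes.
falling_fares = [420, 415, 410, 400, 390, 380, 370]
rising_fares = [300, 305, 310, 320, 335, 350, 365]
print(predict_price_direction(falling_fares))  # wait
print(predict_price_direction(rising_fares))   # buy
```

A real model would be trained and validated on far more history, but the input is the same kind of thing: what prices did, not why they did it.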
Sonamine has worked with dozens of game developers predicting which users are ready to convert. In every one of these engagements, we encountered the "why" question. Designers and marketers wanted to know why specific users were ready to buy and not others. Unfortunately, the truth is that no one knows the true answer to this question, which might vary from user to user. The only thing we know is that the algorithms work much better than human intuition, and that has been demonstrated across all our customers. Some customers were never able to overcome their fear of not knowing why, thus giving up the opportunity to get up to 20% more revenue from their players.
"Knowing what, not why, is good enough"
I am not saying that we should not ask why. Rather I'm saying that with big data, we don't have to know the answers to "why" before we act. Of course the decision to act on the predictions is dependent on the situation. The more important the impact of the decision, the more reason to act. For example, manholes have been exploding in New York City for many years, seemingly for no logical reason. A very good predictive model was developed that could identify the manholes most likely to explode. Officials could just take action and pre-emptively replace these manholes, preventing the majority of explosions. Knowing the reasons might allow us to prevent future explosions, and more research is certainly warranted. But in the meantime while the search for the real answer is ongoing, we can prevent the majority of explosions.
In the same way, when a game developer sees a major player drop-off at level 10 of a game, they may try to change that level in different ways without knowing the "true" reason why users are dropping off.
In many cases, it would be impossible given our existing technology and constraints to answer the question of why. Let me draw from some Sonamine examples. When users are ready to abandon a game, it is not possible to survey all of them when there are millions of players. Survey methods are also notoriously subject to user biases such as recall and availability bias.
In the same fashion, it is not really possible to answer the question of why users are ready to make their first purchase. Sometimes, marketers or game designers argue that when a user gets to a high level or is "invested" in the game, they will buy. The problem is that there are always paying users who are at level 1 and others at a high level who don't buy. Here's where big data makes explanations even harder and more unrealistic, because more data points make the picture more complicated until it becomes totally incomprehensible to a human.
In other words, when true explanations are impossible or impractical to obtain, use big-data algorithmic approaches to guide your actions. In fact, most of us frequently act on the basis of incomplete information. The only difference is that we are more confident and trusting of our own decisions, which are usually based on our "experience" and "expertise". Whether human experience is better or worse than big data algorithms is up in the air, but there are many documented cases where algorithms are far better. And we have seen it first hand at Sonamine.
When to act using big data predictions (even without the why)?
For the games industry, many decisions are rooted in creative and historical experiences. And this poses a unique challenge in adopting big data. It's true that no amount of big data can produce a new hit story line or new game genre. Data does not help us create, conceive and imagine new things; where big data can help is in improving things once the game is conceived, prototyped and ready to go. So if your responsibility is to improve and enhance a game that is already live, you should try letting big data help you. Here are some general situations in which big data would be applicable:
- when there are lots of data points to make sense of, and you want to spot specific low frequency cases. Examples: (a) trying to figure out which credit transaction is fraudulent and likely to result in a chargeback. (b) trying to figure out which accounts are being used for gold farming and commercial activity.
- when you have a resource constraint, use big data predictions to target your resources effectively. Examples: (a) giving away physical merchandise to key users who are likely to churn. The limited resource is the expensive physical merchandise. (b) targeting likely buyers without spamming everyone else. The limited resource here is player attention.
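The resource constraint case can be sketched in a few lines: given a predicted score per user and a fixed budget, spend the budget on the highest scoring users only. This is a minimal sketch that assumes the scores already exist; the function name, user IDs and churn probabilities below are hypothetical, and how the scores are produced is a separate question.

```python
def pick_targets(scores, budget):
    """Return the `budget` user IDs with the highest predicted scores.

    `scores` maps a user ID to a model's predicted probability, for example
    the probability of churning. `budget` is the number of users we can
    afford to reach (gift boxes, emails, and so on).
    """
    if budget < 0:
        raise ValueError("budget must not be negative")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [user_id for user_id, _ in ranked[:budget]]


# Made-up churn predictions for five key users, with two gift boxes to give.
churn_scores = {"u1": 0.12, "u2": 0.91, "u3": 0.47, "u4": 0.78, "u5": 0.05}
print(pick_targets(churn_scores, budget=2))  # ['u2', 'u4']
```

Everyone outside the top of the ranking is simply left alone, which is how likely buyers get targeted without spamming the rest of the player base.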
The next post in this series will cover the implications of letting the data tell you its story and the expert... I welcome your comments and feedback. You can reach me at nick_at_sonamine.com
Mayer-Schonberger, V. and Cukier, K. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin Harcourt, Boston, New York, 2013.