This article was originally posted on my blog.
I was inspired by Danny’s article, so I took a stab at recreating and validating the numbers. I wanted to understand not just what the revenue estimate is, but how accurate a guess it is: in other words, what a reasonable range for that revenue estimate would be.
Finally, I also wanted to overcome some of the limitations in Danny’s data:
- Age-restricted games are missing
- Release dates are incorrect for games that moved out of Early Access
- Some of the prices are incorrect due to a bug in his data collection (e.g. My Time at Portia or Indivisible)
Danny goes into detail describing his methodology. The idea is to use the review count of a game to estimate the number of owners, and from there to estimate revenue. The revenue trend is then used to predict future earnings so we can compare games independently of when they were released.
I captured the data in the last week of 2019, using the Steam API. It’s a long process that takes several days due to rate limits.
The first approximation is to go from reviews to owners. I used SteamSpy’s data to try to validate the standard range from Jake Birkett’s post. Note that I didn’t use SteamSpy’s data anywhere else in this analysis, only to validate the reviews-to-owners transformation. Here are my results:
Including games with at least 10 reviews and at least 1000 owners, I analyzed over 7000 games and came to the following conclusions:
- 65 is a good number as a “best guess”
- There is a long tail where few reviews lead to many owners. In estimating revenue I chose to be pessimistic and ignore these cases: a larger multiplier would only mean more revenue than estimated (which is never a bad thing)
- A good range is 30 to 100, as that covers the thick area of the distribution, while keeping 65 as the midpoint. This range covers almost every game on the low end and, again, errors would underestimate revenue.
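To make the reviews-to-owners step concrete, here is a minimal Python sketch. It only applies the three multipliers from the list above; the function name `estimate_owners` and the constant names are hypothetical, chosen for illustration.

```python
# Reviews-to-owners multipliers from the list above:
# pessimistic end of the range, best guess, optimistic end of the range.
LOW_MULTIPLIER = 30
BEST_MULTIPLIER = 65
HIGH_MULTIPLIER = 100


def estimate_owners(review_count):
    """Return (low, best, high) owner estimates for a review count."""
    if review_count < 0:
        raise ValueError("review_count must be non-negative")
    return (
        review_count * LOW_MULTIPLIER,
        review_count * BEST_MULTIPLIER,
        review_count * HIGH_MULTIPLIER,
    )


low, best, high = estimate_owners(200)
print(low, best, high)  # 6000 13000 20000
```

A game with 200 reviews would land somewhere between 6,000 and 20,000 owners, with 13,000 as the best guess.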
Once we have a number of owners, we need to translate that into revenue. It’s certainly not as simple as multiplying by the US price. Danny has a good formula that we’ll use as is. We multiply all of these values together:
- US price. Note that I used the price at the end of 2019 (excluding any discounts), so if a game has changed prices the results won’t be accurate.
- 0.7 (platform cut)
- 0.93 (VAT)
- 0.92 (returns)
- 0.8 (regional price). SteamDB is a great resource to see the regional prices of games and how they compare to the US price.
- 0.8 (average discount). In my tests the range varies from 0.7 to 0.9, depending on the age of the game and other factors.
Everything except for the US price and the platform cut is an approximation. Still, combining these factors results in a multiplier of about 0.38 on the US price. That’s the money you get to take home for each owner.
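The arithmetic behind that multiplier can be checked with a short Python sketch. The factor values come straight from the list above; the names `FACTORS`, `NET_MULTIPLIER` and `estimate_revenue` are hypothetical, and the example game (13,000 owners at $19.99) is made up for illustration.

```python
import math

# Factors from the list above; everything except the platform cut
# is an approximation.
FACTORS = {
    "platform_cut": 0.7,
    "vat": 0.93,
    "returns": 0.92,
    "regional_price": 0.8,
    "average_discount": 0.8,
}

# 0.7 * 0.93 * 0.92 * 0.8 * 0.8, which is roughly 0.38
NET_MULTIPLIER = math.prod(FACTORS.values())


def estimate_revenue(owners, us_price):
    """Estimate take-home revenue from an owner count and the US price."""
    return owners * us_price * NET_MULTIPLIER


print(round(NET_MULTIPLIER, 2))  # 0.38
# Hypothetical game: 13,000 estimated owners at a $19.99 US price.
print(round(estimate_revenue(13_000, 19.99)))
```

Feeding in the low, best and high owner estimates gives a revenue range rather than a single number.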
The last piece is looking at revenue over time. This is useful in itself, as it can help predict future results based on the current sales of your game. I tried to predict the number of reviews a game would have in the future, by looking at the cumulative reviews over time. Looking at the first 5 years for games with over 1000 reviews, and after normalizing we get this chart:
This chart includes data from over 400 games. The red line is the mean and the blue line is the median, per day. The green line is the curve that Danny uses in his article, and it seems to cluster most of the revenue in the first year much more than you would expect from the data, so I feel it’s too conservative.
For my analysis I used historical review data to create a simple regression model to estimate future review numbers. It’s nothing fancy. Because in my tests some games followed a linear model while others adhered more to a logarithmic one, I calculated both and used them to generate a range, plus a “best guess” based on how closely each model follows the real data.
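As a rough illustration of the idea, and not the exact model used for the analysis, the sketch below fits both a linear and a logarithmic curve to cumulative review counts with ordinary least squares. How the “best guess” is picked here (the model with the smaller squared error on the known data) is an assumption, and the function names and sample numbers are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_reviews(days, reviews, target_day):
    """Return (low, best, high) cumulative-review forecasts for target_day.

    days are days since release and must all be greater than zero,
    because the logarithmic model takes ln(day).
    """
    log_days = [math.log(d) for d in days]
    models = []
    # First pass: linear in day. Second pass: linear in ln(day).
    for xs, x_target in ((days, target_day), (log_days, math.log(target_day))):
        slope, intercept = fit_line(xs, reviews)
        sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, reviews))
        models.append((sse, slope * x_target + intercept))
    predictions = [prediction for _, prediction in models]
    # Assumption: the best guess is the model with the smaller squared error.
    best = min(models)[1]
    return min(predictions), best, max(predictions)


# Made-up review history: (day since release, cumulative reviews).
days = [30, 90, 180, 365]
reviews = [400, 700, 900, 1100]
low, best, high = forecast_reviews(days, reviews, 5 * 365)
print(round(low), round(best), round(high))
```

The two predictions naturally drift apart the further out the target day is, which is why the range widens for longer forecasts.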
Finally, it’s hard to say much about what happens after 5 years as we simply don’t have enough data and, furthermore, the market will have changed by then, so current rules probably won’t apply.
The result of this estimation is a range for both year-1 and year-5 revenues (using either review actuals or predictions, depending on the game’s age) as well as a range for “current” revenue (as of Dec 2019) and “best guesses” for all of the above. The reason for providing ranges is to reinforce the idea that these are estimates: a lot of approximations are used so accuracy is not to be expected and the further into the future we’re predicting, the larger the range will be.
Steam is growing
It’s not a secret that the number of games on Steam has exploded in recent years. However, it seems we have seen the peak, and 2019 even had a modest decline in the number of indie releases:
Talks of the Indiepocalypse are typically backed by data showing a decline in median revenue for games launched in recent years. This can be seen in the data:
I want to offer another perspective. Instead of looking at the % of games that succeed, let’s look at absolute numbers. After all, a lot more people are releasing games, so maybe a few more people are finding success:
And in fact, this is the case. Even if it’s true more people are failing, it’s also true more people are succeeding. This is encouraging.
Review score matters
Danny shows a correlation between review score and revenue. I was able to recreate that:
I used median instead of average, so I didn’t have to cap the revenue estimates.
Interestingly, my data doesn’t show the decline around 70% that you see in Danny’s data, so I believe that dip is simply due to noise.
Intuitively, it makes a lot of sense that higher review scores result in higher sales when you consider typical player behavior. Many players will read reviews before deciding to make a purchase and a low score will likely deter them. Also, if the game is high quality and players enjoy it, they’re more likely to recommend it to their friends, and word of mouth is extremely important for succeeding on Steam.
Looking at the market as a whole can highlight some general trends but in order to get better insights we need to segment it. The best way to do it is using Steam’s Tags.
Chris Zukowski wrote a great article on his game marketing blog about how he uses this kind of data to do market research before working on a game. I believe this is the right approach: look at a space and use the data to understand it, as opposed to looking at the data and trying to extract a magical combination of tags that would make a game successful.
In order to facilitate this I created a number of interactive graphs so you can filter the tag(s) you’re interested in, select the date range and see it all visually. See them here.
While navigating a graph is useful, sometimes you need a way to filter the tags in a more sophisticated way. Leha Games created such a tool for Danny’s data, and I wrote a simple version of it for my data, which you can find here.
Show me the data