Using Steam Data To Understand Gamers

Over the past year, while completing a Master's degree in Entrepreneurship with a video games focus, I’ve been using data analysis to help guide decision-making when starting new projects. This article outlines some of my methods and findings.

If you asked Jeff Bezos what is at the heart of Amazon’s success compared to other internet retailers, he’d probably say words to the effect of “customer obsession”. He has stressed the importance of understanding customers for decades, which you can check by looking through his old interviews.

This also fits the traditional startup mantra of focusing on “product-market fit”. If you want to create a product or service that solves a problem for a particular group of people in an optimal way, you first need to understand that group of people.

So how do you understand gamers? You could try asking people questions. You would have to ask a lot of people, though, which is what Quantic Foundry does. This can reveal some very interesting insights. However, I’d argue that surveys reveal the respondents’ self-perceptions; they don’t always reveal behaviour. Humans are complicated animals, and we often don’t understand ourselves or know why we do things. Our self-perceptions and self-reported behaviour can differ from our actual behaviour.

I think a more direct way to understand gamers is to observe what we do.

Over the past year, while completing a Master's degree in Entrepreneurship with a video games focus, I’ve been using data analysis to help guide decision-making when starting new projects. I was inspired by Erik Johnson’s GDC talk and wanted to be able to dig deeper into the data that he had accessed. In years past I did what many creatives do and made games based on my personal preferences and history. Although I still follow my passions, I now also want an idea of how other groups of gamers are likely to receive any given idea while it’s still on paper. My goal is to fuse art and business more effectively than I have in the past.

My Questions

  1. Are particular types of games more likely to succeed commercially than others?

  2. If yes, what are they?

  3. How much content will a gamer expect out of any one game in genre x if the price is z?

  4. For any particular subset of games, which of the possible categorisations is most important to the audience?

This article focuses on questions 1 and 2. My game developer friend Laurie James helped out with question 4, as his skill at writing code to manipulate the datasets we used is far greater than mine.


For publicly accessible data, there is only one platform that we can focus on, and that’s Steam. It is way ahead of any other gaming platform for open data and provides APIs to access it. Thanks to Valve for providing this valuable resource. For this reason my analysis covers Steam gamers only. The results might be very different on other platforms, but without accessible data we can only speculate.

Steam Spy

Thanks to Sergey Galyonkin for keeping the SteamSpy API running. Using Valve’s APIs, he has compiled some handy shortcuts for looking into each “tag” on Steam, and these are what I use for my analysis.

Engaged Users Drive the Trends

First we have to understand that, as in many areas of human activity, the “Pareto Principle” is in effect on Steam: a minority of the people produce the majority of the content and acquire the majority of the resources. A 2016 research paper by Mark O'Neill, Justin Wu, Elham Vaziripour, and Daniel Zappala showed that the top 20% of players account for 73% of the market value of Steam.
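To make the “top 20% account for 73%” idea concrete, here is a minimal sketch of how such a share can be computed from a list of per-player values. The function name `top_share` and the ten account values are made up for illustration (the numbers were chosen so the result lands on 73%); they are not taken from the paper or from Steam.

```python
def top_share(values, top_fraction=0.2):
    """Return the fraction of the total held by the top `top_fraction` of entries."""
    if not values:
        raise ValueError("values must not be empty")
    ranked = sorted(values, reverse=True)
    # At least one entry always counts as "the top".
    cutoff = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:cutoff]) / total if total else 0.0


# Hypothetical account values for ten players (not real Steam data).
account_values = [500, 230, 60, 50, 40, 40, 30, 20, 20, 10]
print(f"Top 20% hold {top_share(account_values):.0%} of the total value")
```

With ten players, the top 20% is simply the two highest-value accounts, and their combined value is divided by the total.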

We also know that, as in many areas of digital engagement, around 1-2% of players will leave a review for a game they downloaded. Please note that I only have anecdotal evidence for this from a sample of game creators, rather than a large data set verifying it. That same figure of 1-2% is common for, e.g., the number of “likes” compared to views on YouTube videos, or the number of people who buy in-app purchases compared to total players of mobile freemium games. Of course, there will be outliers with a super-engaged audience much higher than 2%, but for my purposes I am not interested in the extreme hits. I want to view the market through the lens of “how does your average game of type x perform?”

We can use the number of reviews a game receives as a proxy for its number of downloads, simply by multiplying by 100 (for a 1% ratio of reviews to downloads) or by 50 (for a 2% ratio of reviews to downloads). Reviews are our basic measure of engagement.
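As a rough sketch of that arithmetic, the snippet below turns a review count into a range of estimated downloads. The function names and the example figure of 400 reviews are my own illustration, and the 1% and 2% rates are the anecdotal assumption described above, not measured values.

```python
def estimate_downloads(total_reviews, review_rate):
    """Estimate downloads from a review count, given the share of players who review."""
    if not 0 < review_rate <= 1:
        raise ValueError("review_rate must be in (0, 1]")
    return round(total_reviews / review_rate)


def download_range(total_reviews, low_rate=0.01, high_rate=0.02):
    """Return (lower, upper) download estimates for the assumed 1-2% review rate."""
    # A higher review rate means fewer downloads per review, so it gives the lower bound.
    return (estimate_downloads(total_reviews, high_rate),
            estimate_downloads(total_reviews, low_rate))


low, high = download_range(400)
print(f"400 reviews suggests roughly {low:,} to {high:,} downloads")
```

Dividing by 0.01 is the same as multiplying by 100, and dividing by 0.02 is the same as multiplying by 50.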

As we also know that only the most engaged users or “core players” will leave a review, we are getting an indication of what the core players care about simply by seeing which games receive reviews.


I'm using a simple methodology to answer questions 1 and 2. You can follow the same steps to run your own analysis:

  1. Identify a tag of interest on Steam e.g. “Text-Based”.

  2. Download the dataset for that tag using SteamSpy’s API and convert the JSON data to a format that can be added to a spreadsheet.

  3. In the spreadsheet, add an extra column for “Total Reviews” and set it to the sum of the positive and negative reviews received for each entry.

  4. Filter out games that have a price of 0. For my purposes I’m only looking at premium games, not freemium games.

  5. Sort the remaining games by Total Reviews.

  6. Take the median value, the figure in the middle of the list. The mean average would be skewed by the big hits so is best avoided.

  7. Record that median value on a summary list. It is also useful to record the total number of games that use the tag.

  8. Repeat steps 1 to 7 for other tags you are interested in and compile your summary list.
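The steps above can also be sketched in code instead of a spreadsheet. This is a minimal illustration, not the exact process I used: the endpoint shape (`request=tag`), the field names `positive`, `negative` and `price`, and the sample records are all assumptions about SteamSpy’s response, so check them against SteamSpy’s own documentation before relying on them.

```python
import json
import statistics
import urllib.parse
import urllib.request

# Assumed endpoint for SteamSpy's per-tag request; verify against the SteamSpy docs.
STEAMSPY_URL = "https://steamspy.com/api.php"


def fetch_tag(tag):
    """Step 2: download the games listed under one tag (makes a network request)."""
    query = urllib.parse.urlencode({"request": "tag", "tag": tag})
    with urllib.request.urlopen(f"{STEAMSPY_URL}?{query}", timeout=30) as resp:
        return json.load(resp)


def summarise_tag(games):
    """Steps 3 to 7: median total reviews among paid games, plus the tag's game count."""
    totals = []
    for game in games.values():
        # Step 4: skip free games. Price is assumed to be a number or numeric string.
        if float(game.get("price") or 0) <= 0:
            continue
        # Step 3: total reviews = positive + negative.
        totals.append(int(game.get("positive", 0)) + int(game.get("negative", 0)))
    # Steps 5 and 6: statistics.median sorts internally and takes the middle value.
    median = statistics.median(totals) if totals else None
    return {"games_with_tag": len(games), "paid_games": len(totals), "median_reviews": median}


# Hypothetical records in the assumed response shape (not real SteamSpy data).
sample = {
    "1": {"name": "Game A", "positive": 90, "negative": 10, "price": "999"},
    "2": {"name": "Game B", "positive": 5, "negative": 1, "price": "499"},
    "3": {"name": "Game C", "positive": 4000, "negative": 500, "price": "0"},
    "4": {"name": "Game D", "positive": 20, "negative": 4, "price": "1499"},
}
print(summarise_tag(sample))
```

For step 8, you would call `summarise_tag(fetch_tag(tag))` for each tag of interest and collect the results into a summary list. Note how the free “Game C”, despite its large review count, is excluded before the median is taken.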



Please note that I’ve downloaded data sets at different times over a period of months, so some of the data used is not the most recent. However, I’m looking at overall trends rather than weekly/monthly trends.

Occasionally a game might go free for a weekend, which will cause a spike in downloads and reviews. However, the data sets used are big enough that these individual anomalies are not a concern.

The summary of results gives an indication of which kinds of games are more likely to engage people on Steam. The differences between tags are quite staggering. Taking just three example tags, we can see that the median Action-RPG gets 17 times more reviews than the median Text-Based game. The median Online Co-op game gets 79 times more reviews than the median Text-Based game. This is a huge difference.


[Table: tag, number of games using the tag, number of reviews for the median game in the list. Only the row label “Online co-op” survives; the values are not recoverable.]


Of course, it’s possible to make a stand-out text-based game and have a commercial hit. I can think of two recent ones myself. The purpose of this analysis is not to say what will definitely happen in the future, or what is or isn’t possible in absolute terms. What we can reasonably say is that, on average, an Online Co-op game is likely to receive more reviews (and more downloads) on Steam than a Text-Based game. You can then apply this thinking to other, less obvious tag comparisons. As previously mentioned, my intention is to ignore the hits and focus on 'what typically happens'.

A public copy of my results is here, where I’ve looked at 73 tags so far. It’s an interesting read (if you’re into data!) and I think you can infer some general guidelines about the relative numbers of people who are into different kinds of games on Steam.

We can also cross reference this set of results with the useful map of Steam tag relationships that Quantic Foundry produced. This will further help to understand the perceptions of the audience and the creative norms that past games have established. This will help you to decide where you want to fit in and how you want to stand out.

I used data analysis and other forms of analysis to help guide my last team to decide on what game direction to go in, combined with the passions of the team members and other creative factors. It worked out well and that game is now getting interest from a large number of publishers.

If people are interested in this topic and my findings, I’ll write up the findings for questions 3 and 4 in a future article.

I don’t often use Twitter, but I do have an account, so you can message me @HoneyTribeStu (although it might take me a while to respond).


Shaz Yousaf
