The last couple of weeks have been really busy for the three of us who make up Whale Hammer Games. On August 31, we released our debut game, Tahira: Echoes of the Astral Empire, on Steam and GOG.com, after working on it for the last three years (and a little change).
Reception on Release
We were very happy with the reception to the game when we launched. On GOG we've been sitting around 3.9/5 (google 'tahira gog' to see) since launch from 33 ratings. On Steam we were sitting comfortably around 81% positive and professional reviews were generally averaging out to around there too, with some reviewers going lower and others going higher.
The most pleasing thing for us, though, was that our Kickstarter backers, the people who knew best what we'd been working on for the last three years, were overwhelmingly happy with the game that we had made for them.
Boom, just like that we're down to about 50% positive on Steam.
Now, let's get one thing clear before I continue: we understand why the change was made. The way the system was set up clearly made it easy to trade keys for positive reviews. The new system doesn't fix this completely. It is still possible to game it by paying people to buy the game and leave positive reviews, but a financial disincentive is usually a very good way of stopping the vast majority of people. Valve did exactly the same thing with Greenlight.
So that's all well and good, except that in fixing that problem, Valve have created a situation that does this:
"Wow, yeah. This basically kills Kickstarter games. Now KS is just a way to find and silence your biggest fans." https://t.co/vZSgbLAmV3 (Tyrus Peace, @TyrusPeace, September 13, 2016)
Maybe that sounds a bit dramatic, but let me throw it over to our programmer Tom, who will take you through our review numbers and will suggest a solution to help fix Steam's review system.
Breaking Down the Numbers
Looking at the stats for our reviews, we can break them down into four categories:
Positive reviews from Steam purchases: 11
Negative reviews from Steam purchases: 8
Positive reviews from game keys (almost entirely Kickstarter backers): 23
Negative reviews from game keys: 0
(All data in this section was pulled from our Steam page on 20 September 2016)
Under the old Steam review system, this gives us a review ratio of 34/42 positive reviews, or 81%. Under the new system, our review score drops considerably, to 11/19 positive reviews (58%).
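The arithmetic above can be reproduced in a few lines. This is only a sketch of the calculation, not anything Steam actually runs; the function name `percent_positive` and the dictionary layout are made up for illustration, and the counts are the four figures listed above.

```python
# Review counts from our Steam page (20 September 2016).
# Keys are (source, recommended); this layout is purely illustrative.
reviews = {
    ("steam_purchase", True): 11,
    ("steam_purchase", False): 8,
    ("game_key", True): 23,
    ("game_key", False): 0,
}

def percent_positive(counts, sources):
    """Return (positive, total, rounded percentage) for the given sources."""
    positive = sum(n for (src, rec), n in counts.items() if src in sources and rec)
    total = sum(n for (src, _), n in counts.items() if src in sources)
    return positive, total, round(100 * positive / total)

# Old system: every review counts. New system: Steam purchases only.
old = percent_positive(reviews, {"steam_purchase", "game_key"})
new = percent_positive(reviews, {"steam_purchase"})
print(old)  # (34, 42, 81)
print(new)  # (11, 19, 58)
```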
Before I continue, a couple of observations: first, we have more Kickstarter reviews than Steam purchaser reviews - this makes sense as we have fewer commercial sales at this point than we had Kickstarter backers. The second major point is that our Kickstarter reviews are unanimously positive - every Kickstarter backer who decided to review the game enjoyed it. Given this, I can see why Steam is making the argument that reviews from keys should not be considered - they appear to come from a biased, non-representative section of our players. I don't agree with this argument, but I can see why it's being made.
Steam also allows users to vote on whether they found a review helpful or not, and the average 'helpfulness' of our reviews is also interesting to look at:
Average helpfulness of positive reviews from Steam Purchases: 65%
Average helpfulness of negative reviews from Steam Purchases: 29%
Average helpfulness of positive reviews from backers: 77%
(note that I have excluded two reviews made by Kickstarter backers from this analysis, because they had no votes on their helpfulness)
Here's where you can most clearly see the problem with the changes Steam has made: by cutting out our backer reviews, they have silenced the group of reviews which users find most helpful (positive backer reviews), and amplified the value given to the reviews which users find least helpful (negative purchaser reviews).
Steam acknowledge that the new system is flawed in their press release, saying:
A Possible Solution?
Problem #3 is a strange one, which I'm not going to attempt to solve here, but I'd like to put forward a way to solve problems #1 and #2. In doing so I'll be exclusively talking about what goes into Steam's overall review score, as for any game without an extensive marketing budget (i.e. most games), this is the single number which matters the most. I'll also be aiming for a solution which is easy to implement (no new data or tricky algorithms needed), resilient to sudden changes (no threshold variables or other factors which can suddenly make one or more reviews significantly more or less significant than they were previously), and resistant to meta-gaming (restricting the possible impact of one or several 'vigilante' reviewers).
The simplest solution, I believe, would be to weight each review by how helpful other users find it. I'm making one slight change to the existing system by assuming that the person who wrote the review finds it helpful, so a review which 4 out of 6 people found helpful would now have a score of 5 out of 7 once the reviewer is taken into account. This gives us a way to meaningfully weight new reviews, which is important, as I'll explain later. In this system, a positive review with a score of 3 out of 4 would provide 0.75 points towards that game's positive score, and 0.75 points towards its total score. A negative review with a score of 6 out of 10 would provide 0 points towards a game's positive score, and 0.6 towards its total score. A visualization of how this would work for Tahira's Steam user reviews is below:
In this table, "Score" refers to the number of helpful votes, as compared to the total number of votes. "Positive Score" is zero if the review does not recommend the game, and the same as "Score" otherwise.
This gives our game a weighted score of 7.47 (the sum of all values in the 'positive score' column) out of 10.08 (the sum of all values in the 'score' column), or 74%.
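For anyone who prefers code to tables, here is a minimal sketch of the weighting described above. The function names and the tuple layout are hypothetical, and the two sample reviews are made up to match the worked examples in the text, assuming those 3-out-of-4 and 6-out-of-10 figures already include the author's own vote. It is not our real review data.

```python
from fractions import Fraction

def review_weight(helpful, total):
    """Weight of a single review, given votes cast by other users.

    The author is counted as one extra helpful vote, so a review that
    4 of 6 people found helpful scores 5 of 7.
    """
    return Fraction(helpful + 1, total + 1)

def weighted_score(reviews):
    """Return (positive_score, total_score).

    `reviews` is an iterable of (recommended, helpful, total) tuples.
    """
    positive_score = Fraction(0)
    total_score = Fraction(0)
    for recommended, helpful, total in reviews:
        weight = review_weight(helpful, total)
        total_score += weight          # every review adds to the total
        if recommended:
            positive_score += weight   # only positive reviews add here
    return positive_score, total_score

# Illustrative reviews: 2 of 3 becomes 3 of 4 (positive),
# and 5 of 9 becomes 6 of 10 (negative).
sample = [(True, 2, 3), (False, 5, 9)]
positive, total = weighted_score(sample)
print(float(positive), float(total))  # 0.75 1.35
print(round(100 * positive / total))  # 56
```

Plugging the column sums from the table into the same final division, 7.47 / 10.08, gives the 74% quoted above.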
The single largest advantage of this system is that it neatly solves Steam's problem #1 - a game's unhelpful reviews have a significantly reduced impact on that game's score. It's also a very simple system, with no additional data required to implement it, and no convoluted maths required to make it work.
There is a valid criticism which could be levelled at this system, which is that reviews aren't weighted based on how many people have voted - i.e. a positive review with 3/4 helpful votes is weighted the same as a positive review with 300/400 helpful votes. As a corollary to this, a review is at its most impactful to a game's score immediately after it is posted, with 1/1 helpful votes. However, given that Steam appears to be trying to create a self-sustaining system where games receive meaningful scores based only on their user reviews, I would actually argue that this incentivizes the behaviour that Steam wants. As I mentioned above, it means that the most impactful thing a user can ever do for a game's score is to leave a review (encouraging more people to leave reviews), and once a review has been left, it means that the next most impactful thing a user can do is vote on a review which doesn't yet have many votes (making the system self-correcting, as users become more likely to vote on things with few votes, meaning reviews are more likely to accrue a meaningful number of votes over time).
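To make that self-correcting behaviour concrete, the sketch below shows how much a single extra 'not helpful' vote moves a review's weight at different vote counts. The helper `weight_after_vote` is a hypothetical name, and the tallies are assumed to already include the author's own vote.

```python
from fractions import Fraction

def weight_after_vote(helpful, total, vote_is_helpful):
    """Return (before, after) weights when one more user votes.

    `helpful` and `total` are assumed to include the author's own vote.
    """
    before = Fraction(helpful, total)
    after = Fraction(helpful + (1 if vote_is_helpful else 0), total + 1)
    return before, after

# One extra unhelpful vote on a brand-new review, a lightly voted
# review, and a heavily voted review.
for helpful, total in [(1, 1), (3, 4), (300, 400)]:
    before, after = weight_after_vote(helpful, total, vote_is_helpful=False)
    print(f"{helpful}/{total}: {float(before):.3f} -> {float(after):.3f}")
# 1/1: 1.000 -> 0.500
# 3/4: 0.750 -> 0.600
# 300/400: 0.750 -> 0.748
```

The fewer votes a review has, the further one vote moves it, which is why voting on lightly voted reviews is the second most impactful thing a user can do.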
This has the added bonus of softening the blow of the previous changes. Under this weighted system, while Kickstarter backers and bundle purchasers can't directly influence a game's score, they can do so indirectly by voting on other reviews. They'd remain second-class citizens as far as the review system is concerned, but they'd at least have some say, whereas currently they have none.