So, two things:
Last week, Valve announced Steam Labs, a new initiative where Valve pulls back the curtain on various projects they're working on internally but that aren't quite ready to be rolled out publicly just yet.
Given the timing, I decided to go ahead and release a janky half-finished prototype of a little weekend project I had cooked up called Steam Diving Bell. You can play with it here. Just please don't hug my server to death.
So now that my little project is out there, I'd like to talk a bit about it and Steam Labs in general.
Diving Bell is an experiment meant to address discoverability on Steam. It serves a similar purpose to Steam Labs' Interactive Recommender, which is a really neat machine-learning based recommendation engine you can read all about here. I've tried it myself, and it really works -- it's an incredibly neat piece of tech.
So what the heck is Diving Bell, who is it for, and how is it different -- why would anyone want to use it if we already have the fancy interactive recommender?
All great questions.
What is it?
It's a (prototype) web app for quickly discovering interesting games on Steam.
Who is it for?
Anyone who wants better discovery for games on Steam. This means players (who want to find games), but also developers (who want their games to be found). But let's not forget that curators need good tools, too. Human-powered curation stands to benefit from better tools that make it easy to quickly find the games you want to showcase and talk about.
How is it different?
Diving Bell and the Interactive Recommender take entirely opposite approaches:
Interactive Recommender uses your play history to get to know you, and uses smart algorithms to serve up games it thinks you will like. You specify a few parameters, and it shows you a list of recommendations. Interactive Recommender is like a sommelier that uses their expertise to suggest a wine that pairs well with the courses you've already chosen.
Diving Bell has no clue who you are or what you like, and uses dumb algorithms to serve up games similar to a title you specify. From there you can browse around in any direction you want. Diving Bell, like its namesake, is a vessel that lets you safely descend into the murky depths to catch glimpses of weird and interesting
The aesthetic I'm going for is "wikipedia binge." You start with some topic, then you click on links within that topic that seem interesting, and before you know it you find yourself following some totally weird but fascinating bunny trail you never expected you'd go down.
Let's start with a guided tour. You can tell Diving Bell to start with a specific game by adding "?appid=XYZ" at the end of the URL (sans quotes), where XYZ is a specific game's Steam application id. Let's start this plunge with Chrono Trigger:
Into the Depths
(There'll be a brief pause at the beginning while it bootstraps; all subsequent loads should be faster.)
Chrono Trigger is our selected game. Diving Bell serves up 8 games that it thinks are similar. There's an information panel (cropped from the screenshot for space purposes) that tells us more about the game, and includes screenshots, trailer, etc, and then there's some navigation on the bottom of the main panel: "Back", "More", and some mysterious blue buttons.
Clicking "More" will serve up another 8 recommendations, while keeping Chrono Trigger centered. At that point clicking "Back" will take us back a step and show us the previous recommendations. As for the blue buttons, these represent recommendation engines and can be individually toggled on and off. Right now all four are selected, and each corresponds to two of the currently visible recommendation results. I'll explain each of the recommendation engines with illustrations below. First, let's turn all four of them off:
These are "Default matches", and they should feel familiar if you've visited Chrono Trigger's Steam page, because I got them by scraping Steam's "More Like This" section.
For every game on Steam, there is a "More Like This" page, and it has exactly 12 games. The explanation Steam offers for how it makes these matches is:
"The tags customers have most frequently applied to CHRONO TRIGGER® have also been applied to these products"
...but I just treat it as a black box. The matches are solid, but tend to be familiar games that are already popular.
The first iteration of Diving Bell used nothing but "More Like This" matches for each game, because the first issue I was attacking was a UX problem: Let's say you want to browse more games like Chrono Trigger, then browse more games like those games, then visit one of those games' store page.
Here's how you do that currently:
- Visit Chrono Trigger's Steam Page.
- Scroll down way below the fold to "More Like This" and click a tiny button that says "See All"
- The page reloads.
- Find a game you seem interested in (Grandia II?) and click it.
- The page reloads.
- Scroll all the way down to "More Like This" and click "See All"
- The page reloads.
- Find a game you seem interested in (The Legend of Heroes: Trails in the Sky?) and click it.
- The page reloads.
That's 4 clicks, 4 full page reloads, and 2 scrolls (4 scrolls if you click a recommendation in the bottom row on the "More Like This" page). In Diving Bell, this same journey takes 2 clicks, zero page reloads, and zero scrolls. Granted, my prototype web app is a total potato and the async requests take longer than I'd like to fill in, but with a real database and some optimization there's no reason those couldn't be nearly instant.
Just by changing the UX I think we've already improved on the browsing experience of finding more games like Chrono Trigger. But there's a problem: the default "More Like This" recommendations are a bit too good.
"Too good?" What? How could that be a problem?
Obviously I'm using "good" a bit facetiously. What I really mean is that they're too on-the-nose for the browsing experience I have in mind.
Take a look at Chrono Trigger's 12 default recommendations:
Now compare those to Grandia II's:
There's a ton of overlap, which means even with the improved UX you'll constantly loop back onto things you've already seen, and never get too far from the original game's center of gravity. Maybe that's what some people want (and it should certainly be an option) but a discovery tool meant for general use can do better.
Maybe we can cull results we've already seen? That could work, but we still only have 12 recommendations for each game, and with this much overlap we'll hit dead ends in no time. We need a way to expand the pool.
Here's an idea -- we have this cool network of game connections from these "More Like This" pages, but what if we reverse the direction of the matches?
We've already established that every game on Steam points to 12 other games in its "More Like This" section. But what if instead of looking for the 12 games pointed to by Chrono Trigger, we crawl every single game on Steam and see how many games themselves point to Chrono Trigger as one of their 12 games? Let's call that a "reverse match."
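As a rough sketch (not the actual Diving Bell code), reversing the matches amounts to inverting the "More Like This" graph. The function name and the toy graph below are made up for illustration; a real run would use the full scraped graph keyed by Steam appid.

```python
from collections import defaultdict


def build_reverse_matches(more_like_this):
    """Invert a {game: [games it points to]} mapping.

    The result maps each game to the sorted list of games that
    point TO it in their own "More Like This" list.
    """
    reverse = defaultdict(set)
    for source, targets in more_like_this.items():
        for target in targets:
            if target != source:  # ignore self-references, just in case
                reverse[target].add(source)
    return {game: sorted(sources) for game, sources in reverse.items()}


# Toy graph with invented edges (hypothetical data, not scraped from Steam).
more_like_this = {
    "chrono_trigger": ["grandia_2", "trails_in_the_sky"],
    "grandia_2": ["chrono_trigger", "trails_in_the_sky"],
    "cosmic_star_heroine": ["chrono_trigger", "grandia_2"],
    "cthulhu_saves_the_world": ["chrono_trigger"],
}

reverse = build_reverse_matches(more_like_this)
print(reverse["chrono_trigger"])
# ['cosmic_star_heroine', 'cthulhu_saves_the_world', 'grandia_2']
```

Note that in this toy graph nothing points at Cosmic Star Heroine, so it has no reverse matches at all, even though it points at Chrono Trigger.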
Now instead of 12 games, we have hundreds or more. Now we have to sort them so we can decide which 8 to show first. I went with a tag similarity heuristic which I'll describe later, but all the results are viewable -- the user can click "more" to see the next 8 until all the reverse matches are exhausted.
Whereas default matches favor genre kings, reverse matches favor niche games. That's because every game in a genre tends to point to the genre kings, but the genre kings don't point back to the niche games. This recommender flips that dynamic.
We see Grandia II and The Legend of Heroes: Trails in the Sky (themselves somewhat niche cult classics when compared to Chrono Trigger), but we also see some great, well-regarded indie titles like Cthulhu Saves the World, Cosmic Star Heroine, and Epic Battle Fantasy 4. This gives us a much broader network to crawl -- wikipedia binge here we come! Let's click on "Cosmic Star Heroine" and see where that takes us.
Hmm, here's a problem. Cosmic Star Heroine, despite being a great game with a lot of similarities to Chrono Trigger, only has seven reverse matches. This is because most similar games have already spent their 12 slots on genre kings. Diving Bell will fill in the gaps with Default recommendations, but we still need more fodder for general browsing.
This is where LOOSE matches come in.
Loose matches crawl the "more like this" graph for the selected game twice. We get the 12 default matches, and then we grab each of those games' 12 recommendations for a final list of 144 matches. Then we exclude the original default 12 matches from the results as well as any duplicates. This gives us a list of games that are still pretty similar to the selected game, while adding just enough noise to juice the variety a bit.
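Here is a minimal sketch of that two-hop crawl, assuming the "More Like This" graph is already in memory as a dictionary. The function name and the toy graphs are hypothetical, and excluding the selected game itself is my assumption (the description only mentions dropping the original 12 and duplicates).

```python
def loose_matches(game, more_like_this):
    """Return second-degree matches for `game`, in first-seen order.

    Excludes the game's own default matches, the game itself
    (an assumption), and any duplicates.
    """
    defaults = more_like_this.get(game, [])
    excluded = set(defaults) | {game}
    seen = set()
    result = []
    for first_hop in defaults:
        for second_hop in more_like_this.get(first_hop, []):
            if second_hop in excluded or second_hop in seen:
                continue
            seen.add(second_hop)
            result.append(second_hop)
    return result


# Toy graph (invented data): "a" points to "b" and "c", and so on.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "e", "d"],
}
print(loose_matches("a", graph))  # ['d', 'e']

# A tightly knit cluster where everything points at everything else
# leaves nothing new after the exclusions.
cluster = {"x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"]}
print(loose_matches("x", cluster))  # []
```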
Loose seems more middle-of-the-road than Default and Reverse: it gets a good number of matches, but it doesn't exclusively favor big games, nor does it dig too deep to shine light on niche ones.
The number of games it returns varies too. For indie titles, it returns a lot -- we'll have more than enough titles here for Cosmic Star Heroine. But let's go back to Chrono Trigger for a second:
This is really interesting. Chrono Trigger is only able to give us six unique loose matches! Now there's a good chance this is just a stupid bug, but I also suspect this is at least in part because of how self-referential the "more like this" network is for genre kings. The 12 default matches reference each other to such a strong degree that even after you generate a pool of 144 second-degree matches, you only have 6 unique matches once you've excluded the default 12 and any duplicates. And even if these particular results are just down to a bug, we know from before that there's tons of overlap in big games' loose matches, and therefore fewer results overall.
This underscores the need for a variety of recommender engines. Each one so far has a different natural strength:
- Default: small number of matches no matter what, favors genre kings
- Reverse: big game = many matches, niche game = few matches, favors niche
- Loose: big game = few matches, niche game = many matches, neutral(ish)
These three recommendation engines alone probably provide enough variety, texture, and depth to the network to give us that wikipedia binge feel we're after. But we're not done yet! There's room for more.
At this point we're leaving the "More Like This" results entirely behind and will generate new recommendation systems from scratch.
The "Tags" recommender returns 8 games that Diving Bell considers to be similar to the selected title based entirely on their tags. This tends to favor niche games over popular and well-rated ones because the only thing it looks at is the tags.
Some context: every game on Steam has a series of tags that describe different aspects of a game. There's some tags for genre like "RPG" and "Action" and "Platformer", some things that seem to describe visuals like "2D", "Pixel Graphics", and even "Beautiful", as well as random nouns and adjectives like "Werewolves" and "Psychedelic." You can see a complete list here.
Tags are a pretty messy system and the first iteration of my tag-based recommendation engine returned awful results. After a few tweaks, I settled on a decent approach and made it completely transparent to the user. Just hover over any game matched by tags and you'll see a breakdown of how it calculates the score.
What I did here was to take all the Steam tags and group them into various categories (RPG and Adventure go under "Genre", Sci-fi and Retro go under "Theme", JRPG under "Subgenre" and so forth). Then when matching games I go through each category and count how many tags the second game has in common with the first in that category. Then I multiply that number by a list of weights -- for instance, I consider a subgenre match more important than a genre match, and the viewpoint and visual categories more important than the "misc" category. Then I add up all those scores and divide by a theoretical perfect score (where every category matches perfectly) to get a percentage.
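To make that concrete, here's a small sketch of category-weighted tag scoring. The category assignments for RPG, Adventure, Sci-fi, Retro, and JRPG follow the examples above; everything else (the other assignments, the weight values, the function name, and treating a "perfect score" as matching every tag of the selected game) is an assumption for illustration, not the real configuration.

```python
# Hypothetical tag -> category table; unknown tags fall back to "misc".
TAG_CATEGORIES = {
    "RPG": "genre", "Adventure": "genre",
    "JRPG": "subgenre",
    "Sci-fi": "theme", "Retro": "theme",
    "2D": "visuals", "Pixel Graphics": "visuals",
}

# Hypothetical weights: subgenre outranks genre, visuals outrank misc.
CATEGORY_WEIGHTS = {
    "subgenre": 3.0, "genre": 2.0, "visuals": 2.0, "theme": 1.5, "misc": 1.0,
}


def tag_match_score(selected_tags, candidate_tags):
    """Return a 0-100 score for how well candidate matches selected."""
    candidate = set(candidate_tags)
    by_category = {}
    for tag in set(selected_tags):
        category = TAG_CATEGORIES.get(tag, "misc")
        by_category.setdefault(category, set()).add(tag)

    earned = 0.0
    perfect = 0.0
    for category, tags in by_category.items():
        weight = CATEGORY_WEIGHTS[category]
        earned += weight * len(tags & candidate)  # shared tags in this category
        perfect += weight * len(tags)             # if every tag had matched
    return 100.0 * earned / perfect if perfect else 0.0


selected = {"RPG", "JRPG", "Retro", "2D", "Pixel Graphics"}
candidate = {"RPG", "JRPG", "Sci-fi", "2D"}
print(round(tag_match_score(selected, candidate), 1))  # 66.7
```

In this example the candidate earns 2 (genre) + 3 (subgenre) + 2 (one of two visual tags) = 7 out of a possible 10.5, which is where the 66.7% comes from. The per-category numbers are the kind of breakdown a tooltip could display.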
This classification scheme is completely arbitrary and reflects my own subjective biases about what matters, but it seems to do the trick. I suspect that the mere act of breaking things down and applying some weights is more important than the exact set of categories and weights you choose -- just anything to get you away from comparing two naked lists of tags in a naive way.
This recommender can be hit and miss, dependent as it is on the notoriously mixed quality of the tags placed on any given game. But this recommender is still capable of producing some really solid matches:
I'm torn on whether to actually display the X% match scores on tag results (or just use them internally for ranking), but I think it needs to stay in some form because this matching mode returns a lot of results. It starts by taking a subset of games that have at least one matching tag in a major category and then ranks them all. This can potentially return hundreds or even thousands of results, and after several pages in you're going to get some really weird stuff that's not similar at all. I could just hard cull results below some score threshold, but I prefer to let the user keep exploring and just give them accurate information about how sloppy the current results are. One thing I think I'll change based on feedback is the exact number % I display. Although 68% is a pretty good match score, school has trained us to read this as a "failing" grade, so I might artificially inflate all the scores to compensate.
In summary -- tags don't care about bigness or popularity, they only care about similarity (as defined by tags). Not all games are well-tagged, and the matches can be noisy. But Diving Bell thrives on noisy results, so this is fine!
But there's still room for at least one more recommendation engine.
Hidden Gem Matches
This is my favorite recommendation engine. You might have seen Steam250.com's list of Hidden Gems, or read my article from five years ago proposing such a system. In either case, the idea's the same -- you find games that a) have a low # of total user reviews and b) have a very high user rating. Then, you rank them by a sensible algorithm, adding a penalty to anything with too many user reviews total. What you're left with is a list of extremely well regarded games that haven't gotten much attention -- ie, "hidden gems."
Diving Bell's "Hidden Gem" recommender is derived from the tag recommender, but instead of starting with a pool of games that is basically everything on Steam, I tell it to only consider the top slice of a "hidden gems list." Then I rank the results by their tag similarity to the selected game.
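A sketch of that idea, under some loud assumptions: the hidden gems list is taken as an already-ranked input (how it gets ranked is out of scope here), the slice size is arbitrary, and plain Jaccard overlap stands in for the real tag similarity score. All names and data below are hypothetical.

```python
def jaccard(a, b):
    """Stand-in similarity: shared tags divided by total distinct tags."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hidden_gem_matches(selected_tags, gems_ranked, tags_by_game,
                       top_slice=1000, similarity=jaccard):
    """Rank the top slice of a hidden gems list by tag similarity.

    gems_ranked: game ids ordered best gem first (assumed precomputed).
    Ties keep their hidden-gem order because Python's sort is stable.
    """
    pool = gems_ranked[:top_slice]
    scored = [(similarity(selected_tags, tags_by_game.get(game, ())), game)
              for game in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [game for _score, game in scored]


# Invented example data.
gems_ranked = ["gem_a", "gem_b", "gem_c", "gem_d"]
tags_by_game = {
    "gem_a": {"Puzzle"},
    "gem_b": {"RPG", "JRPG"},
    "gem_c": {"RPG", "2D"},
    "gem_d": {"RPG", "JRPG", "2D"},
}
selected_tags = {"RPG", "JRPG", "2D"}

print(hidden_gem_matches(selected_tags, gems_ranked, tags_by_game, top_slice=3))
# ['gem_b', 'gem_c', 'gem_a']
```

Notice that gem_d is a perfect tag match but never shows up, because it falls outside the top slice. That's the trade-off of restricting the pool: the gem ranking decides who is eligible, and tags only decide the order.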
The results are the least on-the-nose matches of the four recommenders, but often the most surprising and delightful. They're at least vaguely similar to the selected game, usually in the same or adjacent genre, and guaranteed to be well regarded titles most people haven't heard of yet. If you like playing cool obscure stuff that hasn't gone big yet, this is the tool for you.
Because this is a derivative of the tag recommender, it shows the same tooltip and score %, which I think is probably the wrong decision. I think it's okay to show the breakdown, but hidden gems by their very nature are going to get lower tag % scores than pure tag matches. I'll probably either remove the % score heading for gems entirely (but keep the tooltip breakdown), or else give hidden gems a bump in their score based on their hidden gem ranking, so they can compete on the same level as tag-based matches. I dunno, we'll see.
Putting it All Together
Okay, turning all four recommenders back on, this is what we see:
The reverse matches and loose matches give us a mix of niche and well known RPGs, and all Japanese to boot -- just like Chrono Trigger. The tag and gem matches give us a mix of Japanese, American, and European indie titles. Clicking more will let us dive deeper into results from our current position, and clicking any specific game will let us branch out in a new direction. Whether we explore broadly by clicking a new game and going off on a bunny trail, or deeply by clicking "more" to page through the current matches, we're sure to find something interesting.
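One way to picture how the pieces could fit together: take two fresh results from each enabled recommender in turn, skip anything already shown, fall back to Default matches when the others run dry, and chop the stream into pages of 8 for the "More" button. This is only a sketch under those assumptions; the function names and sample data are hypothetical, and the real prototype may combine results differently.

```python
from itertools import islice


def interleave(results_by_engine, enabled, default_matches, per_turn=2):
    """Yield games round-robin: `per_turn` unseen results per enabled
    engine per round, then any unseen default matches as filler."""
    seen = set()
    iterators = [iter(results_by_engine.get(name, [])) for name in enabled]
    while iterators:
        still_active = []
        for it in iterators:
            taken = 0
            exhausted = False
            while taken < per_turn:
                game = next(it, None)
                if game is None:
                    exhausted = True
                    break
                if game not in seen:
                    seen.add(game)
                    taken += 1
                    yield game
            if not exhausted:
                still_active.append(it)
        iterators = still_active
    for game in default_matches:
        if game not in seen:
            seen.add(game)
            yield game


def pages(stream, page_size=8):
    """Split a stream of games into pages, one per click of "More"."""
    stream = iter(stream)
    while True:
        page = list(islice(stream, page_size))
        if not page:
            return
        yield page


# Invented sample results for each recommender.
results = {
    "reverse": ["r1", "r2", "r3"],
    "loose": ["l1", "r1", "l2"],  # "r1" duplicates a reverse match
    "tags": ["t1", "t2"],
    "gems": ["g1", "g2"],
}
defaults = ["d1", "r2", "d2"]

all_on = list(pages(interleave(results, ["reverse", "loose", "tags", "gems"], defaults)))
print(all_on[0])  # ['r1', 'r2', 'l1', 'l2', 't1', 't2', 'g1', 'g2']
print(all_on[1])  # ['r3', 'd1', 'd2']
```

With every recommender switched off, the same function simply returns the Default matches.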
Okay, our crappy prototype is done! Now, let's consider whether it can be gamed, and evaluate its strengths and weaknesses in comparison to the Interactive Recommender.
Can it be gamed?
Possibly. The developer has no direct control over their "more like this" matches, but they do have control over the initial set of tags they put on their game at launch. Those tags directly affect the "Tags" recommender results and indirectly shape the "more like this" matches that drive Default, Loose, and Reverse matches. It's possible to pick out some specific super popular game and then give your game the exact same set of tags, so that it shows up as a 100% match. The risk is that if the chosen tags aren't accurate, players who feel misled could refund the game and leave negative reviews. Also, once a game has been out for a while, players will apply their own tags that eventually outweigh the developers'.
If this becomes a problem where everyone pretends to have tags exactly matching Dark Souls, destroying the usefulness of the "Tags" recommender, I'll probably have to add some other heuristic to how I rank tag matches, either throwing in some randomness, or applying a small penalty to games that are on-the-nose matches but have only developer-set tags, similar to how SteamDB applies uncertainty to user rating rankings. Or I could factor in user ratings a bit. But that's another can of worms.
Another way bad actors can try to game the system is by forging user reviews to get on the Hidden Gems list (or any other recommender that cares about user ratings). Steam has put some effort into combatting forged user reviews, but it's a neverending game of cat-and-mouse. Chief among their efforts is the fact that they don't use user reviews as a significant internal signal for surfacing games. In short, even if your game's user rating is super high, it doesn't vault you to the front page the way it might on Amazon or Yelp, where review fraud is rampant. Instead -- and I have this directly from the mouth of Alden Kroll at Valve -- the only value a user rating has in algorithmic discovery is whether a game's rating is positive or not. All positive games get the same lift, all non-positive games don't. That's it. (Incidentally, this is how the current version of Diving Bell uses user ratings for all recommenders except for Hidden Gems: I exclude poorly rated games below a certain threshold from consideration so that it doesn't take forever to generate results).
Now, even without an explicit algorithmic boost, there is a concrete benefit to higher ratings, because humans who see "Overwhelmingly positive" vs. merely "Positive" are more likely to click the former. This likely has knock-on effects on other metrics that The Algorithm(TM) does care about. But t