Mega Crit Games' roguelike deck-builder Slay the Spire has already amassed over 500,000 players since entering Early Access in November, according to SteamSpy, and has received almost unanimous praise from critics and players – not bad for a game with just two full-time developers.
It’s even more impressive when you consider its unusual mash-up of genres. Balancing a card game alone is a huge undertaking, and tiny tweaks can bend entire playstyles out of shape. Stir in the random events of a roguelike and sprinkle on a few items that can flip a run on its head and you’ve got a potential recipe for chaos.
So just how have Seattle-based Mega Crit managed to keep the game feeling so tight? And how do they approach the mammoth task of balancing a game this intricate?
The key, developers Anthony Giovannetti and Casey Yano tell Gamasutra, is player feedback. Specifically, it’s about collating data on every single run and turning that into informed, specific changes to particular cards and enemies.
Even at an early prototype stage, when the game was being tested by Netrunner players, the team created a metric server to track every decision a player made, Giovannetti says.
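Mega Crit hasn’t published the schema of that metrics server, but the core idea – log every player decision as a structured event and ship it off in batches – can be sketched. Everything below is illustrative: the field names, card names, and payload format are assumptions, not Mega Crit’s actual telemetry.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional

@dataclass
class ChoiceEvent:
    """One card-reward decision. Field names are guesses, not Mega Crit's schema."""
    run_id: str
    floor: int
    cards_offered: List[str]           # card IDs shown to the player
    card_picked: Optional[str] = None  # None if the player skipped the reward

def batch_payload(events: List[ChoiceEvent]) -> str:
    """Serialize a batch of events into JSON that a metrics server could ingest."""
    return json.dumps([asdict(e) for e in events])

events = [ChoiceEvent("run-001", 3, ["Dual Wield", "Clothesline", "Flex"], "Dual Wield")]
payload = batch_payload(events)
```

Logging the full set of cards offered, not just the one taken, is what later makes a stat like “pick rate when offered” computable at all.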
"There's no way we can intuitively do it all correctly"
“We have so many cards and so many interactions that even though we have a pretty strong card game background there’s no way we can intuitively do it all correctly," he adds. "I said at one point, ‘look, we’re not going to reasonably be able to balance this many cards, we don’t have a team of people to do this’…[so] we took a data-driven approach. I’m just a big fan of data-driven decision making.”
Early on, the pair would constantly add cards, first in batches to create a deck archetype, and then individual cards to “sculpt” those play styles. Playtesters enjoyed having new cards added all the time, and some ended up pouring thousands of hours into the game, all of which fed into the team’s metric server.
The data told Yano and Giovannetti how often players selected a particular card when it was offered to them during their dungeon crawl, what they chose it over, how often that card appeared in winning decks, and how much damage players using that card – on average – took from a particular enemy.
Acting on that data, Giovannetti says, was not a “mathematical” approach. The team looked for patterns and tried to intuitively decide how to make a card more fun, or less powerful, or more attractive to players. That involved a lot of trial and error.
“We knew there were going to be knock-on effects – if I change one card and it’s part of a particular strategy, that’s going to have other ramifications on the cards in that strategy," he says.
“So we’d make the change and see what people thought of it, see what we thought, and we’d keep tweaking the knobs until we settled on a place that was good. We weren’t afraid of throwing things away and starting over with whole archetypes. My outlook was to be more aggressive on making changes during testing – you’d rather make the changes early and see how it works out.”
As the playerbase has boomed over the last few months, so has the amount of data. “In one hour we get more data than we had throughout the whole prototyping [phase]. Our sample sizes are so large now that they’re really accurate,” Giovannetti explains.
Yano tells me that the key to using that data is to ensure it’s all filterable and categorized. It’s also important, he says, to have a specific question in mind when looking at the data that you couldn’t answer just by playing the game.
"The first time we made our metrics, we had three graphs; now we have at least 90."
“There’s lots of filters [and] check boxes, it’s important to have it filterable for a specific thing," says Yano. "The first time we made our metrics, we had three graphs; now we have at least 90.”
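The “checkboxes” Yano describes map naturally onto per-field filters over run records. A minimal sketch of that idea, where each keyword argument plays the role of one toggled filter (the run-record fields here are assumed, since the real schema is unpublished):

```python
def filter_runs(runs, **criteria):
    """Keep only the runs matching every active filter -- each keyword
    argument acts like one checked box in the metrics dashboard."""
    return [r for r in runs if all(r.get(k) == v for k, v in criteria.items())]

# Illustrative run records with hypothetical field names.
runs = [
    {"character": "Ironclad", "floor_reached": 50, "victory": True},
    {"character": "Silent",   "floor_reached": 33, "victory": False},
    {"character": "Ironclad", "floor_reached": 12, "victory": False},
]

ironclad_wins = filter_runs(runs, character="Ironclad", victory=True)
```

Answering a specific question then becomes one call – “winning Ironclad runs only” – rather than re-reading raw logs.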
And those graphs don’t just sit there, idle; they directly influence the decisions the developers make. Both men naturally have a deep understanding of how the game works, but sometimes they make mistakes, and the data shows them where they’re going wrong.
How often is a card picked, and how often is it a winner?
The two most important metrics, Giovannetti says, are how often a player picks a card when given the choice (too low and it’s “basically not a card in our game at that point”), and how often a card appears in a winning deck (too high and you know that card is overpowered).
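Neither developer has shared their actual queries, but both headline stats fall straight out of the raw choice and run logs. A hedged sketch, where the input shapes (offer/choice pairs and deck/outcome pairs) are assumptions:

```python
from collections import Counter

def card_metrics(choice_events, finished_runs):
    """Compute, per card: pick rate (times picked / times offered) and
    win-deck rate (share of winning decks containing the card).
    choice_events: (cards_offered, card_picked_or_None) pairs.
    finished_runs: (deck, won) pairs."""
    offered, picked = Counter(), Counter()
    for offer, choice in choice_events:
        offered.update(offer)
        if choice is not None:
            picked[choice] += 1
    wins = [deck for deck, won in finished_runs if won]
    in_wins = Counter(card for deck in wins for card in set(deck))
    return {
        card: {
            "pick_rate": picked[card] / offered[card],
            "win_deck_rate": in_wins[card] / len(wins) if wins else 0.0,
        }
        for card in offered
    }

# Tiny worked example with made-up data.
choices = [(["Dual Wield", "Flex"], "Dual Wield"), (["Dual Wield", "Bash"], None)]
finished = [(["Dual Wield", "Strike"], True), (["Bash"], False)]
stats = card_metrics(choices, finished)
```

A card with a pick rate near zero is the “basically not a card in our game” case; a near-100% win-deck rate is the overpowered one.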
The changes the team made to Dual Wield, which lets players duplicate cards, are a good example. It was strong in prototyping, but a lot of its best interactions didn’t make it to Early Access. The pair could see that players didn’t pick it up very often, so they switched its duplication power from the top card of your deck to any card in your hand. From the resultant data, it was very clear that the buff was too strong, and players were able to ride Dual Wield to victory.
“It was totally broken,” Giovannetti says. “You could copy skills and go infinite [being able to kill an enemy in one turn by duplicating particular cards] really easily. Going infinite is the number one thing we try to make really rare. It makes the actual playing of the game trivial. We then tweaked it so it can only copy Skill cards. So it was a change we thought was benign, found out quickly how degenerate it was and then gave it a slight nerf.”
The team takes the same approach to the game’s enemies. For example, the data showed that players with lots of Power cards in their decks were having trouble with one of the game’s final bosses, The Awakened One, who gains strength whenever the player uses a Power card. To stop that, the team reduced the rate at which the boss gained strength, while compensating with an overall damage boost.
By later looking at the numbers the pair could see that more players were beating the boss overall, Yano says. “This is the first boss players fight when they get to the end, so we didn’t want it to be weak.” By making tweaks to the damage, re-testing, and tweaking again, the developers have it at a level they’re happy with – and it’s all thanks to the data.
“Without that, we’d just get people with anecdotal evidence saying this boss is much stronger or weaker now, and it’s completely based on playstyle or other circumstances,” Yano adds.
That’s not to say anecdotal, subjective feedback is useless by any means. The team have a public Discord server that allows players to provide their thoughts using tags like ‘bugs’ or ‘feedback’. A bot collects that information and relays it to the team.
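The article doesn’t describe how that bot recognizes tags, but the routing step might look something like the following – the `#tag` syntax and the tag list are both assumptions for the sake of illustration:

```python
import re

KNOWN_TAGS = {"bugs", "feedback"}  # illustrative set; the real tags aren't documented

def extract_tagged(message: str):
    """Pull recognized tags out of a Discord message so a bot could
    route it to the right log for the team. '#tag' syntax is assumed."""
    found = {t.lower() for t in re.findall(r"#(\w+)", message)}
    return sorted(found & KNOWN_TAGS)
```

Unrecognized hashtags are simply dropped, so players can write freely without polluting the team’s categorized feed.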
Data alone is not enough
“The numbers are really useful but they’re not telling us how things feel, so we think it’s still really important," says Giovannetti. "Incorporating the non-data is harder, but it is useful. We don’t have a good hard and fast rule [about when to act on it], but a single well-reasoned post is a lot more useful than lots of players saying ‘nerf this’.”
"Any other game I make going forward I’d do something similar, and I’d recommend other indies to use it whenever they can."
Overall, Giovannetti says the fact that the game is a single-player roguelike makes balancing its many cards a lot easier than for most card games. Players are not competing against each other, so decks do not need to be equal, and the team can make changes for entertainment value rather than in the name of pure balance.
Plus, roguelike events add an element of chance that makes the game more replayable and forces players to switch up styles. And the fact that it’s still in Early Access also helps. “If we make a mistake, we’re releasing weekly patches," he adds. "Because things are in flux players can expect that the balance will change."
But the team is only able to react that quickly because of its data-driven approach. “I think that’s been really validated,” Giovannetti concludes. “Any other game I make going forward I’d do something similar, and I’d recommend other indies to use it whenever they can.”