This article is the fourth in a 5-part series.
- Part 1: The Best and the Rest is available here: (Gamasutra) (BlogSpot) (in Chinese)
- Part 2: Building Effective Teams is available here: (Gamasutra) (BlogSpot) (in Chinese)
- Part 3: Game Development Factors is available here: (Gamasutra) (BlogSpot) (in Chinese)
- This article is Part 4, and is available here: (Gamasutra) (BlogSpot) (in Chinese).
- Part 5: What Great Teams Do is available here: (Gamasutra) (in Chinese)
- For extended notes on our survey methodology, see our Methodology blog page.
- Our raw survey data (minus confidential info) is now available here if you'd like to verify our results or perform your own analysis.
The Game Outcomes Project team includes Paul Tozour, David Wegbreit, Lucien Parsons, Zhenghua “Z” Yang, NDark Teng, Eric Byron, Julianna Pillemer, Ben Weber, and Karen Buro.
[Editor's Note: The results of the Game Outcomes Project will be addressed at length during GDC 2016 as part of Paul Tozour's talk on "The Game Outcomes Project: How Teamwork, Leadership, and Culture Drive Results."]
The Game Outcomes Project, Part 4: Crunch Makes Games Worse
Extended overtime (“crunch”) is a deeply controversial topic in our industry. Countless studios have undertaken crunch, sometimes extending to mandatory 80-100 hour work weeks for years at a time. If you ask anyone in the industry about crunch, you’re likely to hear opinions stated very strongly and matter-of-factly based on that person’s individual experience.
And yet such opinions are almost invariably put forth with zero reference to any actual data.
If we truly want to analyze the impact of extended overtime in any scientific and objective way, we should start by recognizing that any individual game project must be considered meaningless by itself – it is a single data point, or anecdotal evidence. We can learn absolutely nothing from whether a single successful or unsuccessful game involved crunch or not, because we cannot know how the project might have turned out if the opposite path had been chosen – that is, if a project that crunched had not done so, or if a project that did not employ crunch had decided to use it.
As the saying goes, you can’t prove (or disprove) a counterfactual – you’d need a time machine to actually know how things would have turned out if you’d chosen differently.
Furthermore, there have undeniably been many successful and unsuccessful games created both with and without crunch. So we can’t give crunch the exclusive credit or blame for a particular outcome on a single project when much of the credit or blame is clearly owed to other aspects of the game’s development. To truly measure the effect of crunch, we would need to look at a large sample, ideally involving hundreds of game projects.
Thankfully, the Game Outcomes Project survey has given us exactly that. In previous articles, we discussed the origin of the Game Outcomes Project and our preliminary findings, and our findings related to team effectiveness and many additional factors we looked at specific to game development. We also wrote up a separate blog post describing the technical details of our methodology.
In this article, we present our findings on extended overtime based directly on our survey data.
Attitudes Toward Crunch
Developers have surprisingly divergent attitudes toward the practice of crunch. An interview on gamesindustry.biz quoted well-known industry figures Warren Spector and Jason Rubin:
“Crunch sucks, but if it is seen by the team members as a fair cost of participating in an otherwise fantastic employment experience, if they value ownership of the resulting creative success more than the hardship, if the team feels like long hours of collaboration with close friends is ultimately rewarding, and if they feel fairly compensated, then who are we to tell them otherwise?" asked Rubin.
[…] "Look, I'm sure there have been games made without crunch. I've never worked on one or led one, but I'm sure examples exist. That tells me something about myself and a lot about the business I'm in," said Spector.
[…] "What I'm saying is that games - I'm talking about non-sequels, non-imitative games - are inherently unknowable, unpredictable, unmanageable things. A game development process with no crunch? I'm not sure that's possible unless you're working on a rip-off of another game or a low-ambition sequel.
“[…] Crunch is the result of working with a host of unknown factors in creative mediums. Since game development is always full of unknowns, crunch will always exist in studios that strive for quality […] After 30 years of making games I'm still waiting to find the wizard who can avoid crunch entirely without compromising at a level I'm unwilling to accept.”
On the other side of the fence is Derek Paxton of Stardock, who said in an interview with Gameranx:
“Crunch makes zero sense because it makes games worse. Companies crunch to push through on a specific game, but the long-term effect is that talented developers, artists, producers and designers burn out and leave the industry.
“Companies and individuals should stop wearing their time spent crunching as a badge of honor. Crunch is a symptom of broken management and process. Crunch is the sacrifice of your employees. I would ask them why crunch isn’t an issue with other industries. Why isn’t crunch an issue at all game studios?
“Employees should see it as a failure. Gamers should be concerned about it, because in the long term the hobby they love is losing talent because of it. Companies should do everything in their power to improve their processes to avoid these consequences.”
So who is right – Spector and Rubin, or Paxton?
[Full disclosure: team member Paul Tozour leads Mothership Entertainment, which is partially owned by Stardock.]
In the Game Outcomes Project survey, we provided 3 text boxes at the end that respondents could use to tell us about their industry experiences. Where they mention crunch, they invariably mention it as a net negative. One respondent wrote:
“The biggest issue we had was that the lead said ‘Overtime is part of game development’ and never TRIED to improve. As sleep was lost, motivation dropped and the staff lost hope ... everything fell apart. Hundred-hour weeks for nine months, and I'm not exaggerating. Humans can't function under these conditions ... If you want to mention my answer feel free. I'm sure it'd be familiar to many devs.”
Another developer put it more bluntly:
“Schedule 40 hours a week and you get 38. Schedule 50 and you get 39 and everyone hates work, life, and you. Schedule 60 and you get 32 and wives start demanding you send out resumes. Schedule 80 and you’re [redacted] and get sued, jackass.”
In this article, we will be getting a final word on the subject from the one source that has yet to be interviewed: the data.
The “Extraordinary Effort” Argument
We’ll begin by formulating the “pro-crunch” side of the discourse into testable hypotheses. Although no one directly claims that crunch is good per se, and no one denies that it can have harmful effects, Spector and Rubin clearly make the case in the article above that crunch is often (if not usually, or even always) a necessary evil.
According to this line of thinking, ordinary development with ordinary schedules cannot produce extraordinary results. We believe an accurate characterization of this viewpoint from the gamesindustry.biz article quoted above would be: “Extraordinary results require extraordinary effort, and extraordinary effort demands long hours.”
This position (we’ll call it the “extraordinary effort argument”) leads directly to two falsifiable hypotheses:
1. If the “extraordinary effort argument” is correct, there should be a positive correlation between crunch and game outcomes, and higher levels of crunch should show a measurable improvement in the outcomes of game projects.
2. If the “extraordinary effort argument” is correct, there should be relatively few, if any, highly successful projects without crunch.
Luckily for us, we have data from hundreds of developers who took our survey with no preconceptions as to what the study was designed to test, and we can use that data to test both of these statements. We’ll agree to declare victory for the pro-crunch side if EITHER of these hypotheses remains standing after we put it in the ring with our data set.
Crunching the Numbers
We’ll approach our analysis in several phases, carefully determining what the data does and does not tell us.
Our 2014 survey asked the following five questions related to crunch, which were randomly scattered throughout the survey:
- “I worked a lot of overtime or ‘crunched’ on this project.”
- “I often worked overtime because I was required or felt pressured to.”
- “Our team sometimes seemed to be stuck in a cycle of never-ending crunch / overtime work.”
- “If we worked overtime, I believe it was because studio leaders or producers failed to scope the project properly (e.g. insufficient manpower, deadlines that were too tight, over-promised features).”
- “If I worked overtime, it was only when I volunteered to do so.”
Here’s how the answers to those questions correlate with our aggregate project outcome score (described on our Methodology page). On the horizontal axis, a score of -1.0 is “disagree completely” and a score of +1.0 is “agree completely."
Figure 1. Correlation of each crunch-related question with that project’s actual outcome (aggregate score). Each of the 5 questions is shown, as an animated GIF with a 4-second delay. Only the horizontal axis changes.
The correlations are as follows: -0.24, -0.30, -0.47, -0.36, +0.36 (in the same order listed in the bullet-pointed list above). All five of these correlations have p-values well below 0.001, indicating that they are statistically significant. Note that every correlation is negative except for the final question, which asked whether crunch was solely voluntary.
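For readers who want to run this kind of check against the raw data, here is a minimal sketch of computing a correlation between agreement with one crunch statement and the outcome score. It assumes a Pearson coefficient (the exact coefficient we used is described on the Methodology page, not here), and the sample values and the names `agreement` and `outcome` are hypothetical, not taken from the survey.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: agreement with a crunch statement on the -1.0 to +1.0
# scale, paired with the aggregate outcome score for the same project.
agreement = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.0, -1.0, 0.5]
outcome = [85, 70, 65, 60, 35, 50, 75, 45]

r = pearson_r(agreement, outcome)
print(f"r = {r:.2f}")
```

A negative `r` means that stronger agreement with the statement goes along with lower outcome scores, which is the pattern described above.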
“But wait,” a proponent of crunch might say. “Surely that’s only because you’re using a combined score. That score combines the values of questions like ‘this project met its internal goals,’ which are going to give you lower values, because they're subjective fluff. Of course people who are unhappy about crunch are going to give that factor low scores – and that’s going to lower the combined score a lot. It’s a fudge factor, and it’s skewing your results. Throw it out! You should throw away the critical success, delays, and internal goals outcomes and JUST look at return on investment and I bet you’ll see a totally different picture.”
OK, let’s do that:
Figure 2. Correlation of each of the 5 crunch-related questions with that project’s return on investment (ROI). As with Figure 1, each of the 5 questions is shown, as an animated GIF with a 4-second delay. Only the horizontal axis changes. Note that many of the points shown represent multiple coincident points. See our Methodology page for an explanation of the vertical axis scale.
Notice how the lines have essentially the same slopes as in the previous figure. The correlations with ROI are as follows (in the same order): -0.18, -0.26, -0.34, -0.23, and +0.28. All of these correlations have p-values below 0.012.
Still not convinced? Here are the same graphs again, correlated against aggregate reviews / MetaCritic scores.
Figure 3. Correlation of each of the 5 crunch-related questions with the project’s aggregate reviews / MetaCritic score (note that the vertical axis does not represent actual MetaCritic scores but is a normalized representation of the answers to this question; see our Methodology page for more info). As with Figures 1 and 2, each of the 5 questions is shown, as an animated GIF with a 4-second delay. Note that many of the points shown represent multiple coincident points. Only the horizontal axis changes.
The results are essentially identical, and all have p-values under 0.05.
So if our combined score has a negative correlation with ALL our crunch questions except the one about crunch being purely voluntary (which itself does not imply any particular level of crunch), that means that we’ve disproven the first part of the “extraordinary effort argument” – the correlation is clearly negative, not positive.
Now let’s look at the second testable hypothesis of the “extraordinary effort argument.”
In Figure 4 (below), we’re looking at the two most relevant questions related to overall crunch for a project. The vertical axis is the aggregate outcome score, while the horizontal axis represents the scale from “disagree completely” (-1) to “agree completely” (+1). The black lines are trend lines. As you can see, in both cases, higher agreement with each statement corresponds to inferior project outcomes.
Figure 4. The two most relevant questions related to crunch compared to the aggregate project outcome score.
We’ve added horizontal blue and orange lines to both images. The blue line represents a score of 80, which will be our subjective threshold for “very successful” projects. The orange line represents a score of 40, which will be our threshold for “very unsuccessful” projects.
The dots above the blue line tell a clear story: in each case, there were more successful games made without crunch than with crunch.
However, these charts don’t tell the full story by themselves; many of the data points are clustered at the exact same spot, meaning that each dot can actually represent several data points. So a statistical deep-dive is necessary. We’re particularly interested in the four corners of the chart – the data points above the blue line on the extreme left and right sides of each chart (below -0.6 and above +0.6 on the horizontal axis) and below the orange line on the left and right sides.
Looking solely at the chart on the top of Figure 4 (“I worked a lot of overtime or ‘crunched’ on this project”), we observed the following pattern. Note that the percentages are given in terms of the total data points in each vertical grouping (under -0.6 or above 0.6 on the horizontal axis).
Within each group, the pattern is clear: a higher percentage of no-crunch projects succeed than fail (17% vs 10%), and a much larger percentage of high-crunch projects fail than succeed (32% vs 13%). Comparing across groups, no-crunch projects are more likely to be very successful than high-crunch projects (17% vs 13%), while high-crunch projects are far more likely to be very unsuccessful than no-crunch projects (32% vs 10%).
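The corner analysis described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than our actual analysis script: the function name `corner_percentages` and the sample responses are hypothetical, and it assumes "above" and "below" the threshold lines mean strict inequalities.

```python
def corner_percentages(responses, low=-0.6, high=0.6, success=80, failure=40):
    """Summarize the extreme-left and extreme-right groups of a chart.

    responses: list of (agreement, outcome_score) pairs.
    Percentages are relative to the size of each group, not the whole sample.
    """
    groups = {
        "no_crunch": [score for agree, score in responses if agree < low],
        "high_crunch": [score for agree, score in responses if agree > high],
    }
    result = {}
    for name, scores in groups.items():
        total = len(scores)
        if total == 0:
            result[name] = None  # no respondents at this extreme
            continue
        result[name] = {
            "very_successful_pct": 100.0 * sum(s > success for s in scores) / total,
            "very_unsuccessful_pct": 100.0 * sum(s < failure for s in scores) / total,
        }
    return result

# Hypothetical responses, not survey data.
responses = [
    (-1.0, 90), (-1.0, 60), (-0.8, 85), (-0.8, 30),
    (0.0, 70),
    (0.8, 35), (1.0, 20), (1.0, 82), (1.0, 55),
]
summary = corner_percentages(responses)
print(summary)
```

Because the denominators are the group sizes, the two percentages within a group need not add up to 100: projects scoring between the two thresholds fall into neither corner.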
Here’s the same chart, but this time looking at the bottom question, “Our team sometimes seemed to be stuck in a cycle of never-ending crunch / overtime work.”
These results are even more remarkable. The respondents that answered “disagree strongly” or “disagree completely” were 2.5 times more likely to be working on very successful projects (23% vs 9%), while the respondents who answered “agree strongly” or “agree completely” were, incredibly, more than 10 times more likely to be on unsuccessful projects than successful ones (41% vs 4%).
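The "times more likely" figures above are simple ratios of the quoted percentages. A quick check of the arithmetic (the helper name `times_more_likely` is ours, introduced only for illustration):

```python
def times_more_likely(pct_a, pct_b):
    """Ratio of two percentages, e.g. 23% vs 9%."""
    return pct_a / pct_b

disagree_ratio = times_more_likely(23, 9)  # the "2.5 times" figure
agree_ratio = times_more_likely(41, 4)     # the "more than 10 times" figure
print(f"{disagree_ratio:.2f}, {agree_ratio:.2f}")
```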
Some might object to this way of measuring the responses, as it is an aggregate outcome score which takes internal achievement of the project goals into account – and this is a somewhat subjective measure. What if we looked at return on investment (ROI) alone? Surely that would paint a different picture.
Here is ROI:
Figure 5. The two most relevant questions related to crunch compared to return on investment (ROI).
The first question (top chart) gives us the following results:
The second question (bottom chart) gives us:
These results are essentially equivalent to what we got with Figure 4 -- the probabilities have shifted a little bit but the conclusions haven't changed at all. The same results hold if we look at MetaCritic scores or any of the other outcome factors we investigated.
For further verification, we did a deep-dive statistical analysis of the data in figures 4 and 5, treating the left and right sides of each graph on each figure (all data points < -0.6 and all those > +0.6) as two separate populations and performing a Wilcoxon rank sum test to compare them.
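As a rough sketch of what such a comparison involves, here is a small self-contained rank sum test. It is a simplified stand-in for the statistical package we used: it relies on the normal approximation, gives tied values their average rank, and applies no tie or continuity correction. The function name `rank_sum_test` and the two samples are hypothetical.

```python
import math

def rank_sum_test(sample_a, sample_b):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (z, p). Tied values share their average rank; no tie or
    continuity correction is applied.
    """
    pooled = sorted(
        [(v, 0) for v in sample_a] + [(v, 1) for v in sample_b],
        key=lambda pair: pair[0],
    )
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with pooled[i].
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_a, n_b = len(sample_a), len(sample_b)
    # Rank sum of the first sample, compared with its expected value.
    w = sum(rank for rank, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n_a * (n_a + n_b + 1) / 2
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p

# Hypothetical outcome scores for the two populations (left and right sides).
left_side = [90, 85, 80, 75, 70, 88, 92]   # agreement below -0.6
right_side = [40, 35, 50, 45, 30, 55, 60]  # agreement above +0.6
z, p = rank_sum_test(left_side, right_side)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A rank-based test fits this data because it does not assume the outcome scores are normally distributed, only that they can be ordered.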
All of these comparisons are highly statistically significant, with the top two rows having p-values under 0.006 and the bottom two rows having p-values that round to 0.
It should be clear that our data set contradicts both of the testable hypotheses that we derived from the “extraordinary effort argument.” But before declaring victory for Paxton and the anti-crunch side, let’s take a look at the counter-argument.
The “Crunch Salvage Hypothesis”
The counter-argument goes something like this:
“Your correlation is bogus, because crunch is more likely to happen on projects that are in trouble in the first place. So there’s already an underlying correlation between crunch and struggling projects, and this is skewing your results. You seem to be saying that crunch causes poorer outcomes, but the causality actually works differently – there’s a third, hidden causal factor (“project being in trouble”) that causes both crunch and lower outcomes. And although crunch helps improve the situation, it’s never quite enough to compensate for the problems in the first place, which is why you get the negative correlation.”
This position warrants further investigation. As the Spector/Rubin interview linked above makes clear, there are some developers who are willing to demand crunch even in cases where their projects are not in trouble (“crunch will always exist in studios that strive for quality,” according to Spector), so it’s clear that at least in some cases, crunch is used on projects that are not yet having problems. But the notion that crunch is more likely on struggling projects is entirely plausible.
Let’s test this counter-argument. Let’s assume the causation is not A -> B but C -> (A and B), where “A”=crunch, “B”=poorer project outcomes, and “C” represents some vaguely-defined set of factors representing troubled projects.
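To make the logic of this counter-argument concrete, here is a toy simulation of the C -> (A and B) structure. It is purely illustrative and uses no survey data: the coefficients, the sample size, and the names `trouble`, `crunch_level`, and `outcome_score` are all assumptions we made up for the sketch. In this toy world crunch genuinely helps, yet the observed correlation between crunch and outcomes still comes out negative because both are driven by the hidden trouble factor.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def simulate_confounding(n=5000, seed=0):
    """Correlation of crunch with outcome when hidden trouble drives both."""
    rng = random.Random(seed)
    crunch, outcome = [], []
    for _ in range(n):
        trouble = rng.gauss(0, 1)                # C: hidden "project in trouble"
        crunch_level = trouble + rng.gauss(0, 1) # A: troubled projects crunch more
        # Outcome (higher is better): trouble hurts a lot, crunch helps a little.
        outcome_score = -2.0 * trouble + 0.5 * crunch_level + rng.gauss(0, 1)
        crunch.append(crunch_level)
        outcome.append(outcome_score)
    return pearson_r(crunch, outcome)

r = simulate_confounding()
print(f"correlation between crunch and outcome: {r:.2f}")
```

This shows only that the counter-argument is logically coherent: a negative correlation alone cannot rule it out, which is why it needs to be tested against the data separately.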
We’ll call this the “crunch salvage hypothesis” – the idea that crunch is more likely to be used on projects in trouble, and that this “trouble&rd