The Game Outcomes Project, Part 1: The Best and the Rest

The first in a 5-part series analyzing the results of the Game Outcomes Project survey, which polled hundreds of game developers to determine how teamwork, culture, leadership, and project management contribute to game project success or failure.

This article is the first in a 5-part series.

The Game Outcomes Project team includes Paul Tozour, David Wegbreit, Lucien Parsons, Zhenghua “Z” Yang, NDark Teng, Eric Byron, Julianna Pillemer, Ben Weber, and Karen Buro.

[Editor's Note: The results of the Game Outcomes Project will be addressed at length during GDC 2016 as part of Paul Tozour's talk on "The Game Outcomes Project: How Teamwork, Leadership, and Culture Drive Results."]
 
What makes the best teams so effective?

Veteran developers who have worked on many different teams often remark that they see vast cultural differences between them.  Some teams seem to run like clockwork, and are able to craft world-class games while apparently staying happy and well-rested.  Other teams struggle mightily and work themselves to the bone in nightmarish overtime and crunch of 80-90 hour weeks for years at a time, or in the worst case, burn themselves out in a chaotic mess.  Some teams are friendly, collaborative, focused, and supportive; others are unfocused and antagonistic.  A few even seem to be hostile working environments or political minefields with enough sniping and backstabbing to put a game of Team Fortress 2 to shame.

What causes the differences between those teams?  What factors separate the best from the rest?

As an industry, are we even trying to figure that out?

Are we even asking the right questions?

These are the kinds of questions that led to the development of the Game Outcomes Project.  In October and November of 2014, our team conducted a large-scale survey of hundreds of game developers.  The survey included roughly 120 questions on teamwork, culture, production, and project management.  We suspected that we could learn more from a side-by-side comparison of many game projects than from any single project by itself, and we were convinced that finding out what great teams do that lesser teams don’t do – and vice versa – could help everyone raise their game.

Our survey was inspired by several of the classic works on team effectiveness.  We began with the 5-factor team effectiveness model described in the book Leading Teams: Setting the Stage for Great Performances.  We also incorporated the 5-factor team effectiveness model from the famous management book The Five Dysfunctions of a Team: A Leadership Fable and the 12-factor model from 12: The Elements of Great Managing, which is derived from aggregate Gallup data from 10 million employee and manager interviews.  We felt confident that at least one of these three models would turn out to be relevant to game development in some way.

We also added several categories with questions specific to the game industry that we felt were likely to show interesting differences.

On the second page of the survey, we added a number of more generic background questions.  These asked about team size, project duration, job role, game genre, target platform, financial incentives offered to the team, and the team’s production methodology.

We then faced the broader problem of how to quantitatively measure a game project’s outcome.

Ask any five game developers what constitutes “success,” and you’ll likely get five different answers.  Some developers care only about the bottom line; others care far more about their game’s critical reception.  Small indie developers may regard “success” as simply shipping their first game as designed regardless of revenues or critical reception, while developers working under government contract, free from any market pressures, might define “success” simply as getting it done on time (and we did receive a few such responses in our survey).

Lacking any objective way to define “success,” we decided to measure each project’s outcome through four different lenses.  We asked the following four outcome questions, each with a 6-point or 7-point scale:

  • "To the best of your knowledge, what was the game's financial return on investment (ROI)? In other words, what kind of profit or loss did the company developing the game take as a result of publication?"
  • "For the game's primary target platform, was the project ever delayed from its original release date, or was it cancelled?"
  • "What level of critical success did the game achieve?"
  • "Finally, did the game meet its internal goals? In other words, to what extent did the team feel it achieved something at least as good as it was trying to create?"

We hoped that we could correlate the answers to these four outcome questions against all the other questions in the survey to see which input factors had the most actual influence over these four outcomes.  We were somewhat concerned that all of the “noise” in project outcomes (fickle consumer tastes, the moods of game reviewers, the often unpredictable challenges inherent in creating high-quality games, and various acts of God) would make it difficult to find meaningful correlations.  But with enough responses, perhaps the correlations would shine through the inevitable noise.

We then created an aggregate “outcome” value that combined the results of all four of the outcome questions as a broader representation of a game project’s level of success.  This turned out to work nicely, as it correlated very strongly with the results of each of the individual outcome questions.  Our Methodology blog page has a detailed description of how we calculated this aggregate score.
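As a rough illustration of what combining several differently scaled answers into one score can look like, here is a minimal Python sketch. It is not the project's formula (that is documented on the Methodology page and is not reproduced here). The equal weighting, the 0-100 output range, the assumption that a higher answer is always the better outcome, and the names `normalize` and `composite_outcome` are all assumptions made for this example.

```python
def normalize(answer, scale_points):
    """Map an answer on a 1..scale_points scale onto the range 0..1."""
    if not 1 <= answer <= scale_points:
        raise ValueError("answer must lie between 1 and scale_points")
    return (answer - 1) / (scale_points - 1)


def composite_outcome(answers):
    """Average the normalized answers and rescale to 0..100.

    `answers` is a list of (answer, scale_points) pairs, one pair per
    outcome question. Equal weighting is an assumption of this sketch.
    """
    normalized = [normalize(answer, points) for answer, points in answers]
    return 100 * sum(normalized) / len(normalized)


# Hypothetical project: ROI, schedule, critical reception, internal goals.
example_project = [(5, 7), (4, 6), (6, 7), (5, 6)]
print(round(composite_outcome(example_project), 1))
```

Normalizing first keeps a 7-point question from outweighing a 6-point one simply because its raw numbers are larger.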

We worked carefully to refine the survey through many iterations, and we solicited responses through forum posts, Gamasutra posts, Twitter, and IGDA mailers.  We received 771 responses, of which 302 were completed, and 273 were related to completed projects that were not cancelled or abandoned in development.

The Results

So what did we find?

In short, a gold mine.  The results were staggering.

More than 85% of our 120 questions showed a statistically significant correlation with our aggregate outcome score, with a p-value under 0.05.  (The p-value is the probability of observing a correlation at least as strong as the one in our sample if the variables were truly independent; a small p-value can therefore be interpreted as evidence against the assumption of independence.)  This correlation was moderate or strong in most cases (absolute value > 0.2), and most of the p-values were in fact well below 0.001.  We were even able to develop a linear regression model that showed an astonishing 0.82 correlation with the combined outcome score (shown in Figure 1 below).

Figure 1.  Our linear regression model (horizontal axis) plotted against the composite game outcome score (vertical axis).  The black diagonal line is a best-fit trend line.  273 data points are shown.
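The p-value interpretation given above can be made concrete with a permutation test: shuffle one variable many times to simulate independence, then count how often the shuffled data correlates at least as strongly as the real data. The sketch below is a stand-in technique for illustration, not the procedure used in the study, and the answer and outcome values are invented.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value by shuffling ys to break any link to xs."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never 0.
    return (hits + 1) / (n_permutations + 1)


# Invented data: agreement with a survey statement (1-7) vs. outcome score.
answers = [1, 2, 2, 3, 4, 4, 5, 5, 6, 6, 7, 7]
outcomes = [35, 42, 40, 48, 55, 50, 60, 63, 66, 70, 74, 80]

print("correlation:", round(pearson(answers, outcomes), 3))
print("p-value estimate:", permutation_p_value(answers, outcomes))
```

With data this strongly related, almost no shuffle matches the observed correlation, so the estimated p-value is tiny. With unrelated data, shuffles match it often and the p-value is large.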

To varying extents, all three of the team effectiveness models (Hackman's “Leading Teams” model, Lencioni's “Five Dysfunctions” model, and the Gallup “12” model) proved to correlate strongly with game project outcomes.

We can’t say for certain how many relevant questions we didn’t ask.  There may well be many more questions waiting to be asked that would have shined an even stronger light on the differences between the best teams and the rest.

But the correlations and statistical significance we discovered are strong enough that it’s very clear that we have, at the very least, discovered an excellent partial answer to the question of what makes the best game development teams so successful.

The Game Outcomes Project Series

Due to space constraints, we’ll be releasing our analysis as a series of articles, with the remaining articles released at 1-week intervals beginning in January 2015.  We’ll leave off detailed discussion of our three team effectiveness models until the second article in our series to allow these topics the thorough analysis they deserve.

This article will focus solely on introducing the survey and combing through the background questions asked on the second survey page.  And although we found relatively few correlations in this part of the survey, the areas where we didn’t find a correlation are just as interesting as the areas where we did.

Project Genre and Platform Target(s)

First, we asked respondents to tell us what genre of game their team had worked on.  Here, the results are all across the board.

Figure 2. Game genre (horizontal axis) vs. composite game outcome score (vertical axis).  Higher data points (green dots) represent more successful projects, as determined by our composite game outcome score.

We see remarkably little correlation between game genre and outcome.  In the few cases where a game genre appears to skew in one direction or another, the sample size is far too small to draw any conclusions, with all but a handful of genres having fewer than 30 responses.

(Note that Figure 2 uses a box-and-whisker plot.)

We also asked a similar question regarding the product’s target platform(s), including responses for desktop (PC or Mac), console (Xbox/PlayStation), mobile, handheld, and/or web/Facebook.  We found no statistically significant results for any of these platforms, nor for the total number of platforms a game targeted.

Project Duration and Team Size

We asked about the total months and years in development; based on this, we were able to calculate each project’s total development time in months:

Figure 3.  Total months in development (horizontal axis) vs game outcome score (vertical).  The black diagonal line is a trend line.

As you can see, there’s a small negative correlation (-0.229, using the Spearman correlation coefficient), and the p-value is 0.003.  This negative correlation is not too surprising, as troubled projects are more likely to be delayed than projects that are going smoothly.
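For readers unfamiliar with the Spearman coefficient mentioned above: it is the ordinary Pearson correlation computed on the ranks of the values rather than the values themselves, which makes it robust to outliers such as a handful of very long projects. The following is a minimal standard-library sketch. The month and score values are invented for illustration and are not survey data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented example: months in development vs. composite outcome score.
months = [6, 12, 18, 24, 24, 36, 48, 60]
scores = [72, 65, 70, 58, 61, 55, 60, 40]
print(round(spearman(months, scores), 3))
```

A negative result means that longer projects tend to rank lower on outcome, which is the direction of the relationship reported above.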

We also asked about the size of the team, both in terms of the average team size and the final team size.  Average team size ranged from 1 to 500 with a mean of 48.6; final team size ranged from 1 to 600 with a mean of 67.9.  Both showed a slight positive correlation with project outcomes, as shown below, but in both cases the p-value is well over 0.1, so the correlation is not statistically significant and should not be treated as noteworthy.

Note that in both figures below, the horizontal axis is shown on a logarithmic scale, which makes the linear trend line appear curved.
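To see why a straight trend line bends on a logarithmic axis, consider evenly spaced tick marks at 1, 10, 100, and 1000 people. A linear trend rises by a different amount between each pair of ticks, so on the log axis its slope appears to keep changing. The intercept and slope below are invented for illustration and are not the fitted values from the study.

```python
import math

# Hypothetical linear trend: outcome = intercept + slope * team_size
INTERCEPT = 55.0
SLOPE = 0.02


def trend(team_size):
    """Value of the (hypothetical) linear trend line at a given team size."""
    return INTERCEPT + SLOPE * team_size


# Team sizes that are evenly spaced on a log10 axis.
team_sizes = [1, 10, 100, 1000]

# Rise of the trend line between each pair of neighbouring tick marks.
rises = [trend(b) - trend(a) for a, b in zip(team_sizes, team_sizes[1:])]

for size in team_sizes:
    print(f"log10(size) = {math.log10(size):.0f}, trend = {trend(size):.2f}")
print("rise per tick:", [round(r, 2) for r in rises])
```

Each step along the log axis covers ten times as many people as the previous one, so the rise per tick grows tenfold each time and the line curves upward.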

Figure 4.  Average team size correlated against game project outcome (vertical axis).

Figure 5.  Final team size correlated against game project outcome (vertical axis).

We also analyzed the ratio of average to final team size, but we found no meaningful correlations here.

Game Engines

We asked about the technology solution used: whether it was a new engine built from scratch; core technology from a previous version of a similar game or another game in the same series; an in-house / proprietary engine (such as EA Frostbite); or an externally-developed engine (such as Unity, Unreal, or CryEngine).

The results are as follows:

Figure 6. Game engine / core technology used (horizontal axis) vs game project outcome (vertical axis), using a box-and-whisker plot.

 

| Engine / core technology | Average composite score | Standard deviation | Number of responses |
|---|---|---|---|
| New engine/tech | 53.3 | 18.3 | 41 |
| Engine from previous version of same or similar game | 64.8 | 15.8 | 58 |
| Internal/proprietary engine / tech (such as EA Frostbite) | 60.7 | 19.4 | 46 |
| Licensed game engine (Unreal, Unity, etc.) | 55.6 | 17.5 | 113 |
| Other | 55.5 | 19.5 | 15 |

The results here are less striking the more you look at them.  The highest score was for projects that used an engine from a previous version of the same game or a similar one – but that’s exactly what one would expect to be the case, given that teams in this category clearly already had a head start in production, much of the technical risk had already been stamped out, and there was probably already a veteran team in place that knew how to make that type of game!

We analyzed these results using a Kruskal-Wallis one-way analysis of variance, and we found that this question was only statistically significant on account of that very option (engine from a previous version of the same game or similar), with a p-value of 0.006.  Removing the data points related to this answer category caused the p-value for the remaining categories to shoot up above 0.3.
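For readers who want to see what a Kruskal-Wallis test does mechanically, here is a minimal standard-library sketch. It ranks all scores together, compares each group's rank sum with what chance would produce, and converts the resulting H statistic to a p-value using the chi-square approximation. The group scores are invented, and the helper names are hypothetical. For real analysis, a statistics package is a safer choice.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    half = x / 2
    if df % 2 == 0:
        q, a = math.exp(-half), 1.0               # closed form for df = 2
    else:
        q, a = math.erfc(math.sqrt(half)), 0.5    # closed form for df = 1
    while a < df / 2:                             # step df up by 2 each pass
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1
    return q


def kruskal_wallis(groups):
    """Return (H, p) for a list of groups; assumes not all values are equal."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, position = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[position:position + len(group)])
        total += rank_sum ** 2 / len(group)
        position += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Standard correction for tied values.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)


# Invented scores for three engine categories.
reused = [70, 66, 81, 64, 75, 68]
licensed = [55, 61, 48, 58, 52, 63]
new_tech = [50, 57, 45, 60, 53, 49]
print(kruskal_wallis([reused, licensed, new_tech]))
print(kruskal_wallis([licensed, new_tech]))  # same test without one category
```

Running the test a second time without one category mirrors the check described above: if the p-value jumps once a group is removed, that group was driving the result.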

Our interpretation of the data is that the best option for the game engine depends entirely on the game being made and what options are available for it, and that any one of these options can be the “best” choice given the right set of circumstances.  In other words, the most reasonable conclusion is that there is no universally “correct” answer separate from the actual game being made, the team making it, and the circumstances surrounding the game's development.  That’s not to say the choice of engine isn’t terrifically important, but the data shows plenty of successes and failures in all categories with only minimal differences in outcomes between them, indicating that each of these four options is entirely viable in some situations.

We also did not ask which specific technology solution a respondent’s dev team was using.  Future versions of the study may include questions on the specific game engine being used (Unity, Unreal, CryEngine, etc.).

Team Experience

We also asked a question on this page regarding the team’s average experience level, along a scale from 1 to 5 (with a ‘1’ indicating less than 2 years of average development experience, and a ‘5’ indicating a team of grizzled game industry veterans with an average of 8 or more years of experience).

Figure 7. Team experience level ranking (horizontal axis, by category listed above) mapped against game outcome score (vertical axis)

Here, we see a correlation of 0.19 (and p-value under 0.001).  Note in particular the complete absence of dots in the upper-left corner (which would indicate wildly successful teams with no experience) and the lower-right corner (which would indicate very experienced teams that failed catastrophically).

So our study confirms the common knowledge in the industry that experienced teams are significantly more likely to succeed.  This is not at all surprising, but it's reassuring that the data makes the point so clearly.  And as much as we may all enjoy stories of random individuals with minimal game development experience becoming wildly successful with games developed in just a few days (as with Flappy Bird), our study shows that such cases are extreme outliers.

Surprise #1: Incentives

This page of our survey also revealed two major surprises.

The first surprise was financial incentives.  The survey included a question: “Was the team offered any financial incentives tied to the performance of the game, the team, or your performance as individuals?  Select all that apply.”  We offered multiple check boxes to say “yes” or “no” to any combination of financial incentives that were offered to the team.

The correlations are as follows:
