AB Split Testing Hell or Highwater

An old-school game designer encounters the world of AB split testing

December 21, 2016


Introduction

I am by no means the oldest game developer, but I have been doing this long enough to see how technology has vastly changed our approach to R&D: not in game features, but in how we actually make design decisions, release our products, organize our teams, schedule our projects, communicate our design ideas and implement our games.

I've seen games go from ROM chips to DVDs to streamed content, from 2D to 3D to VR, and from 2- and 5-person teams to 150-person teams and back down to 1- and 2-person teams as we moved from 2D PC to 3D console and back to 2D mobile. I've seen design documents go from storyboards to design bibles to wikis. I've seen project management gradually shift from waterfall to scrum to kanban. Through it all, there is one particular evolution I want to highlight, because it has had a greater impact on me this year, and I don't mean VR. I mean something far broader and more fundamental: how we make design decisions and test them.

Putting the Anal in Analysis

You see, something has changed in how we deliver our games that allows us to be more indecisive. The same sloppiness that lets us miss our original Gold Master date, buying another month of development and testing while we work on a release-day patch, has evolved to the point where we can target patches to specific users. That is, we can give different groups of users different experiences and then decide, through data analysis, which design idea was better.

In the web world, this is called split testing, or AB testing. The "AB" stands for "Group A" and "Group B", but players can be assigned to any number of groups. What is being tested is the design, not the user: metrics such as user drop-off, retention, engagement time and feature usage are compared between the groups. These experiments can be turned on and off on a server, and end users would be none the wiser unless they compared their experiences.
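One common way to put players into groups, so that each player sees the same variant every session, is to hash a stable player ID together with the experiment name. The sketch below is only an illustration of that general technique, not how any particular company does it; `assign_group`, `ENABLED_EXPERIMENTS`, the player IDs and the experiment names are all hypothetical, and it assumes an equal split between groups.

```python
import hashlib

# Hypothetical server-side switch: only experiments listed here are "on".
# Turning an experiment off sends every player to the control (first) group.
ENABLED_EXPERIMENTS = {"sky_color"}


def assign_group(player_id: str, experiment: str, groups: list[str]) -> str:
    """Deterministically map a player to one of an experiment's groups.

    Hashing the experiment name together with the player ID means a player
    always lands in the same group for a given experiment, while separate
    experiments split the player base largely independently of one another.
    """
    if experiment not in ENABLED_EXPERIMENTS:
        return groups[0]
    key = f"{experiment}:{player_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Treat the first 8 bytes of the hash as an unsigned integer and reduce
    # it modulo the number of groups to pick a bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(groups)
    return groups[bucket]


# Usage: the same player gets the same answer on every call.
print(assign_group("player-42", "sky_color", ["A", "B"]))
print(assign_group("player-42", "button_alignment", ["A", "B"]))  # switched off, so "A"
```

Hashing instead of storing a random assignment per player is one design choice among several: it needs no lookup table, but changing an experiment's group list reshuffles its players.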

Anyone working at Zynga, Facebook, Yahoo or maybe Blizzard probably has much more experience with this than I do, because these companies use the method to constantly refine and evolve their products. It is the primary means by which a live team maintains its product, and therein lies the clue to why it may work for them.

As online products, these apps have a huge shelf life. They are designed to stay relevant and be played for years. Their revenue model, whether ads, subscriptions or micro-transactions, relies on retention and a regular influx of new users on whom they can conduct these experiments. They have the luxury of time to analyze and over-analyze their decisions to death.

The Cost of AB Testing

Historically, I have made one-off products: games that aimed for a Thanksgiving launch and generated revenue only from the store purchase. Retention mattered only so we could keep players for a subsequent release, such as an add-on or sequel. User input was based on the original release, not on incremental changes. We didn't have the luxury of tweaking our game forever. We had things called Feature Lock and Code Freeze.

Now, however, through the wonders of the Internet, we can thaw these code freezes and continue our blissful tweaking long after the game has launched.

I have to tell you, having been a Lead Designer and Executive Producer used to calling the shots, I am not a huge fan of leaving design decisions to a committee of number crunchers. I learned to trust my gut. I learned to trust my own analysis and research. I also learned to solicit feedback, but to timebox the decision-making and have the right person (art director, producer, game designer) make the call. Design by committee has rarely worked well; the vast majority of post-mortems identify the need for a primary vision holder to expedite development. So when someone tells me we need to AB test whether the sky should be a deeper blue, or whether a button should be centered or right-aligned, it drives me nuts.

The Snowballing Impact of AB Testing

Fortunately, that's not necessarily the typical AB test. However big or small the change, every AB test needs a programmer to implement, and a tester to test, every permutation in the experiment. Combined with the fact that several experiments can run at once, this can lead to many different configurations to test and track. There's the rub: whether to AB test at all, given the effort needed to implement, test and track every experiment. For this reason, each experiment's hypothesis should concern a difference meaningful enough to justify that cost.
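To see how quickly this snowballs: if concurrent experiments are crossed independently, the number of distinct configurations a player can land in is the product of the group counts of every running experiment. The sketch below assumes that independence, and the experiment names and groups are hypothetical examples, not real tests.

```python
from itertools import product
from math import prod

# Hypothetical experiments running at the same time, each with its groups.
experiments = {
    "sky_color": ["control", "deep_blue"],
    "button_alignment": ["centered", "right_aligned"],
    "tutorial_length": ["short", "medium", "long"],
}


def configuration_count(experiments: dict[str, list[str]]) -> int:
    """Number of distinct group combinations a player could end up in."""
    return prod(len(groups) for groups in experiments.values())


def all_configurations(experiments: dict[str, list[str]]):
    """Yield every combination as a dict mapping experiment -> group."""
    names = list(experiments)
    for combo in product(*(experiments[name] for name in names)):
        yield dict(zip(names, combo))


print(configuration_count(experiments))  # 2 * 2 * 3 = 12
for config in all_configurations(experiments):
    print(config)
```

Under that assumption, adding just one more two-group experiment doubles the count to 24 configurations, each of which may need to be checked by a tester.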

Unfortunately, this tool also allows people to be indecisive. If a team doesn't know the answer, doesn't have faith in its directors or, God forbid, has a design committee that can never make a decision, it may decide to AB test everything. In the worst case, this can effectively double the development effort.

I am reminded of a scenario from my early MechCommander days, when we had planned a branching mission tree: winning or losing a mission would have a meaningful impact on the story, because a whole new branch of missions would open or close to the player. Then one day Denny said to us, "Wait, are we really going to implement dozens of missions that a player might never see unless they replayed the game and did better or worse?" That was a lot of extra content to create. We quickly revised that plan, but it stuck with me. From a purely producer's standpoint, we were nuts to even consider it. Why spend many man-months of effort and tens of thousands of dollars creating content that only 10 to 20 percent of your audience will ever experience?

That is the argument we can use to push back on AB tests. Remember the old time-quality-cost project management triangle: you can only choose two, and if AB tests blow out your time or costs, something else has to give. Unfortunately, some companies have REALLY DEEP pockets, so it should be no surprise that they run a tremendous number of AB tests and are willing to fully implement a brand-new feature, then AB test it before rolling it out to other users. I guess when you're a megaship with millions of users, you need to make very careful and slow course corrections.

To Be Nimble and Responsive

Then again, smaller companies like mine also see value in AB testing. This was completely new to me, but I am not dumb. If we can iterate to refine our vision and methodically tweak our product to hit our metric goals, why not? It is an opportunity we shouldn't pass up, any more than the opportunity to patch a bug. The trick is to AB test while remaining agile, small and nimble. These goals pull against each other: to be fast but also analytical. We need to be decisive to get the product out on time, but also to design in a system that lets us test our improvements and respond to user input, primarily through metrics. Since this will be my first foray into AB testing from start to finish on a new project, I suspect I will have a lot to learn and report.
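Deciding whether a metric really moved between groups usually comes down to a statistical test. As one illustration, and not a method described in this post, here is a pooled two-proportion z-test for a retention-style metric such as the share of players who return the next day. The function name and the player counts are hypothetical, and the test is a normal approximation that assumes reasonably large groups.

```python
import math


def two_proportion_z_test(retained_a: int, total_a: int,
                          retained_b: int, total_b: int) -> tuple[float, float]:
    """Compare a rate (e.g. next-day retention) between groups A and B.

    Uses the pooled two-proportion z-test, a normal approximation that
    assumes each group is reasonably large. Returns the difference in rates
    (B minus A) and the two-sided p-value.
    """
    rate_a = retained_a / total_a
    rate_b = retained_b / total_b
    pooled = (retained_a + retained_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        # Both groups retained everyone or no one: no evidence of a difference.
        return rate_b - rate_a, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value for a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, p_value


# Hypothetical numbers: 400 of 1,000 players in A returned, 500 of 1,000 in B.
diff, p = two_proportion_z_test(400, 1000, 500, 1000)
print(f"difference={diff:.3f}, p-value={p:.2g}")
```

A small p-value only says the difference is unlikely to be chance; whether it is big enough to act on is still a design call.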

Please comment and share your experience with AB testing. Thanks!

About the Author

Timothy Ryan

Blogger
