On trying to 'epic fail'-proof your game

In this reprint from the November 2010 issue of Game Developer magazine, technical art director Steve Theodore advises on the good intentions but occasional pitfalls of trying to plan for every eventuality -- or in his words, every "epic fail."
If you're a working artist, you don't need anyone to tell you that things don't always go as planned. Between fickle creative direction, busted builds, and impossible schedules, you've probably had more than your fair share of Fail. For the tech artist, though, that heaping helping of Fail comes with an extra thick dollop of unpleasantness: there's nothing that gives more heartburn to a technical artist than the nasty souffle of inconvenience, production disruption, and sheer embarrassment that is a busted tool. The jury-rigged world of scripts, in-house tools, and ever-evolving features is like an Italian sports car: impressive and sexy, but prone to unpredictable breakdowns.
Every TA knows the sad refrain, "But it worked on my machine!" and its lesser-known cousins, like "It works when you have your units set to centimeters, why do you have your units set to feet?" and "No, I never tested it on filenames with commas in them -- why do you ask?" and the ever popular "Wait, you mean you installed Max onto your K: drive?"
TAs live with an astonishing variety of opportunities for things to go off the rails. Tech art relies on many uncontrollable factors, like buggy art software, ever-evolving custom tools, and -- let's face it -- slightly imperfect artists. In this environment, bugs and glitches are inevitable, no matter how slick your scripting skills may be. It's important, therefore, to think ahead to the inevitable problems before they occur. Sure, we'd all prefer it if everything just worked. Unfortunately, as we know too well, even big expensive software packages like Max and Maya aren't 100 percent bug-free. It's a bit naive to think that could be true of the scripts we cook up amidst the craziness of ongoing production.
Bugs will happen. The only question for TAs (and their artist clients) is what we're going to do about them. There's an old military maxim to the effect that "failing to plan is planning to fail." To which the veteran tech artist might add "failing to plan to fail is planning to fail epically."

User Fail

Failure comes in many guises, but any tech artist will tell you that the biggest source of failures is us: the users. Any TA who wants to stay sane has to make sure to protect users from themselves. Users make mistakes -- lots of mistakes. Every input, no matter how simple, is a chance for an error. So, when you pop up that dialog box that asks for a new object name, always check right away for obvious blunders like empty strings or forbidden characters. Never, ever pass user input along uninspected to another function where it might cause real harm.
When you can, present only safe choices, and don't ask completely open-ended questions. A character rigging script that needs the name of a bone should probably present the user with a list of bones to choose from, rather than betting on the typing skills of your average artist. A script that applies materials to selected objects should automatically ignore things like lights and cameras rather than crashing. An animation tool that applies an expression to an object should warn the user if the target is already animated.
Most scripters apply these sorts of rules by instinct. A consistent, well-thought-out plan is better than instinct, however. Deliberately putting as much intelligence as possible into the user-facing side of your scripts is one of the best ways a TA can stay sane. Adding a couple of lines of validation code to the "Set AI Type" button takes a few minutes; debugging the level that is whited out because your tool accidentally marked a light as an AI character, and it's now following the player everywhere, can take hours to sleuth out -- let alone fix. If your inputs are well-behaved, maintaining the innards of your code is far easier.
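
As a rough illustration of checking input right at the dialog box, here is a minimal Python sketch. The naming rule (letters, digits, and underscores, not starting with a digit) and both function names are assumptions made for the example, not the rules of any particular art package.

```python
import re

# Hypothetical naming rule: a letter or underscore first, then letters,
# digits, or underscores. Swap in whatever your package really allows.
_VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_object_name(raw_name):
    """Return a cleaned object name, or raise ValueError saying what is wrong."""
    name = raw_name.strip()
    if not name:
        raise ValueError("Object name cannot be empty.")
    if not _VALID_NAME.fullmatch(name):
        # Work out which characters offended so the message is actionable.
        bad = sorted(set(re.findall(r"[^A-Za-z0-9_]", name)))
        if bad:
            listed = " ".join(repr(c) for c in bad)
            raise ValueError("Object name contains forbidden characters: " + listed)
        raise ValueError("Object name must not start with a digit.")
    return name


def choose_bone(requested, available_bones):
    """Accept a bone name only if the rig actually contains it."""
    if requested not in available_bones:
        options = ", ".join(sorted(available_bones))
        raise ValueError("%r is not a bone in this rig; choose one of: %s"
                         % (requested, options))
    return requested
```

The point of raising right here, with a readable message, is that the UI code can show the message and ask again, so nothing downstream ever sees the bad value.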

Print Fail

Of course, Murphy's Law can't be suspended by a few extra lines of script. A C++ programmer is unlikely to accidentally pass a number to a function that expects a string, because the compiler will catch that mistake in advance. In MaxScript or Python, on the other hand, it's all too easy. After user error, passing the wrong arguments between functions is the leading source of bugs.
The primeval TA trick for squashing bad value bugs is simply to litter your code with print statements. It's a crude way to track down rogue values and mistaken function calls, but even the most elite TAs resort to it on occasion. Print-based debugging can be made a lot more effective (and less annoying) if you devise a library of reusable debugging functions rather than just plopping prints into your code. Python users should gravitate toward the warn() function in the standard warnings module and the standard logging module, both of which allow you to control (or staunch!) the flow of debugging info globally or locally. This is a huge step up from having to comment out all those dang prints. MaxScripters aren't so lucky, but a simple library of warning and logging functions, controlled by a global variable so you can turn them off when the job is done, is a great investment for any tech art team. It will also be popular with users who don't want their listener windows polluted by reams of incomprehensible debug text.
Print debugging is tried and true, but also old and lame: you can only see the variables that you have deliberately chosen to print out. This makes it easy to miss an important clue, or conversely to drown yourself in irrelevant information. Luckily MaxScript and Python both offer primitive debuggers (tough luck, alas, to those of you still writing MEL). Max's debugger and Python's pdb module both score poorly in the ease-of-use sweepstakes, and many TAs give up on the debuggers after some frustrating early experiments. They're both worth a second look, however, for two reasons.
First, a live debugger (unlike a print statement) allows you to inspect the entire namespace of a script while it's running. You can check the values of all the variables and also verify where exactly you are in the code you're running. Did that bad value originate inside the function where the error occurred, or did it come in from the code that called the function? What's the stack of functions that got us here in the first place? Second, and more importantly, debuggers capture the entire state of the code at the moment when an error occurs, giving you an accurate idea of what really happened. No need to print out every variable line by line: the crime scene is perfectly preserved for your forensics team.
Pythonistas have an even more powerful tool: external debuggers. Several popular Python editors include remote debugging functionality. Using PyDev, Wing, or Komodo, you can include a debug terminal in your script which will let you set breakpoints and step through your code one line at a time from outside of Maya, with the same level of control and information that the big kids get in Visual Studio. Although this takes a little setup, it's an enormous step toward better TA productivity and tools.
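
The two ideas in this section, switchable logging and a debugger that opens on the preserved "crime scene," might be sketched in Python roughly as follows. The logger name "techart", the DEBUG_ON_ERROR switch, and the toy scale_mesh function are all invented for illustration; only the logging, warnings, and pdb calls are standard library.

```python
import logging
import pdb
import sys
import warnings

# "techart" is a hypothetical logger name that every tool would share.
log = logging.getLogger("techart")
log.addHandler(logging.StreamHandler(sys.stderr))
log.setLevel(logging.WARNING)  # quiet by default, so users see no debug spam

DEBUG_ON_ERROR = False  # hypothetical team-wide switch for the debugger


def set_debug(enabled):
    """Turn debug chatter on or off everywhere, with no prints to comment out."""
    log.setLevel(logging.DEBUG if enabled else logging.WARNING)


def scale_mesh(name, factor):
    """Toy tool function, used only to demonstrate the helpers."""
    log.debug("scale_mesh(name=%r, factor=%r)", name, factor)
    if not isinstance(factor, (int, float)):
        warnings.warn("factor should be a number, got %r" % (factor,))
    return "%s scaled by %s" % (name, float(factor))


def run_tool(func, *args):
    """Run a tool; on failure, log it and optionally open pdb post-mortem."""
    try:
        return func(*args)
    except Exception:
        log.exception("%s failed", func.__name__)
        if DEBUG_ON_ERROR:
            # Opens the debugger on the frames where the error happened.
            pdb.post_mortem(sys.exc_info()[2])
        raise  # never swallow the error; the caller still sees it
```

With DEBUG_ON_ERROR left off, artists only ever see warnings and errors; a TA chasing a bug can call set_debug(True) and flip the switch on their own machine without touching the tool code.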

Fail Safe

OK, so you've protected yourself from your users and stepped through your code line-by-line to scrub it free of impurities. Guess what? You're still hosed. Script-based tools are always at the mercy of the art packages they live within and the computers they live on. The assumptions you need to make for your code are often going to be wrong: users will delete or rename things in the scene they really shouldn't touch. Files you need might be missing or locked or in use by some other program. No matter how hard you try, your code is going to run into cases you didn't anticipate. How well you handle these situations has a huge impact on how well your whole toolchain stands up to the stresses of production.
It's tempting to try to bulletproof your code with try/catch blocks. Catching exceptions is appealing because it lets you bail out of a problem without crashing -- you can display a warning message, ask for help, or log an error rather than simply dropping dead. Unfortunately, try/catches can be a dangerous addiction. Catches suppress errors, but they don't fix them. Unless you carefully design the catching code, you could end up hiding serious problems with your tools rather than confronting them head on.
Consider this example: you have a function which collects all the objects that have a certain attribute value. Once in a while you find an object which has the attribute set incorrectly. Say you ask for an RGB color value and get a string like "yellow" instead. The problem is rare and time is short, so what's the harm in wrapping this in a try/catch? Well, for starters, you're leaving bad data in the scene, which might break other scripts anyway. You're also lowering your incentive to track down the real source of the problem: without angry users showing up at your desk anymore, it's easy to forget this "solved" issue. Not only that, you're also slowing your code down: exception handling is noticeably slower than ordinary flow control (especially in Max).
Finally, and worst of all, you could inadvertently be hiding other errors, not just your known problem with bad attributes. Perhaps another tool has a bug which causes it to pass null values or nonexistent objects to the collector function. You'll never know, because you're hiding the problem inside your try/catch -- and most likely printing a wrong error message that will confound your debugging efforts.
The upshot? Don't be afraid to fail. In this situation, failing is the right thing: fail honestly rather than hiding errors inside try blocks. Fail early and fail often... then find and fix the bugs that show up.
This advice may raise some eyebrows among Python coders, who've probably heard of the famous "easier to ask for forgiveness than permission" principle (i.e. "give it a shot and use try/except to stay out of trouble"). There's an equally important Python principle to remember: "catch only appropriate errors." The language encourages you to catch specific exceptions. With targeted catches, truly unexpected flaws are still brought to light, rather than swept under the rug. MaxScripters, unfortunately, have no language-level help in this regard. They're stuck with cumbersome manual testing using the GetCurrentException() function. In both languages, though, the important principles remain the same: use try/catches sparingly, limit them to the smallest practical bits of code, and let problems bubble up so they really get fixed!
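
To make the "yellow" example concrete, here is one way a targeted catch could look in Python. This is only a sketch: the dictionary of object names to raw color values stands in for a real scene query, and parse_rgb and collect_colors are hypothetical names.

```python
def parse_rgb(value):
    """Convert an attribute value to an (r, g, b) tuple of ints in 0-255."""
    r, g, b = value  # a string like "yellow" fails here with ValueError
    rgb = (int(r), int(g), int(b))
    if any(c < 0 or c > 255 for c in rgb):
        raise ValueError("channel out of range: %r" % (rgb,))
    return rgb


def collect_colors(objects):
    """Return (good, bad): parsed colors, plus offenders to report and fix.

    `objects` maps object names to raw color attribute values.
    """
    good, bad = {}, []
    for name, raw in objects.items():
        try:
            # Keep the try body to the one call that has the known problem.
            good[name] = parse_rgb(raw)
        except ValueError as err:
            # Only the known failure is caught, and it is recorded rather
            # than silently dropped, so the bad data can still be cleaned up.
            bad.append((name, raw, str(err)))
    return good, bad
```

Because only ValueError is caught, a None handed over by some other buggy tool still raises a TypeError and fails loudly, which is exactly the behavior the column argues for. A bare `except:` around the same call would have buried it.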

Fail Blog

All that said, catches have one very important function that should not be ignored: they are canaries in your personal scripting coalmine. Pretty much by definition, when you hit the catch part of an exception, you know something has gone badly wrong. You should take advantage of that knowledge to collect useful debug information that will help you find and fix the problem.
A good error reporting system doesn't have to be complex. A simple text log uploaded onto a network share does wonders for your debugging (for example, when IT calls you up to find out why there are 10,000 files in your dropbox today, it's a good clue that something was wrong with yesterday's check-in!). Automatically generated emails are a great way to enhance your reputation for customer service. Users love it if you're standing over their desk offering to help a few seconds after some scary error dialog appears. For big studios and big TA teams, logging errors to a heavy-duty database can provide really valuable insights into the weak spots in the toolset. If your studio uses a web-based bug tracking system like FogBugz or Jira, you can even submit bugs automatically from right inside your exception handlers.
No matter what avenue you choose, a strong, standardized debug toolkit is an invaluable help in your struggle with the forces of Fail. Rather than manually collecting things like computer names, software versions, operating system, and environment variables in every crash handler, you should create a standard function to collect and package the relevant data and use it everywhere. Not only will this make it easier to parse the data looking for patterns ("Hey wait a sec -- all these crashes are on 64-bit machines!"), it lets you enrich your debug info as new problems arise.
When it turns out that the language settings on your outsourcer's machines are breaking your tools, it's far easier to add another couple of lines to your debug info tool once than to manually check for it in hundreds of different spots.
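
A standard "collect and package" function of the kind described above could be as small as the following Python sketch. The field names, the tool_errors.log filename, and the one-JSON-record-per-line format are assumptions chosen for the example; a real studio would point log_dir at its own network share and add whatever package-specific fields it needs.

```python
import datetime
import json
import os
import platform
import sys
import traceback


def collect_debug_info(extra=None):
    """Package the standard facts about this machine and the current error."""
    info = {
        "time": datetime.datetime.now().isoformat(),
        "user": os.environ.get("USERNAME") or os.environ.get("USER") or "unknown",
        "machine": platform.node(),
        "os": platform.platform(),
        "architecture": platform.machine(),
        "python": sys.version.split()[0],
        # Inside an except block this is the full traceback text.
        "traceback": traceback.format_exc(),
        # New fields (language settings, package version, ...) get added
        # here once, and every crash handler picks them up.
    }
    if extra:
        info.update(extra)
    return info


def write_error_report(log_dir, extra=None):
    """Append one JSON record per error to a shared log; return the log path."""
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, "tool_errors.log")
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(collect_debug_info(extra)) + "\n")
    return path
```

A handler would call write_error_report(shared_dir, {"tool": "my_exporter"}) from inside its except block and then re-raise. One record per line keeps the log easy to grep and easy to load later when you go looking for patterns.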

Tech for Art's Sake

Less technically inclined artists may look at some of this stuff and wonder where the art is. In the modern games business, success in the airy realms of artistic expression depends on the ability of a select few technical artists and tools programmers to fight an unending, overwhelming battle against the forces of Fail. It's not a battle that can ever be completely won, but fighting the good fight makes a huge difference to our teams and our games. So get out there and start failing!
