Asset Recovery: What to do When the Data is Gone!

A disk crashes mid-project, and you haven’t backed up your source files. You’re localizing a game from another team (or company), but they can't (or won't) provide all of the necessary data files. You are porting a project developed on another platform and don't have the necessary hardware or software tools to edit or convert the files in their native format. No matter what your situation, you don't have many choices; you can cancel the project, recreate the data from scratch, or recover it from the assets you have. Usually, recovery is the preferable option. Tim Trzepacz shares his secrets for handling missing data.

It's obvious when asset recovery is necessary. For instance, a disk crashes while you're mid-project, and lack of proper backups has eaten source files for some of your data. What if you've been put in charge of localizing a game from another team (or another company), but they can't (or won't) find all of the necessary data files for you? Perhaps you are porting a project developed on another platform and don't have the necessary hardware or software tools to edit or convert the files in their native format. Alternatively, you have the source data but you don't have the tools or scripts necessary to convert the data to its in-game format. No matter what your situation, you don't have many choices; you can cancel the project, recreate the data from scratch, or recover it from what you have. Usually, recovery is the preferable option.

I spent over four years as the lead programmer at Working Designs, handling many east-west localization projects. In that time, not once was the project source data we got from Japan anywhere close to complete. Asset recovery was a large part of my job. In this article, I'll share with you my secrets for handling missing data.

What are your goals?

The first thing to determine is what your goals are. Does the screen whose bitmap you can't find really need to change, or can you simply reuse the post-processed binary file? Do you actually need the original version of this text file, or will all the data change anyway? Do you need to change every sound in the file, or just the one in the middle?

If you can hack the change in with a hex editor in an hour, why bother writing tools to extract every file back to its original form, and then put them back together again? Always remember that the goal is to ship your product, not to have a perfect set of data!

What have you got?

The next step in any recovery effort is to determine what you have to work with. Usually you've got some sort of binary file, which might be the commercially available version of a game to be localized, the CD build you gave to test last week, or what was left in the object directory after the disk crash. You have to be creative when thinking of places your data might be hiding. Often you will have intermediate files lying around that are easier to decipher than the final binary data. If you are doing localization work, the data files might be stored in some data format that you are unfamiliar with.

If you were sent incomplete data as part of a porting or localization kit, it's always good to try to contact the original team and see if they can help you. Maybe they have an older version of the file and you can simply redo the revisions. The best asset recovery is to not have to do it after all.

Once you know what kind of data you do have, the next thing is to look at what resources you have for understanding the format of the data.

If you are localizing a game, you may have documentation in a foreign language, which may or may not be applicable. This is a situation where it is good to use machine translation (MT) software. Even though the translations given by MT software are usually terrible, you can often get an idea of whether the document you are looking at will be helpful or not, before you pay real money to a real translator for a real translation. Good examples of machine translation software include Digital River's Systran (the software that powers AltaVista's online Babel Fish translations), and Fujitsu's Translingo and Atlas products. These utilities often do strange things to the formatting of their output, so you'll have to write pre- and post-processing utilities if you wish to use them for source code comments. Still, if these products save you (or your translator) a few hours (or days) of work, they've paid for themselves!

If you are using utilities and data from another group, don't forget to check whether they sent you any useful utility software. There may already be a utility to return the binary file to a usable format, or you might have source code for the original conversion program that produced that format. If you don't have either of those, you at least have the game source code that reads or processes the file data. Sometimes the easiest thing to do is to add extra code to the game so that, when it runs on the target, it echoes the decompressed data back to a file on the host.

If you are working on a product that is or has a sequel, you might inquire about utilities, data, or documentation from the other versions. They often use the same format, a very similar data format, or one that is backwards or forwards compatible with other versions. Sometimes the source code for the tools has comments in it about what has changed from the previous versions.

Sometimes, with ports and localizations, you may have data that was simply generated by an unknown tool. When I was working on Princess Maker 2, the art was delivered in files with the ".ZIM" extension. Researching on the Internet, I found that this was a format of "Kid98", an art program that exists only on NEC's PC-98 computer line. We found source code for a tool that converts ".ZIM" files into the Maki-chan ".MKI" format. Although ".MKI" is similarly obscure, we added support for good old ".PCX" files to the tool and were back in business! Sometimes, it may even be more economical to buy, borrow, or emulate the strange computer that the data came from so that you can run the native editing software, rather than rewriting it yourself. Also, always check to see if there is a third-party tool that can do the job.

Finally, remember to always ask the providers of the data for help if it is at all possible. In addition, make sure you tell them the complete situation, rather than just your current plan for retrieving the data. Their methods of creating the data may be completely different from what you thought, and if you focus on something strange, they might not understand what it is you really need, especially if you are operating through a translator. Sometimes, your request may go unanswered for a long period of time before your data suddenly arrives!


Know Your Data!

Ok, so you've determined that you really need to change or extract this unknown data, and you currently don't have any tools to do it. Maybe you've looked at the game source and still don't quite understand what is going on in the file, or need confirmation of what you think you know. It might be a good idea to have a look at your data.

The ASCII Dump
The easiest way to look at your data is the ASCII dump. This is most appropriate when your data is supposed to be text. However, even when your data is graphics data, you can still discern graphics patterns in it. Many files have some ASCII data in them for block headers and file types. This is usually a good place to start. If your file is small enough, you can load it into "Wordpad", or you can use the "type" command from the DOS prompt. I like to use JP Software's 4DOS command prompt, and their "list" command makes it very easy to browse files as both ASCII and hex data. I would suggest that you don't edit binary files in your text editor, though, because many text editors convert linefeeds and OEM characters. Generally, this will corrupt any data that is not in the text encoding the editor expects. Viewing as ASCII text is purely for exploratory purposes.
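If no convenient viewer is at hand, the search for embedded ASCII can be scripted. The following is only a minimal sketch in Python, not a tool from the original project: the function name `ascii_runs` and the default minimum run length of four characters are assumptions chosen for illustration. It opens nothing for writing, so it cannot corrupt the file the way a text editor might.

```python
import re

def ascii_runs(data: bytes, min_len: int = 4):
    """Return (offset, text) pairs for runs of printable ASCII in data.

    min_len is an arbitrary threshold (an assumption): shorter runs are
    usually coincidental byte values rather than real text.
    """
    # 0x20..0x7E is the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii"))
            for m in pattern.finditer(data)]

if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:      # read-only, binary mode
        for offset, text in ascii_runs(f.read()):
            print(f"{offset:08X}  {text}")
```

The offsets are printed in hex so that they can be matched against a hex viewer later.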

When viewing the ASCII dump, you are usually looking for one of three things:

  • First, is this an ASCII file, or does it contain an ASCII section? You might be surprised how many programs still parse ASCII data at runtime, especially PC games.
  • Second, does this file have ASCII headers that betray it as a common format? Windows DLLs and executables usually have "This program cannot be run in DOS mode" stuck in right near the top, no matter what they are named. PKZip format files usually start with "PK" as the first two letters. MP3 files may have ID3 ASCII tag data at the end. Windows WAV files generally have "RIFF" as the first four letters. If you see an unfamiliar header on the file, try looking at files that have a familiar format to see if there is a match.

    Viewing a PKZIP file with Notepad.


    Viewing a WAV file with 4DOS list (ASCII mode).

Finally, if you are looking at graphics, you may be able to discern patterns in the data that are useful for determining the size and format, especially if you can control how many characters are displayed before the line automatically wraps. For small sprites and font data, you can often start to make out the images in the ASCII dump. This also works with game map data.
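The header check described above is easy to automate when there are many files to sort through. This is a rough sketch, assuming only the well-known signatures: "PK" for PKZip, "RIFF" followed by "WAVE" for WAV files, "MZ" at the start of DOS and Windows executables (the two bytes that precede the DOS-mode message), and an ID3 tag either at the start of the file ("ID3") or in the last 128 bytes ("TAG"). The function name `sniff_format` and the returned labels are made up for this example.

```python
def sniff_format(data: bytes) -> str:
    """Guess a common file format from signature bytes (best effort only)."""
    if data[:4] == b"RIFF":
        # RIFF is a container; the form type sits at offset 8.
        return "RIFF/WAVE audio" if data[8:12] == b"WAVE" else "RIFF container"
    if data[:2] == b"PK":
        return "PKZip archive"
    if data[:2] == b"MZ":
        return "DOS/Windows executable"
    # ID3v2 tags start the file; ID3v1 tags occupy the final 128 bytes.
    if data[:3] == b"ID3" or (len(data) >= 128 and data[-128:-125] == b"TAG"):
        return "audio with ID3 tag (possibly MP3)"
    return "unknown"

if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        with open(name, "rb") as f:
            print(name, "->", sniff_format(f.read()))
```

A match only proves that the bytes look like a known header; a game archive may well embed a standard file at some other offset, so a hit is a starting point rather than an answer.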

RAW Mode Bitmap Reading in Paint Programs

When you are trying to find graphics data, paint programs like Adobe Photoshop and JASC's Paint Shop Pro can be very helpful. Even if your data isn't in a standard graphics data format, these programs allow you to read a binary file as RAW graphic data using the "Open As" command. However, the ways these programs let you interpret your data are quite limited. For example, here is the Photoshop 4 dialog for Open As RAW:

Open As RAW Options from Photoshop 4.

As you can see, although we can specify the width and height of our target bitmap, we can't use any bit depth less than 8 bits per pixel, and there is no support for planar data. However, this doesn't mean that these tools are completely useless to us in the case of data that is less than eight bits per pixel. Since most data will be a multiple of eight bits in size, you can still use the tool to determine the exact file specifications, because you can see something that resembles what you want.

The most important point when doing this kind of analysis is to pick good values for width. You should pick your width based on what you expect the data being extracted to be. If it is a screen image, choose the horizontal resolution of a screen on your system. If it is texture data, try different texture page sizes that are common for your system. If it is tile data, try a single tile width. If it is map data, guess the size of the map. Then set the height to the largest value your program allows that still keeps you within your file size. When you zoom in on the data, you can scroll up and down to see if any patterns jump out at you. Here is an example of a 256x256, 4 bits per pixel bitmap loaded as 128x256, 8 bits per pixel:

128x256 four bpp bitmap loaded as 8bpp raw single channel.

The garbage near the top left of the image is the image's header information, which is also responsible for the image being offset to the right. It can easily be skipped by changing the Header field in the dialog. Even if we guess the image format incorrectly, the result still gives us clues that help determine the proper format.

The same loaded at 64x256 eight bpp single channel.

The alternating scan lines and overlaid images in this image show us that we have loaded the data at half its normal horizontal size.

Again, loaded at 256x128 8 bpp single channel.

The doubled image here indicates that we have loaded the data at double its normal size.

Loaded at 130x252.

The skew to the left in this image indicates that the width we have chosen is slightly too thin. A skew to the right indicates that it is slightly too wide.
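The width experiments above can also be scripted when a paint program with a raw mode is not available. The sketch below is an assumption-laden stand-in rather than a replacement for the paint program: it wraps the raw bytes in a binary PGM (P5) grayscale header so that any viewer that reads PGM can display them. The function name `raw_to_pgm` and its parameters are hypothetical; `header_size` plays the same role as the header setting in the raw import dialog.

```python
def raw_to_pgm(data: bytes, width: int, header_size: int = 0) -> bytes:
    """Wrap raw bytes as an 8-bit grayscale PGM image of the given width.

    The height is derived from the file size, so every complete row in the
    file is shown; a trailing partial row is dropped.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = data[header_size:]          # skip the guessed header
    height = len(pixels) // width
    if height == 0:
        raise ValueError("not enough data for a single row")
    # P5 = binary graymap: magic, dimensions, max value, then raw pixels.
    return b"P5\n%d %d\n255\n" % (width, height) + pixels[:width * height]

if __name__ == "__main__":
    import sys
    src, width = sys.argv[1], int(sys.argv[2])
    with open(src, "rb") as f:
        image = raw_to_pgm(f.read(), width)
    with open(f"{src}.w{width}.pgm", "wb") as f:
        f.write(image)
```

Writing one output file per candidate width makes it easy to flip between guesses and look for the doubling, overlay, and skew effects described above.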

Of course, determining the graphics format is useful, but the data in this form is clearly unusable. However, it is easy to write a tool that converts nibble-packed data to byte-packed data. You can then read the converted file into your art tool as raw data and save it in a reasonable format for editing. The advantages of this approach over writing a complete raw converter yourself are that you can scroll through the data in your art tool, it's a lot quicker to code up, and the raw mode of your art program can be set to any width you'd like. The problem is that you are still missing the color palette data, so your image is still imported as a single-channel grayscale image. Later, I'll explain how to identify palette data using a hex dump.
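A nibble-to-byte converter of the kind described here might look like the following sketch. Which nibble holds the left-hand pixel varies between platforms, so the `high_first` flag is an assumption you will need to test both ways; the `scale` option, which multiplies each 4-bit value by 17 so that 0..15 spreads across 0..255, is an addition purely to make the grayscale preview easier to see.

```python
def unpack_nibbles(data: bytes, high_first: bool = True,
                   scale: bool = False) -> bytes:
    """Expand 4-bit-per-pixel data to one byte per pixel.

    high_first: True if the high nibble is the first (left) pixel.
    scale: multiply values by 17 (0..15 -> 0..255) for viewing only;
           leave False if you need the real palette indices.
    """
    out = bytearray(len(data) * 2)
    mul = 17 if scale else 1
    for i, b in enumerate(data):
        hi, lo = b >> 4, b & 0x0F
        first, second = (hi, lo) if high_first else (lo, hi)
        out[2 * i] = first * mul
        out[2 * i + 1] = second * mul
    return bytes(out)

if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        converted = unpack_nibbles(f.read())
    with open(sys.argv[1] + ".8bpp", "wb") as f:
        f.write(converted)
```

Note that any file header gets expanded along with the pixels, so the header size to skip in the art tool doubles as well.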

Up to this point, I've only talked about palettized images. You can also use the same techniques for RGB data, and it usually works much better as long as you have at least eight bits per channel. In order to actually extract the data, you may need to reorder the RGB bytes in the final output. You can identify RGB data because the image will have vertical stripes in it when viewed as a single channel. Here is an example of 16-bit RGB data viewed as an eight-bit single-channel image.

16bit RGB data viewed as an eight-bit single channel image.
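Sixteen-bit RGB data has less than eight bits per channel, so it has to be expanded before a paint program's raw mode can show it in color. The sketch below assumes one particular layout, five bits per channel with the top bit unused, which is only one of several layouts in use; the byte order and the position of the red channel are exposed as the hypothetical parameters `little_endian` and `red_low` because they differ between platforms. The output is interleaved 24-bit RGB, three bytes per pixel.

```python
import struct

def rgb555_to_rgb888(data: bytes, little_endian: bool = True,
                     red_low: bool = True) -> bytes:
    """Expand 16-bit 5-5-5 pixels to interleaved 8-bit R, G, B bytes.

    ASSUMED layout: bits 0-4, 5-9 and 10-14 hold the three channels and
    bit 15 is ignored. red_low selects whether red is in bits 0-4.
    """
    count = len(data) // 2               # ignore a trailing odd byte
    fmt = ("<" if little_endian else ">") + f"{count}H"
    out = bytearray()
    for px in struct.unpack(fmt, data[:count * 2]):
        low, g, high = px & 0x1F, (px >> 5) & 0x1F, (px >> 10) & 0x1F
        r, b = (low, high) if red_low else (high, low)
        # Replicate the top bits so 0..31 maps onto the full 0..255 range.
        out += bytes(((c << 3) | (c >> 2)) for c in (r, g, b))
    return bytes(out)
```

If the colors come out swapped or the image turns to noise, flip `red_low` or `little_endian` respectively before concluding that the data is in some other format.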

 

For planar data and data less than four bpp, you will need to write a program to convert it to byte-packed data in order to do any graphical browsing.
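As an illustration of such a program, here is a minimal sketch for one planar arrangement. It assumes that each bit plane is stored as a complete one-bit-per-pixel bitmap, one plane after another, with the most significant bit of each byte as the leftmost pixel and plane 0 as the least significant bit of the pixel value. Real formats frequently interleave planes per row or per tile instead, so treat the layout, the function name `planar_to_chunky`, and its parameters as assumptions to adapt.

```python
def planar_to_chunky(data: bytes, width: int, height: int,
                     planes: int) -> bytes:
    """Convert plane-sequential bitmap data to one byte per pixel."""
    row_bytes = (width + 7) // 8         # each plane row is padded to a byte
    plane_size = row_bytes * height
    if len(data) < plane_size * planes:
        raise ValueError("not enough data for the requested size")
    out = bytearray(width * height)
    for p in range(planes):
        base = p * plane_size
        for y in range(height):
            row = base + y * row_bytes
            for x in range(width):
                # Bit 7 of each byte is the leftmost of its eight pixels.
                bit = (data[row + x // 8] >> (7 - x % 8)) & 1
                out[y * width + x] |= bit << p
    return bytes(out)
```

With a single plane this also handles ordinary one-bit-per-pixel data such as many bitmap fonts.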


Raw Audio Data Reading in Audio Tools

Most of the techniques I have mentioned so far have been for graphics, but what about audio data? Many popular audio programs also support a raw mode for reading and writing audio data. By looking at and listening to the output, you can determine where the audio data is located and what format it is in.

As you can see, there aren't too many options. The header and trailer aren't important when you are just listening to the whole file to find data. However, the Byte order is very important because the data will just sound like static if it is incorrect. Fortunately, you know the ordering for your platform. If the Format is wrong, you can still roughly make out the data if you listen to it, but it's really loud and obnoxious, and will take up the whole amplitude range.

Sample Type and Channels are the most difficult to determine. Fortunately, you probably have some idea of whether your data is mono or stereo. If you do not, there are a few tricks you can try. If 16-bit data is loaded as eight-bit stereo, one channel will be static, while the other may look O.K. If 16-bit data is loaded as eight-bit mono data, you will see vertical stripes in the sample data upon zooming in.


Loading stereo data as mono has a similar appearance, even if the bit depth is correct. In general, a certain amount of trial and error is required, but if the audio data is stored in a raw format, it can be extracted. If your data is in a compressed format, these techniques generally won't help you.

A signed wav file loaded as unsigned.


16 bit stereo data loaded as eight-bit mono.
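If your audio tool lacks a raw import mode, the same trial and error can be done by wrapping the raw bytes in a WAV header and opening the result in any player. The sketch below uses Python's standard `wave` module; the function name `raw_to_wav`, its parameters, and the default sample rate of 22050 Hz are guesses for illustration, not values from any particular game. It relies on the fact that WAV files store 16-bit samples as signed little-endian and 8-bit samples as unsigned, so big-endian or signed 8-bit source data is converted first.

```python
import io
import wave

def raw_to_wav(raw: bytes, channels: int = 1, sample_width: int = 2,
               rate: int = 22050, big_endian: bool = False,
               signed_8bit: bool = False) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes."""
    frame = channels * sample_width
    raw = raw[:len(raw) - len(raw) % frame]      # keep whole frames only
    if sample_width == 2 and big_endian:
        swapped = bytearray(raw)                 # swap each byte pair
        swapped[0::2], swapped[1::2] = raw[1::2], raw[0::2]
        raw = bytes(swapped)
    if sample_width == 1 and signed_8bit:
        raw = bytes(b ^ 0x80 for b in raw)       # signed -> unsigned
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(rate)
        w.writeframes(raw)
    return buf.getvalue()
```

A wrong sample rate only changes the pitch and speed, so it is the least critical guess; wrong byte order, sign, or sample width produce the static and harshness described above.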

The Hex Dump

The best method for observing your data is the venerable hexadecimal dump. There are a number of programs out there that can give you both a hex and ASCII dump. Many development editors and environments support a hex dump display. Visual Studio can open binary files as hex data, as can Multiedit and several other editors. I generally use an editing or browsing program. 4DOS's "list" command will display hex if the "X" key is pressed. However, my favorite utility is Breakpoint Software's Hex Workshop, which allows hex and ASCII editing, searches, Unicode support, and many other fine features.
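For readers without any of these tools, a combined hex and ASCII dump takes only a few lines to produce. This is a bare-bones sketch and no substitute for a real hex editor; the function name `hex_dump` and the column layout are arbitrary choices. The `width` parameter is worth keeping adjustable, because setting it to a suspected record or row size makes repeating structures line up in columns.

```python
def hex_dump(data: bytes, width: int = 16, start: int = 0) -> str:
    """Format data as lines of offset, hex bytes, and printable ASCII."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column stays aligned on short rows.
        lines.append(f"{start + off:08X}  {hex_part:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)

if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        print(hex_dump(f.read()))
```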

General Rules for Identifying Data
Depending on what kind of data you have, there are a lot of different tricks you can use to identify the data in the hex dump. Of course, it is always important to use what you already know about the data from what source code and documentation you already have, but even when you have next to nothing, you may still be able to find what you need and change it. The important thing to remember is to have a goal in mind. Just staring at a hex dump won't help you unless you are looking for something in particular in the data. If you have expectations of what to find, you can consider the attributes of that data and determine whether the file you are looking at meets or breaks those expectations.

Data Files vs. Memory Structures
Often, you will have the structure definitions for the memory versions of the data, but there may be differences in the data in the file that has to be parsed in order to fill the structures that are i
