
A disk crashes mid-project, and you haven’t backed up your source files. You’re localizing a game from another team (or company), but they can't (or won't) provide all of the necessary data files. You are porting a project developed on another platform and don't have the necessary hardware or software tools to edit or convert the files in their native format. No matter what your situation, you don't have many choices; you can cancel the project, recreate the data from scratch, or recover it from the assets you have. Usually, recovery is the preferable option. Tim Trzepacz shares his secrets for handling missing data.

Tim Trzepacz, Blogger

August 3, 2001

42 Min Read

It's obvious when asset recovery is necessary. For instance, a disk crashes while you're mid-project, and lack of proper backups has eaten source files for some of your data. What if you've been put in charge of localizing a game from another team (or another company), but they can't (or won't) find all of the necessary data files for you? Perhaps you are porting a project developed on another platform and don't have the necessary hardware or software tools to edit or convert the files in their native format. Alternatively, you have the source data but you don't have the tools or scripts necessary to convert the data to its in-game format. No matter what your situation, you don't have many choices; you can cancel the project, recreate the data from scratch, or recover it from what you have. Usually, recovery is the preferable option.

I spent over four years as the lead programmer at Working Designs, handling many east-west localization projects. In that time, not once was the project source data we got from Japan anywhere close to complete. Asset recovery was a large part of my job. In this article, I'll share with you my secrets for handling missing data.

What are your goals?

The first thing to determine is what your goals are. Does the screen you can't find the bitmap for really need to change, or can you simply reuse the post-processed binary file? Do you actually need to get the original version of this text file, or will all the data change anyway? Do you need to change every sound in the file, or just the one in the middle?

If you can hack the change in with a hex editor in an hour, why bother writing tools to extract every file back to its original form and then put them back together again? Always remember that the goal is to ship your product, not to have a perfect set of data!

What have you got?

The next step in any recovery effort is to determine what you have to work with. Usually you've got some sort of binary file, which might be the commercially available version of a game to be localized, the CD build you gave to test last week, or what was left in the object directory after the disk crash. You have to be creative when thinking of places your data might be hiding. Often you will have intermediate files lying around, which are easier to decipher than the final binary data. If you are doing localization work, the data files might be hiding in some data format that you are unfamiliar with.

If you were sent incomplete data as part of a porting or localization kit, it's always good to try to contact the original team and see if they can help you. Maybe they have an older version of the file and you can simply redo the revisions. The best asset recovery is to not have to do it after all.

Once you know what kind of data you do have, the next thing is to look at what resources you have for understanding the format of the data.

If you are localizing a game, you may have documentation in a foreign language, which may or may not be applicable. This is a situation where it is good to use machine translation (MT) software. Even though the translations given by MT software are usually terrible, you can often get an idea of whether the document you are looking at will be helpful or not, before you pay real money to a real translator for a real translation. Good examples of machine translation software include Digital River's Systran (the software that powers Altavista's online Babel Fish translations), and Fujitsu's Translingo and Atlas products. These utilities often do strange things to the formatting of their output, so you'll have to write pre- and post-processing utilities if you wish to use them for source code comments. Still, if these products save you (or your translator) a few hours (or days) of work, they've paid for themselves!

If you are using utilities and data from another group, don't forget to check to see if they sent you any useful utility software. There may already be a utility to return the binary file to a usable format, or you might have source code for the original conversion program that converted it to that format. If you don't have either of those, at least you have the game source code that reads or processes the file data. Sometimes the easiest thing to do is to drop extra code into the game, running on the target, that echoes the decompressed data back to a file on the host.

If you are working on a product that is or has a sequel, you might inquire about utilities, data, or documentation from the other versions. They often use a compatible format, a very similar data format, or are backwards/forwards compatible with other versions. Sometimes the source code for the tools has comments in it about what has changed from the previous versions.

Sometimes, with ports and localizations, you may have data that was simply generated by an unknown tool. When I was working on Princess Maker 2, the art was delivered in files with the ".ZIM" extension. Researching on the Internet, I found that this was a format of "Kid98", an art program that exists only on NEC's PC-98 computer line. I then found source code for a tool that converts ".ZIM" files into the Maki-chan ".MKI" format. Although ".MKI" is similarly obscure, we added support for good old ".PCX" files to the tool and were back in business! Sometimes, it may even be more economical to buy, borrow, or emulate the strange computer that the data came from so that you can run the native editing software, rather than rewriting it yourself. Also, always check to see if there is a third-party tool that can do the job.

Finally, remember to always ask the providers of the data for it, if that is at all possible. In addition, make sure you tell them the complete situation, rather than just telling them your current plan for retrieving the data. Their methods of creating the data may be completely different from what you thought, and if you focus on something strange, they might not understand what it is you really need, especially if you are operating through a translator. Sometimes, your request may go unanswered for a long period of time before your data suddenly arrives!

Know Your Data!

Ok, so you've determined that you really need to change or extract this unknown data, and you currently don't have any tools to do it. Maybe you've looked at the game source and still don't quite understand what is going on in the file, or need confirmation of what you think you know. It might be a good idea to have a look at your data.

The ASCII Dump
The easiest way to look at your data is the ASCII dump. This is most appropriate when your data is supposed to be text, but even when your data is graphics you can still discern graphics patterns in it. Many files have some ASCII data in them for block headers and file types, which is usually a good place to start. If your file is small enough, you can load it into "Wordpad", or you can use the "type" command from the DOS prompt. I like to use JP Software's 4DOS command prompt, and their "list" command makes it very easy to browse files as both ASCII and hex data. I would suggest that you don't edit binary files in your text editor, though, because many text editors convert linefeeds and OEM characters, which will corrupt any data that is not plain text in the encoding the editor expects. Viewing as ASCII text is purely for exploratory purposes.

When viewing the ASCII dump, you are usually looking for one of three things:

  • First, is this an ASCII file, or does it contain an ASCII section? You might be surprised how many programs still parse ASCII data at runtime, especially PC games.

  • Second, does this file have ASCII headers that betray it as a common format? Windows DLLs and executables usually have "This program cannot be run in DOS mode" stuck in right near the top, no matter what they are named. PKZip format files usually start with "PK" as the first two letters. MP3 files may have ID3 ASCII tag data at the end. Windows WAV files generally have "RIFF" as the first four letters. If you see an unfamiliar header on the file, try looking at files that have a familiar format to see if there is a match.

    trzepacz_01.jpg

    Viewing a PKZIP file with Notepad.


    trzepacz_02.jpg

    Viewing a WAV file with 4DOS list (ASCII mode).

Finally, if you are looking at graphics, you may be able to discern patterns in the data that are useful for determining the size and format, especially if you can control how many letters are displayed before the line automatically wraps. For small sprites and font data, you often start to slightly see the images in the ASCII dump. This also works with game map data.
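The header check described above is easy to automate. The following is a minimal Python sketch, not a complete file identifier: the signature table is a hypothetical starting point built from the formats mentioned here, and the function names are my own.

```python
# Hypothetical signature table; extend it with formats common on your platform.
SIGNATURES = [
    (b"PK", "PKZip archive"),
    (b"RIFF", "RIFF container (WAV, AVI, ...)"),
    (b"MZ", "DOS/Windows executable or DLL"),
    (b"ID3", "MP3 with an ID3v2 tag at the start"),
]

def sniff(data: bytes) -> str:
    """Return a guess at the file type based on its leading bytes."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    # ID3v1 tags sit in the last 128 bytes and begin with "TAG".
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        return "MP3 with an ID3v1 tag at the end"
    return "unknown"

def printable_runs(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of printable ASCII of at least min_len bytes."""
    start = None
    for i, b in enumerate(data):
        if 32 <= b < 127:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                yield start, data[start:i].decode("ascii")
            start = None
    if start is not None and len(data) - start >= min_len:
        yield start, data[start:].decode("ascii")
```

Running `printable_runs` over an unknown file gives a quick list of embedded headers and chunk names along with their offsets, which is often enough to decide where to look next.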

RAW Mode Bitmap Reading in Paint Programs

When you are trying to find graphics data, paint programs like Adobe Photoshop and JASC's Paint Shop Pro can be very helpful. Even if your data isn't in a standard graphics data format, these programs allow you to read a binary file as RAW graphic data using the "Open As" command. However, the ways in which these programs let you interpret your data are quite limited. For example, here is the Photoshop 4 dialog for Open As RAW:

trzepacz_03.jpg

Open As RAW Options from Photoshop 4.

As you can see, although we can specify the width and height of our target bitmap, we can't use any bit depth less than 8 bits per pixel, and planar data is not supported at all. However, this doesn't mean that these tools are completely useless to us in the case of data that is less than eight bits per pixel. Since most data will be a multiple of eight bits, you can still use the tool to determine the exact file specifications, because you can see something that resembles what you want.

The most important point when doing this kind of analysis is to pick good values for width. You should pick your width based on what you expect the data being extracted to be. If it is a screen image, choose the horizontal resolution of a screen on your system. If it is texture data, try different texture page sizes that are common for your system. If it is tile data, try a single tile width. If it is map data, guess the size of the map. Then set the height to as high a value as your program will allow you to set that keeps you within your file size. When you zoom in on the data, you can scroll up and down to see if its patterns jump out at you. Here is an example of a 256x256, 4 bits per pixel bitmap loaded as 128x256, 8 bits per pixel:

trzepacz_04.jpg

256x256 four bpp bitmap loaded as 128x256 8 bpp raw single channel.

The garbage near the top left of the image is the image's header information, which is responsible for the right offset of the image. This can easily be skipped by changing the header field. If we are trying to guess the image format and guess incorrectly, we still have clues to help us determine the proper format.

trzepacz_05.jpg

The same loaded at 64x256 eight bpp single channel.

The alternating scan lines and overlaid images in this image show us that we have loaded the data at half its normal horizontal size.

trzepacz_06.jpg

Again, loaded at 256x128 8 bpp single channel.

The doubled image here indicates that we have loaded the data at double its normal width.

trzepacz_07.jpg

Loaded at 130,252

The skew to the left in this image indicates that the width we have chosen is slightly too narrow. A skew to the right indicates that it is slightly too wide.

Of course, determining the graphics format is useful, but the image as loaded is clearly unusable. However, it is easy to write a tool that takes nibble-packed data and converts it to byte-packed data. You can then read the byte-packed file into your art tool as raw data and save it in a reasonable format for editing. The advantages of this approach over writing a complete raw converter yourself are that you can scroll through the data in your art tool, it's a lot quicker to code up, and the raw mode of your art program can be set to any width you'd like. The problem is that you are still missing the color palette data, so your image is still being imported as a single channel grayscale image. Later, I'll explain how to identify palette data using a hex dump.
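A nibble-to-byte converter of the kind described here might look like the sketch below. The function name and the `low_nibble_first` switch are assumptions for illustration; which nibble holds the left-hand pixel depends on your platform.

```python
def unpack_nibbles(packed: bytes, low_nibble_first: bool = False) -> bytes:
    """Expand 4 bpp data to one pixel per byte.

    Each 0-15 pixel value is multiplied by 17 so that it fills the
    0-255 grayscale range when viewed in a paint program.
    """
    out = bytearray()
    for b in packed:
        hi, lo = b >> 4, b & 0x0F
        first, second = (lo, hi) if low_nibble_first else (hi, lo)
        out.append(first * 17)
        out.append(second * 17)
    return bytes(out)
```

To skip a file header, slice it off before converting, for example `unpack_nibbles(data[header_size:])`, where `header_size` is whatever you measured in the raw view.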

At this point, I've only talked about palette images. You can also use the same techniques for RGB data, and it usually works much better as long as you have at least eight bits per channel. In order to actually extract the data, you may need to reorder the RGB bytes in the final output. You can identify RGB data because the image will have vertical stripes in it when viewed as a single channel. Here is an example of 16 bit RGB data viewed as an eight-bit single channel image.

trzepacz_08.jpg

16bit RGB data viewed as an eight-bit single channel image.

 

For planar data and data less than four bpp, you will need to write a program to convert it to byte-packed data in order to do any graphical browsing.

Raw Audio Data Reading in Audio Tools

Most of the techniques I have mentioned so far have been for graphics, but what about audio data? Many popular audio programs also support a raw mode for reading and writing audio data. By looking at and listening to the output, you can determine where the audio data is located and what format it is in.

As you can see, there aren't too many options. The header and trailer aren't important when you are just listening to the whole file to find data. However, the Byte order is very important because the data will just sound like static if it is incorrect. Fortunately, you know the ordering for your platform. If the Format is wrong, you can still roughly make out the data if you listen to it, but it's really loud and obnoxious, and will take up the whole amplitude range.

Sample Type and Channels are the most difficult to determine. Fortunately, you probably have some idea of whether your data is mono or stereo. If you do not, there are a few tricks you can try. If 16 bit data is loaded as eight bit stereo, one channel will be static, while the other may look O.K. If 16 bit data is loaded as eight bit mono data, you will see vertical stripes in the sample data upon zooming in.

trzepacz_09.jpg

Many popular audio programs also support a
raw mode for reading and writing audio data.

Loading stereo data as mono has a similar appearance, even if the bit depth is correct. In general, a certain amount of trial and error is required, but if the audio data is stored in a raw format, it can be extracted. If your data is in a compressed format, these techniques generally won't help you.

trzepacz_10.jpg

A signed wav file loaded as unsigned.


trzepacz_11.jpg

16 bit stereo data loaded as eight-bit mono.
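If your audio tool has no raw mode, you can wrap the headerless data in a WAV header yourself and try different guesses. This is a rough sketch using Python's standard `wave` module; the function names and the default parameters (mono, 16 bit, 22050 Hz) are placeholders, not values taken from any particular game.

```python
import wave

def raw_to_wav(raw: bytes, path: str, channels: int = 1,
               sample_width: int = 2, rate: int = 22050) -> None:
    """Wrap headerless PCM in a WAV header so any audio tool can open it."""
    frame = channels * sample_width
    usable = len(raw) - (len(raw) % frame)   # drop a trailing partial frame
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(rate)
        w.writeframes(raw[:usable])

def swap_bytes_16(raw: bytes) -> bytes:
    """Swap each byte pair, for big-endian 16-bit samples (WAV is little-endian)."""
    usable = len(raw) - (len(raw) % 2)
    out = bytearray(raw[:usable])
    out[0::2], out[1::2] = out[1::2], out[0::2]
    return bytes(out)
```

Note that WAV treats 8 bit samples as unsigned and 16 bit samples as signed, so data stored the other way round will show the signed/unsigned distortion described above until you convert it.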

The Hex Dump

The best method for observing your data is the venerable hexadecimal dump. There are a number of programs out there that can give you both a hex and ASCII dump. Many development editors and environments support a hex dump display. Visual studio can open binary files as hex data, as can Multiedit and several other editors. I generally use an editing or browsing program. 4DOS's "list" command will display hex if the "X" key is pressed. However, my favorite utility is Breakpoint Software's Hex Workshop, which allows hex and ASCII editing, searches, Unicode support, and many other fine features.

General Rules for Identifying Data
Depending on what kind of data you have, there are a lot of different tricks you can use to identify the data in the hex dump. Of course, it is always important to use what you already know about the data from what source code and documentation you already have, but even when you have next to nothing, you may still be able to find what you need and change it. The important thing to remember is to have a goal in mind. Just staring at a hex dump won't help you unless you are looking for something in particular in the data. If you have expectations of what to find, you can consider the attributes of that data and determine whether the file you are looking at meets or breaks those expectations.

Data Files vs. Memory Structures
Often, you will have the structure definitions for the memory versions of the data, but there may be differences in the data in the file that has to be parsed in order to fill the structures that are in RAM. First off, although structures may be dynamically allocated at load time using malloc/new, data in files is often stored as arrays of like structures. Therefore, you might notice multiple occurrences of similar data in repeating structures. If you can set the width of the columns, this data repetition becomes more apparent.

trzepacz_12.jpg

A hex dump at a width of 32 bytes shows table
entries by repeating data elements in each column.
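If your hex viewer cannot change its column width, a few lines of code will do. Here is a minimal Python sketch of a hex dump with an adjustable width; the function name and output layout are my own choices.

```python
def hex_dump(data: bytes, width: int = 16, start: int = 0) -> str:
    """Format data as offset, hex bytes, and ASCII, `width` bytes per row."""
    lines = []
    for off in range(0, len(data), width):
        row = data[off:off + width]
        hex_part = " ".join(f"{b:02X}" for b in row).ljust(width * 3 - 1)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{start + off:08X}  {hex_part}  {text}")
    return "\n".join(lines)
```

Try widths that match your suspected structure size (`hex_dump(data, width=32)`, for instance); when the guess is right, each field lines up in its own column.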


Another important difference between data files and memory structures is that data referring to other elements in the file will use indexes or offsets rather than memory pointers. An index will be from the array start and in units of the structure size, while an offset is from the head of the file, the head of the table, or from some other point. An offset might be in bytes, two byte words, or quad-words. By identifying the locations of tables using the above technique, you can then search for offsets or indices that get you close to those locations in the file. This requires some eagle-eyed attention to the data and some trial and error work with a calculator, but it's not impossible if you know at least a little bit about the data you are trying to decode.

Sometimes, the different types of data are identified by a chunk header ID, usually a 4 letter ASCII identifier that is read as a long word. The ID might be reversed, depending on the byte ordering of the target platform, and the compiler's support for string identifiers, etc. This is where the ASCII dump portion of your hex editor is very useful… the chunk ID is quite easy to spot.

The Rule of Zeros—Finding the Data Size
When you are trying to identify data in a table or structure, there is a useful rule you can use to identify what size a particular data element is. The rule of zeros relies on the fact that data often doesn't use the entire range of values available for the number of bytes allotted to it, so the high bytes will often be zero, unless the value is negative. So on a system with Intel byte ordering, there will be zeros on the right side of the element, while on a system using Motorola ordering, the zeros will be on the left.

A look at the earlier hex dump example shows that the first ten values are four byte Intel format values. Following that, we see a table of 32 byte structures that is made up of single byte and double byte values. An exception to the rule of zeros is when a byte or number of bytes is used to store flags where each bit is significant.
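The rule of zeros can be turned into a rough measurement. The sketch below, with a helper name of my own invention, counts how often each byte position within a guessed element size is zero; it is a heuristic aid, not a format detector.

```python
import struct

def zero_fraction_by_column(data: bytes, element_size: int) -> list[float]:
    """For each byte position within an element, return the fraction of zeros."""
    counts = [0] * element_size
    n = len(data) // element_size
    for i in range(n * element_size):
        if data[i] == 0:
            counts[i % element_size] += 1
    return [c / n for c in counts] if n else [0.0] * element_size

# Four little-endian 32-bit values that never come close to using all 32 bits.
sample = struct.pack("<4I", 1, 300, 7, 70000)
print(zero_fraction_by_column(sample, 4))  # [0.0, 0.5, 0.75, 1.0]
```

Zero counts that climb toward the right suggest Intel ordering at that element size; counts that climb toward the left suggest Motorola ordering. If the fractions look flat, try a different element size.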

Finding Graphics in a Hex Dump
The key to identifying raw graphics data in a hex dump is the ability to identify repeating data. In palette images (eight bpp or less), you have runs of the same color on any given scan line. This is less obvious with dithered, digitized, or rendered data, especially if it is 16, 24, or 32 bit data, but if your image uses transparent colors, you might be able to identify runs of the transparent color.

trzepacz_13.jpg

Hex dump excerpt of a 4 bpp 256x256 image.

In the example above, notice that we have long strings of 0x99 and 0xCC. We can tell that this is four bpp data because the repetition (in this case 9s and Cs) is by nibble. Furthermore, by noting that these are eight lines 16 bytes apart, we can tell that the image is 128 bytes wide, or 256 pixels wide, assuming that these are both part of the same line. We can also discern that the image uses reverse order nibbles because the end of each run always has an out of order value: notice the first block starts with 91 99 and ends with 99 89, which seems very unlikely otherwise.

For an eight bpp bitmap, we can see repeated values for every byte instead of every nibble when there is a horizontal line. In addition, we can see the second scan line emerge 256 bytes into the data for a 256 pixel wide bitmap.

There are similar patterns in 16 bpp and 24 bpp data, but they might not be so apparent. Look for repeating sections of two or three byte codes and try to scan a larger area to determine the length of scan lines. This is where it is better to use the raw mode of your paint program.




trzepacz_14.jpg

Data from an 8 bpp 256x256 bitmap.

Tile maps
The older video game platforms like the Sega Genesis and Super Nintendo made great use of tile maps to efficiently display large sections of scrolling graphics. Although you don't see tile maps as frequently in today's 3D game worlds, they are still used for Game Boy and Game Boy Advance games. Additionally, some games store their terrain as a uniform grid or height map, and the texture selection data is sometimes stored as a tile map.

Often, tile maps are created by automatically ripping out tile-sized sections of a full-sized bitmap, from left-to-right, top-to-bottom. When a tile that matches an already existing tile is found, the code of the existing tile is added, saving a tile definition. Since the process is automatic, one generally ends up with runs of increasing numbers, starting with tile 0. A hex dump of an automatically created tile map might start out like this:

trzepacz_15.jpg

The data is always increasing by one, except when using a previously defined code.

Notice that the data is always increasing by one, except when using a previously defined code. This technique only works with automatically generated tile map data. Of course, if there are bits for H-Flip, V-Flip, color palette, etc. they should be ignored.

trzepacz_16.jpg

The high 5 bits are used for priority, palette, H-flip, and V-flip.

In this example, the high 5 bits are used for priority, palette, H-flip, and V-flip. However, the patterns of tile allocation remain consistent if that extra data is ignored. Otherwise, you can treat a tile map just about the same as a bitmap, using the same techniques described above.
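The "increasing by one" test can be checked mechanically once the attribute bits are masked off. This Python sketch assumes 16 bit map entries with the tile number in the low 11 bits, which matches the example above but will differ between platforms; the function names and the `first_tile` parameter are hypothetical.

```python
import struct

def read_tile_indices(data: bytes, index_mask: int = 0x07FF,
                      little_endian: bool = True) -> list[int]:
    """Read 16-bit tile entries and strip the attribute bits."""
    n = len(data) // 2
    fmt = ("<" if little_endian else ">") + f"{n}H"
    return [v & index_mask for v in struct.unpack(fmt, data[:n * 2])]

def looks_auto_ripped(indices: list[int], first_tile: int = 0) -> bool:
    """True if every entry is either an already used tile or the next new one."""
    next_new = first_tile
    for t in indices:
        if t == next_new:
            next_new += 1          # a brand new tile was allocated
        elif t > next_new:
            return False           # skipped ahead: not sequential allocation
    return True
```

If `looks_auto_ripped` fails for the whole block, try it on a shorter slice or with a different mask; a wrong guess at the attribute bits is the most common reason the pattern disappears.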

Palette Data
Palette data is usually 16, 24 or 32 bit RGB data. For 32 bit data, the extra byte is usually for alpha channel or other ID bits. In 16 bit data, the high bit is frequently unused, or may be used for transparency information, leaving five bits each for RGB data. Twenty-four bit data is generally all RGB data.

trzepacz_17.jpg

16 bit palette data

This example is 16 bit palette data. The high bit is transparency data. The palette data probably starts at 0x14 because the first palette entry is often all zeros for black transparent data. We know that our platform uses Intel format data, so the low byte comes first and the high byte second. The subsequent set of data is for semi-transparent pixels, so we can see that the high bit is set for almost all of the following values. This seems to be a collection of multiple 16 color palettes, because the value 00 00 repeats every 32 bytes (2 bytes per color x 16 colors), meaning that the transparent zero color is repeated.

You may also try calculating some common colors in the correct bit format and searching the binary file for them. For 16 bit (1.5.5.5) data, pure white is 7FFF (or FF7F in Intel ordering), but that value might also occur by chance. The next lower shade of gray is 7BDE (DE7B), followed by 77BD (BD77) and 739C (9C73), so if you see those numbers in your suspected palette block, you are probably in the right place.
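The gray values come from packing the same 5 bit level into all three channels, so they are simple to compute and search for. A small Python sketch follows; the helper names are mine, and the search assumes the palette entries sit on even offsets.

```python
def gray_1555(level: int) -> int:
    """Pack the same 5-bit level (0-31) into R, G, and B of a 1.5.5.5 word."""
    return (level << 10) | (level << 5) | level

def find_words(data: bytes, value: int, little_endian: bool = True) -> list[int]:
    """Return every even offset where the 16-bit value occurs."""
    needle = value.to_bytes(2, "little" if little_endian else "big")
    return [i for i in range(0, len(data) - 1, 2) if data[i:i + 2] == needle]

print(hex(gray_1555(31)), hex(gray_1555(30)), hex(gray_1555(28)))
# 0x7fff 0x7bde 0x739c
```

A cluster of hits for several different gray levels within a few dozen bytes of each other is a much stronger sign of a palette than any single match.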

Text Data
It seems like it should be easy to find text data, especially if your data is in ASCII. However, for a variety of reasons, an alternative encoding is often used. Certainly, some minimal encoding is required if you desire to protect the text from the prying eyes of hackers. On the other hand, if you have an Asian language with tens of thousands of characters, chances are that a custom character ordering is used.

trzepacz_18.jpg

If it's raw ASCII, it's like taking candy from a baby…

For the case of encoded ASCII data, chances are that either the encoding is incredibly simple, or the data is compressed and you have little chance of getting it out without source for the decompressor. Let's take the simple case as an example. Ninety percent of the time the data is in ASCII order with an offset; for instance, the letter 'A' or the space might be assigned the value one. The key to recognizing the data is the frequency of the symbols. You know space will appear regularly every few characters, and that sentences will start with capital letters and end with punctuation. Furthermore, capital letters will be grouped together, lowercase letters will be in a different group, numbers will be sequential, etc.

trzepacz_19.jpg

Here, 1F has been subtracted from each letter.

Another common method for encoding the text is to XOR the letters by a given value. XORing is reversible by XORing again by the same value. Therefore, the same rules of symbol frequency apply, although they may be more difficult to pick out in a hex editor.
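Because there are only 256 possible offsets and 256 possible XOR keys, you can simply try them all and keep the result that looks most like text. The sketch below does that with a deliberately crude score (the fraction of letters and spaces); the function names and the scoring rule are assumptions of mine, and it only handles the single-byte schemes described here.

```python
def decode_offset(data: bytes, offset: int) -> bytes:
    """Undo an encoding that subtracted `offset` from each ASCII code."""
    return bytes((b + offset) & 0xFF for b in data)

def decode_xor(data: bytes, key: int) -> bytes:
    """XOR is its own inverse, so encoding and decoding are the same operation."""
    return bytes(b ^ key for b in data)

def score_as_text(data: bytes) -> float:
    """Fraction of bytes that are letters or spaces; higher looks more like prose."""
    if not data:
        return 0.0
    good = sum(1 for b in data
               if b == 0x20 or 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A)
    return good / len(data)

def best_guess(data: bytes):
    """Try every offset and XOR key, returning (score, method, key, decoded)."""
    candidates = []
    for k in range(256):
        candidates.append((score_as_text(decode_offset(data, k)), "offset", k))
        candidates.append((score_as_text(decode_xor(data, k)), "xor", k))
    score, method, key = max(candidates)
    decoded = decode_offset(data, key) if method == "offset" else decode_xor(data, key)
    return score, method, key, decoded
```

Feed it a block you already suspect is text rather than the whole file; on mixed data the score is too noisy to be useful.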

Sometimes, especially in the case of games with Asian encoding, the letters will be assigned using a sparse encoding. In this case, they are allocated using the same rules as ripping a tilemap—the symbols are assigned in the order that they appear. The best bet for determining the encoding in that situation is to extract the font data, which should be in the same order as the character encoding.

For Asian languages, it is more likely that the ROM font will actually be used if it is available, because there are so many symbols that take up memory. In this case, chances are that one of the standard two-byte encodings like Shift-JIS, EUC, Big-5, etc. is being used. To check for that, you need a program that displays the proper encoding, such as Hongbo Ni's NJStar, or JWP. Also, if you have Microsoft Office or Internet Explorer installed, you can install the far-east encoding option with the appropriate fonts and then open the files in your web browser, trying various encodings. You may need to crop a byte off the start of the file or reverse the endianness of the text block in order for the double-byte data to be displayed correctly.

I've noticed that most console programs from Japan tend to use Shift-JIS; I never see EUC, JIS, or Unicode. Windows games use Shift-JIS for normal text and Unicode for menus and resource text. One useful attribute of both encodings is that the characters that overlap ASCII are easy to decipher. In Shift-JIS, normal ASCII characters are allowed, and shifted characters begin with a high byte of 0x81 or higher. There are also doublewide ASCII letters available, starting at 0x853F, the second byte being an ASCII value with 0x1F added to it. There is more to it than just that, but you should be able to see something that almost looks like ASCII text where roman letters appear. For Unicode, it is even easier: for ASCII symbols, every code is an ASCII code with 0x00 in the high byte.
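Trying several encodings in turn can also be scripted. This is a minimal Python sketch using the codecs that ship with the standard library; the candidate list and function name are my own, and a successful decode only means the bytes are legal in that encoding, not that the text is meaningful.

```python
CANDIDATE_ENCODINGS = ["ascii", "shift_jis", "euc_jp", "big5",
                       "utf-16-le", "utf-16-be"]

def try_encodings(data: bytes, encodings=CANDIDATE_ENCODINGS) -> dict:
    """Decode data with each codec; map codec name to text, or None on failure."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results
```

To test the "crop a byte off the start" case, call it again with `data[1:]` and compare the two sets of results by eye.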

CD Images
Sometimes, the data you need has been formatted as CD sectors. CD sectors have a number of different flavors, but raw mode sectors generally have periodic header information every 2336 (0x920) or 2352 (0x930) bytes (or some multiple of 16 close to that). These sectors have the sector number encoded in BCD as part of the header.

trzepacz_20.jpg

CD Sector header. Notice that 0x2DF0 is a multiple of 0x930 (it's 5 times 0x930). So the 5 at the end of that line is part of the BCD encoded sector number.

The purpose of all this is to let you determine whether you have CD sector data (which may indicate a compressed video or audio file), and which sections of that data to strip out in order to retrieve the real data without gaps.

A CD sector holds 2048 (0x800) bytes if the headers are omitted (so-called "cooked" sectors). In this case, there is usually no additional data per sector. Cooked sectors are ISO 9660 tracks, whereas raw mode sectors are usually CDDA, video, or other track data.
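Stripping the per-sector headers is a simple loop once you know the layout. The sketch below defaults to a 2352 byte raw sector with 2048 bytes of user data starting 16 bytes in (12 bytes of sync plus a 4 byte header), which is the Mode 1 arrangement; treat these numbers as assumptions to verify against your own dump, since other sector types place the data elsewhere.

```python
def cook_sectors(raw: bytes, sector_size: int = 2352,
                 data_offset: int = 16, data_size: int = 2048) -> bytes:
    """Keep only the user-data bytes of each complete raw sector."""
    out = bytearray()
    for start in range(0, len(raw) - sector_size + 1, sector_size):
        out += raw[start + data_offset:start + data_offset + data_size]
    return bytes(out)
```

For sectors with an extra 8 byte subheader, try `data_offset=24`; for 2336 byte sectors that omit the sync and header, try `sector_size=2336, data_offset=8`. If the result contains recognizable file headers at sensible positions, your guess was right.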

Archive files
On consoles, where all data must be accessed from CD, it is common practice to combine all data into one archive data file with each sub-file aligned on sector size boundaries. In doing so, storing (or reading) the directory can be eliminated and any file can be accessed directly by a single CD seek to a given sector. However, you still have to know the sector number in order to find the data. Sometimes, the sector numbers are hard-coded into the source, but more often than not, there is a simple table loaded that stores the offsets, and possibly the sizes of the data files. Archive files are also used to reduce load times when you want all of the data for a given game level to be loaded into memory at once, requiring no decompression in the process.

Archive files often have the offset data encoded in the first few sectors of the file data, although it is sometimes in a completely separate file. Sometimes a game is coded with hard sector offsets but keeps a separate table for external tool usage. Generally, the files in the offset table are in the same order as in the data file. This means that we can tell if we are looking at offset table data from the fact that the numbers are increasing. They may be packed oddly, though; one game I worked on used 24 bit offset values.

In addition, the units may not be obvious. Even if the data is aligned to CD sectors, the offsets may be in other units. Common units for offset tables are CD sectors, 1k, 16 bytes, four bytes, or one byte, but other units are possible. There may be other data interleaved with the offset data, such as the file size (which, again, may be in different units), file name, attribute data, compression header, etc. The best way to determine what the units are is to guess and then look at the data to see if it makes sense. Generally, the end of each file is padded with zeros or 0xFF's, so you can easily identify the units when you find the start of a file.

Some files only have size data instead of offset data. In that case, you can't count on the numbers to increase always. You have to do a bit more math to add up the offsets, but by guessing units and checking them against the data, you can find the correct format.
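The guess-and-check arithmetic for offset and size tables can be sketched as follows. Both helpers are hypothetical illustrations: the first tests whether a list of values could be offsets in a given unit, and the second accumulates a size table into aligned start offsets, assuming each file is padded up to the alignment boundary.

```python
def plausible_offsets(values: list[int], unit: int, file_size: int) -> bool:
    """True if values, scaled by unit, strictly increase and stay inside the file."""
    scaled = [v * unit for v in values]
    increasing = all(a < b for a, b in zip(scaled, scaled[1:]))
    return bool(scaled) and increasing and scaled[-1] < file_size

def sizes_to_offsets(sizes: list[int], align: int = 2048, start: int = 0) -> list[int]:
    """Accumulate file sizes into start offsets, rounding each size up to `align`."""
    offsets, pos = [], start
    for size in sizes:
        offsets.append(pos)
        pos += -(-size // align) * align   # round size up to a multiple of align
    return offsets
```

A table that passes `plausible_offsets` for one unit will also pass for every smaller unit, so always confirm by looking at the data at the computed positions: you should land on the first byte of a file, just after a run of padding.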

The data file you find may have sub-files in it with their own offset data, sometimes with a completely different format. Just remember to document everything so that you can put the data back together again.

Compressed data
Most of these rules are only useful for uncompressed data. When the data is compressed, your options are considerably more limited. Once again, go back to what you have, and look to see if you have source code or tools that you can use. If you have source code for a compressor, you can work backwards from it to write a decompressor.

Most companies use proprietary internal tools for data compression, but they will use a third party tool or library for video or audio. Many PC games actually use PKZip for their data compression. Compare your file against file formats that might fit the bill. Check the file extension against possible compressed file types. If you can determine that a commercially available utility produced the data, get that utility—it's probably less expensive than your time.

Putting it back together

Sometimes, extracting the data isn't enough—you need to insert replacement data in its place. Once again, it's important to keep your goals in mind. There is no point in replacing all of the data in an archive if just one file needs to change, but if all of the data needs to change, it's easier to reconstruct it from scratch.

If you only need to change some of the data, and the replacement data is staying the same size, you can take a shortcut by extracting the data before and after the data that needs to change, and then recombining the sections with the new data. Extracting the data can be accomplished with a good hex editor, but I wrote a simple custom utility that performs the same function from the command line and can be placed in a batch file. Once you have the extracted sections and your new replacement data file in the correct format, recombining is easy. Simply use the DOS command line: COPY /B before.bin+middle.bin+after.bin final.bin. The /B switch matters: when COPY combines files it defaults to ASCII mode, which stops at the first Ctrl-Z (0x1A) byte it finds. You can also use cut and paste in your hex editor if it's a one-shot job you wish to do by hand. However, I like to use batch files in case the data changes again later.
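For readers without a DOS prompt handy, the same extract-and-recombine step can be sketched in Python. This is not the author's utility, just an illustration of the idea; the function names and file arguments are hypothetical, and the replacement is assumed to occupy exactly as many bytes as the data it overwrites.

```python
def splice(original, start, replacement):
    """Return `original` with the bytes at `start` overwritten by `replacement`.

    The result has the same length as `original`: everything before `start`
    and everything after the replaced span is carried over unchanged.
    """
    end = start + len(replacement)
    if start < 0 or end > len(original):
        raise ValueError("replacement does not fit inside the original data")
    return original[:start] + replacement + original[end:]

def splice_file(src_path, dst_path, start, replacement_path):
    """File-based wrapper: read, splice, and write to a new file."""
    with open(src_path, "rb") as f:
        original = f.read()
    with open(replacement_path, "rb") as f:
        replacement = f.read()
    with open(dst_path, "wb") as f:
        f.write(splice(original, start, replacement))
```

Writing to a new file rather than patching in place keeps the untouched original around, which is handy when the data changes again later.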

Sometimes you need to generate header table information with size data in it. You could write a simple program to combine the files, but if you are lazy like me, you can just use an assembler. Some assemblers have the ability to include binary files directly. Just insert labels at the beginning and end of each included data file, add the appropriate alignment directives, and create the data tables using pointer math. Then link to a non-executable file and you have a nice binary file. Even if your assembler doesn't support binary includes, you can convert the binary data to ASCII hex format, and then use an include directive to incorporate it into your assembler file. It may sound cumbersome, but it works, and it saves you writing a custom tool to put together just one file.
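If no suitable assembler is at hand, the same job is a short script. The sketch below is a stand-in for the assembler trick, not a reproduction of it, and the layout is purely an assumption for illustration: a 32-bit little-endian file count, then one (offset, size) pair per file, with the header and every file padded to a fixed alignment. A real project has to reproduce whatever layout the game expects.

```python
import struct

def align_up(n, align):
    """Round n up to the next multiple of align."""
    return -(-n // align) * align

def build_archive(blobs, align=2048, pad=b"\x00"):
    """Pack `blobs` behind a header table of (offset, size) pairs.

    Hypothetical layout: uint32 count, then uint32 offset and uint32 size
    per file, all little-endian, with offsets measured in bytes from the
    start of the archive.
    """
    header_size = align_up(4 + 8 * len(blobs), align)
    table = []
    body = bytearray()
    for blob in blobs:
        table.append((header_size + len(body), len(blob)))
        body += blob
        # Pad so that the next file starts on an aligned boundary.
        body += pad * (align_up(len(body), align) - len(body))
    header = struct.pack("<I", len(blobs))
    for offset, size in table:
        header += struct.pack("<II", offset, size)
    header += pad * (header_size - len(header))
    return header + bytes(body)
```

Because the table is computed from the data itself, rerunning the script after any file changes size keeps the header consistent, which is the same benefit the assembler's pointer math provides.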

If all else fails, you can always write a custom tool. Some companies have made special libraries to make this task easier, which isn't such a bad idea if you expect to do this kind of work.

Recommended Tools

Here are a few commercial and shareware tools that I've found useful over the years:

  • Hex Workshop from Breakpoint Software is a great hex editor that supports cut and paste, variable width listing, Motorola and Intel byte ordering, interpretation of data as different size entries, and many other useful features.

  • 4DOS/4NT/4OS2 from JP Software is an excellent replacement for "command.com". It offers command line completion and history, directory coloring and file comments, a great file list feature that supports hex output, and a whole suite of functions and extended commands for writing batch files.

  • Exe Scope from Toshifumi Yamamoto is a nifty tool that allows you to edit the resource data of Windows executables, including graphical dialog editing. You can't beat this tool if you need to change menus for Windows programs without the source.

  • Adobe's Photoshop and JASC's Paint Shop Pro are two of the best graphics editing programs—Photoshop is better in general, but is more expensive. Both programs have their strengths.

  • Sonic Foundry's Sound Forge and Syntrillium's Cool Edit are the two best audio editing programs. Sound Forge is very powerful, but Cool Edit is reasonable for most tasks, and is available as a free demo.

  • NJ Star and NJ Communicator from NJ Star Software Corp. are the programs to get if you need to display text in Japanese, Chinese, or Korean.

Conclusion

Asset recovery will always be an unfortunate but necessary task. However, if you are careful and keep focused on exactly what data needs to be recovered for what purpose, and you make proper use of the resources at hand, the task can be accomplished economically. Management may balk at buying tools for your asset recovery task, but they are almost always worth it! How much is your time as an engineer worth? How much does the delay in releasing your product cost you? Recovery almost always beats abandoning a moneymaking project! Remember, the best recovery is the recovery you don't have to do, so keep good backups of your own data!


About the Author(s)

Tim Trzepacz

Blogger

Tim Trzepacz is currently a programmer for Insomniac Games, where he is working day and night on their latest PS2 title. Previously, he was the lead programmer for Working Designs, where he worked on the localizations of many Japanese games for PlayStation and Sega Saturn, including the Lunar series, Magic Knight Rayearth, Dragonforce, and many others. He also formed Soft Egg Entertainment (www.softegg.com), which was responsible for almost bringing the famous Japanese computer game Princess Maker 2 to the United States. Before that, he worked on the Sega Genesis version of Pirates! Gold and the MS-DOS version of Magic: the Gathering for MicroProse Software. Tim Trzepacz has a Bachelor of Science degree in Electrical Engineering from Virginia Tech.
