Reverse engineering the binary data format for Star Wars: Yoda Stories
An in-depth technical article in which I revisit my childhood by reverse engineering one of my favorite games, the somewhat obscure Star Wars: Yoda Stories.
Zach Barth is the Creative Director at Zachtronics, the studio behind SpaceChem, Ironclad Tactics, and Infiniminer. If you like obscurely technical things, you'll love our upcoming game Infinifactory!
Background
I don't know why, but I've always gotten a kick out of reverse engineering data files for computer games. Although decompiling a game's code is a challenging task, data files are often much easier to figure out (as they contain lots of highly visible content like text and sprites) and let you mod the game if you're able to figure them out sufficiently.
In 2006, days after the Star Wars: Empire at War demo was released, I published some rudimentary tools for dumping and repacking the game's data files, including a simple mod that let you play as the Empire (which was disabled in the demo). There are barely any traces of it left on the internet, but I managed to score a free Petroglyph t-shirt at the time so I suppose that's something.
Many years before that I owned another Star Wars game called Star Wars: Yoda Stories. It appears to have been fairly obscure and poorly received, but that didn't stop me from playing the crap out of it. As a novice game programmer and die-hard Star Wars fan, I tried very hard to locate the game's resources so that I could make some sort of terrible Star Wars game of my own. Instead, all I managed to find were some sound effects and a small number of sprites that they distributed as icons, as part of a desktop theme.
Fast forward sixteen years to me looking through my ancient CD collection for some old games to play at a 1990s-computer-themed party. After popping in the CD I immediately spot what is clearly a data file, roughly four megabytes in size, just waiting for me to apply my overpriced college degree and crack it open. Better late than never!
File Structure (Difficulty: Padawan)
I suppose that the only program you need to reverse engineer something like this is a hex editor, although as we'll see later a decompiler, calculator, and strong working knowledge of the target program help as well. I'm a big fan of HxD, so we're going to use that.
If you want to play along at home, here's a link to the game's data file: YODESK.DTA
Time to open this file!
Well, that's definitely not text, or really anything remotely meant to be read by a human, but that's not exactly surprising either. I'm sure that sixteen years ago I opened this very same file in Notepad and quickly closed it, not remotely understanding what I was looking at.
Right off the bat we can see some four-letter ASCII symbols, which look to me like section identifiers. Scrolling further ahead seems to confirm this: SNDS, TILE, CHAR, PUZ2, and many more. The file ends with an ENDF section, which implies that the overall file structure is some kind of hierarchy or list of tagged sections.
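If you'd rather confirm the section identifiers with code than by scrolling, here is a minimal sketch that searches a byte array for the ASCII bytes of a tag and reports every offset where it appears. The class and method names (TagScanner, FindTag) are made up for illustration, and a raw byte search like this can also hit the same four bytes inside unrelated data, so treat the results as hints rather than proof.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the original program.
static class TagScanner
{
    // Returns every offset at which the ASCII bytes of `tag` occur in `data`.
    public static List<int> FindTag(byte[] data, string tag)
    {
        byte[] needle = Encoding.ASCII.GetBytes(tag);
        var offsets = new List<int>();

        // Slide over the buffer and compare byte by byte at each position.
        for (int i = 0; i + needle.Length <= data.Length; i++)
        {
            bool match = true;
            for (int j = 0; j < needle.Length; j++)
            {
                if (data[i + j] != needle[j])
                {
                    match = false;
                    break;
                }
            }

            if (match)
            {
                offsets.Add(i);
            }
        }

        return offsets;
    }
}
```

Calling TagScanner.FindTag(File.ReadAllBytes("YODESK.DTA"), "ENDF") (with System.IO imported) should report an offset close to the end of the file if ENDF really is the closing section.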
The VERS identifier clearly starts a "version" section, which contains the following four bytes: 0x00, 0x02, 0x00, 0x00. My guess is that this is version 2.0 of the file format, as Yoda Stories was actually the successor to an Indiana Jones game that appears to use the same engine. It doesn't matter much, though, as this isn't a very interesting piece of data.
Next up is the STUP (setup?) section, which contains a lot of mysterious data:
There's clearly some kind of pattern here, but even with a thorough knowledge of the game it's not clear what it's for. The bigger question on my mind is: how do we skip it? Although it'd be possible to just assume it's a fixed-length section and skip the data, that's probably not the case.
If we look back at the start of the section (the previous screenshot) we'll see that four suspicious bytes follow the STUP identifier: 0x00, 0x44, 0x01, 0x00. If we measure the rest of the data in the section after these four bytes, we'll find that it's exactly 0x00014400 bytes long. A coincidence? I think not!
These four bytes are clearly an unsigned, 32-bit integer that specifies the amount of data that makes up the rest of the STUP section. If it looks like the bytes are backward, though, it's because they are: they're stored in "little-endian" order, where the less-significant bytes are stored first, a common convention for x86 and x86-64 processors. If we read this length value, we can then skip the rest of the section despite knowing nothing about the data that is stored within.
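To make the byte order concrete, here is a small sketch that assembles the four length bytes by hand and uses the result to work out where the next section should begin. The names (LittleEndian, ReadUInt32, OffsetAfterSection) are hypothetical, and the layout it assumes (4-byte tag, 4-byte length, then the payload) is just the one inferred above for STUP.

```csharp
using System;

// Hypothetical helper, not part of the original program.
static class LittleEndian
{
    // Assembles a 32-bit unsigned integer from four bytes,
    // least-significant byte first.
    public static uint ReadUInt32(byte[] data, int offset)
    {
        return (uint)(data[offset]
            | data[offset + 1] << 8
            | data[offset + 2] << 16
            | data[offset + 3] << 24);
    }

    // Given the offset of a section tag, returns the offset of whatever
    // follows the section: 4 bytes of tag, 4 bytes of length, then the payload.
    public static long OffsetAfterSection(byte[] data, int tagOffset)
    {
        uint length = ReadUInt32(data, tagOffset + 4);
        return (long)tagOffset + 8 + length;
    }
}
```

Fed the bytes 0x00, 0x44, 0x01, 0x00, ReadUInt32 returns 0x00014400 (82,944 in decimal), which matches the measured size of the STUP data. If the guess about the layout is right, the offset returned by OffsetAfterSection for STUP should land exactly on the next four-letter identifier.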
Manually reading through binary files, even one as small as 4MB, isn't a very productive way to make progress, so this is a good time to start writing a program that parses the file and reads and/or dumps out data as we figure out how it's encoded. My preferred programming language is C#, so I'm going to use that; assuming the file format isn't totally screwy, I should be able to get a lot of mileage out of the BinaryReader class and get a quick start. Here's the program for what we've figured out so far:
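// Requires: using System; using System.IO;
static void Main(string[] args)
{
    using (BinaryReader binaryReader = new BinaryReader(File.OpenRead("YODESK.DTA")))
    {
        bool keepReading = true;
        while (keepReading)
        {
            // Every section starts with a four-character ASCII identifier.
            string section = new string(binaryReader.ReadChars(4));
            switch (section)
            {
                case "VERS":
                    // Four bytes of version data, with no length prefix.
                    uint version = binaryReader.ReadUInt32();
                    break;
                case "STUP":
                    // A little-endian 32-bit length, followed by that many bytes of data.
                    uint length = binaryReader.ReadUInt32();
                    byte[] data = binaryReader.ReadBytes((int)length);
                    break;
                default:
                    // Anything we haven't figured out yet stops the program.
                    throw new Exception("Unknown section: " + section);
            }
        }
    }
}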
Unfortunately, this only works for the first two sections: as soon as we hit the third section, SNDS, it becomes clear that we need to handle all the cases that will be thrown at us. This ends up being a pretty common aspect of reverse engineering the file format: many values can be one of several types, and we have to understand every type that could show up before we can read past them. Fortunately, almost all of the sections in the file have a 32-bit unsigned length following the section identifier, which means we can reuse the code from the STUP section to skip over them.
static void Main(string[] args)
{
    using (BinaryReader binaryReader = new BinaryReader(File.OpenRead("YODESK.DTA")))
    {
        bool keepReading = true;
        while (keepReading)
        {
            string section = new string(binaryReader.ReadChars(4));
            switch (section)
            {
                case "VERS":
                    uint version = binaryReader.ReadUInt32();
                    break;
                case "STUP":
                case "SNDS":
                case "ZONE":
                case "TILE":
                case "PUZ2":
                case "CHAR":