In part one I gave an overview of our tool, MemAnalyze, and showed how to make a memory dump on Xbox and PS2 consoles. In this part, we will read the memory dump, convert return addresses to function names using map files or PDB files, and then process the data to create several views of it. I will also discuss future plans to make MemAnalyze much more powerful.
Converting return addresses to function names
In part one, I chose not to output function names to the memory dump, but only the return addresses. This means you might want to turn off the symbol information and do the conversion yourself, to make it easier to parse the memory dump. When symbol information is enabled in XbMemdump, the function addresses are replaced by the names rather than accompanied by them. We can turn off the symbol information simply by not supplying a PDB path on the command line.
Before getting into detail on either map or PDB files, I will share a few more details regarding absolute addresses and image- and section base addresses.
The return addresses we store in our memory dump are all absolute addresses.
Image base address
Image base addresses on PCs work differently than on the Xbox. It took me quite some time to figure this out.
On a PC, you can enter a preferred base address for the image (/BASE linker setting), though the operating system can relocate this address when loading the image. The operating system will only relocate the image if there is not enough space to load the image. This only applies when loading DLLs in the same process, not when loading a single executable. For DLLs, the image base address can be tweaked to avoid DLL conflicts and to gain performance.
With this knowledge in mind, I checked the Xbox project settings. There is still the /FIXED setting in the advanced link options. This option specifies if you always want to load the image at the preferred base address. If the operating system cannot load it at that address, the load fails. So the /FIXED option is there, but the /BASE setting is not. I figured the linker would default to some static value. So, the first thing I did was look up the base address in the MAP file that is produced by the linker (listing 1).
Timestamp is 4048a4d1 (Fri Mar 05 17:03:29 2004)
Preferred load address is 00400000
Listing 1: The preferred load address in the map file.
And there it is! Obviously, the preferred load address is 0x400000. Well, not really. There must be some PC legacy involved here. The Xbox image loader works differently, and the 0x400000 value is not used at all. When starting an Xbox application, multiple modules are loaded into the private virtual address space. Other modules, including the kernel and debugging modules, are also loaded. Unlike the PC platform, there is just one process running at a time, so all modules are loaded into the same address space. The image base address is completely determined by the Xbox image loader, which is also why the /BASE setting is not present in the link options. We can retrieve the image base address as described in part one, by calling DmWalkLoadedModules in the game. An example of the output of DmWalkLoadedModules is shown in listing 2.
--- List of all loaded modules ----
--- End of list ----
Listing 2: Output of all loaded modules, while running my test project 'XboxProject'.
In listing 2, we can also see that the system modules are loaded in the space above 2GB, the shared address space. Our own module is loaded near the beginning of the virtual address space. (The first 0x10000 bytes of the virtual address space are reserved by the operating system.)
Each image consists of a set of sections. These are both DATA and CODE sections. The .text section will contain all your game code. A map file displays a list of all the sections in the image (listing 3).
0001:00000000 00887928H .textbss DATA
0002:00000000 00e2d232H .text CODE
0002:00e2d240 00001eceH .text$x CODE
0002:00e2f110 000e6d24H .text$yc CODE
0002:00f15e40 00039da8H .text$yd CODE
0003:00000000 0000ce65H XMV CODE
0003:0000ce68 00001f03H XMV_RD CODE
0003:0000ed70 0002833cH XMV_RW CODE
0003:000370b0 00002baaH XMV_URW CODE
0004:00000000 0002dc73H XACTENG CODE
0004:0002dc78 000098b6H XACTENG_RD CODE
0004:00037530 000013dfH XACTENG_RW CODE
0004:00038910 000014e7H XACTENG_URW CODE
Listing 3: sections from the map file.
Section addresses are absolute addresses, 32-byte aligned. Figure 1 shows the layout of the virtual memory when a program is loaded into memory.
Figure 1: The status of the virtual memory when an Xbox image is loaded.
Note: the image base address of the game module is never 0x10000, but a few hundred or a few thousand bytes further in memory (see Listing 2). (This block is not displayed in Figure 1; I have not been able to find out what data resides there.) Also, the first section never starts at the image base address. In Figure 1, I have displayed this gap as "Image information". Most likely, this is where the section headers reside, as the sections themselves are contiguous (mind the section alignment though).
Map files store their functions in relative virtual addresses. The first question that pops up is: "Relative to what?" Well, map files store each function along with two relative addresses:
- Relative to the start of the section that contains the function.
- Relative to the image base (the column header says rva+base, but we will see why this is incorrect in a moment).
First, we will discuss section-relative conversion.
If we take a look at a line of text from a map file (listing 4), column 1 displays the section index and the offset into the section. Column 2 displays the function's decorated name and column 3 displays the offset of the function, relative to the image base. The rest is of no use to us.
0002:008bfb00 [email protected][email protected]@@@[email protected]@@
TPointer [email protected]@@[email protected]@PBDABV?
$TShared [email protected] [email protected]@@[email protected]@Z 015325a0 f i
Listing 4: A line of text from the map file.
To compute the absolute address of the function displayed in listing 4, we need the start address of the section. The section index is in front of the section-relative address. In listing 4, this is 0002. Unfortunately, the section base addresses are not present in the map file. Don't be thrown off by the list at the top of the map file; it does not display the section base addresses. We need to parse the game's executable using ImageBld to obtain section information. We dumped our Xyanide section information on the command line by typing:
ImageBld /dump XyanideD.xbe >XyanideSectionDump.txt
Now we need to parse the section dump (listing 5), find the start address, and add it to the offset of the function. This gives us the absolute start address of the function, but we stored the return addresses! So we first have to compute the size of the function. The map file displays all the functions in order of appearance in memory, so we can simply subtract the address of our function from the address of the next function in the map file.
Listing 5: Layout of a section as dumped by the ImageBld.exe tool. We need to parse the 'virtual address' field.
We have now parsed the section dump from ImageBld, but we could also have used a function from the debug library on the Xbox itself: DmWalkModuleSections. We can save the module section addresses in the memory dump from within the game, just as we did by saving the image base.
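The section-relative conversion described above can be sketched as follows. This is a minimal illustration, not the MemAnalyze source: the `MapEntry` type and both function names are hypothetical, and it assumes the map file lines and the section base addresses (from the ImageBld dump or from DmWalkModuleSections) have already been parsed into plain values.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One parsed map-file entry: section index, offset into that section, name.
// (Hypothetical type for this sketch.)
struct MapEntry {
    unsigned      section;
    std::uint32_t offset;
    std::string   name;
};

// Builds a table of absolute start address -> function name.
// sectionBase maps a section index to the absolute start address of that
// section, as read from the section dump.
std::map<std::uint32_t, std::string> BuildSymbolTable(
    const std::vector<MapEntry>& entries,
    const std::map<unsigned, std::uint32_t>& sectionBase)
{
    std::map<std::uint32_t, std::string> table;
    for (const MapEntry& e : entries) {
        auto it = sectionBase.find(e.section);
        if (it == sectionBase.end())
            continue;                          // unknown section: skip the entry
        table[it->second + e.offset] = e.name; // absolute = section base + offset
    }
    return table;
}

// Finds the function containing a return address: the last function that
// starts at or before it. The start of the next function in the table marks
// the end of the current one, so no explicit size is stored.
std::string FindFunction(const std::map<std::uint32_t, std::string>& table,
                         std::uint32_t returnAddress)
{
    auto it = table.upper_bound(returnAddress); // first start > returnAddress
    if (it == table.begin())
        return "<unknown>";                     // address lies before every function
    --it;
    return it->second;
}
```

Keeping the table sorted by start address turns each lookup into a binary search, which matters when thousands of callstack entries have to be resolved. Note that an address beyond the last known function is attributed to that last function, so a real tool would also want an upper bound for the section.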
To convert the decorated function name to a readable name, use the helper function UnDecorateSymbolName from the DbgHelp library.
Now we know how to convert a function address to a function name. There are two drawbacks to using section relative conversion if you are using ImageBld to parse the executable:
- If the image base address is relocated by the operating system, the section base addresses are incorrect, because they are provided as absolute addresses. (This code assumes that the default image base is being used.) This has not happened to me yet, but in theory, it is possible.
- We need to parse two files to find the function name.
By using DmWalkModuleSections, you will only have to parse the MAP file, and it has the advantage that the kernel knows if the image base has been relocated, so the correct absolute addresses are written to file.
Another approach is to use the third column of the map file. This is the image-base relative conversion. The header of the column says this is the rva+base address, meaning it is the absolute address of the function. Well, that is simply not true. This column is computed from the preferred base address of 0x400000. We've just seen that this value is not used, so the result is incorrect. However, we can use it to our advantage! By simply subtracting 0x400000 from this value, we get the address relative to the image base. We can work with image base relative values, because they are independent of the location of the image in memory. So, regardless of whether the OS has relocated the base address, we can find our function name. We only need to use the memory dump's runtime image base address. We already saved this value in part one. Thus, this is how you convert the rva+base column to an absolute address:
AbsFunctionAddress = RvaBase - 0x400000 + RealImageBase
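As a small sketch, the same arithmetic in C++ looks like this. The function and constant names are made up for the illustration, and the image base in the usage note is a hypothetical value, not one taken from a real dump.

```cpp
#include <cstdint>

// Preferred load address that the linker writes into the map file header.
const std::uint32_t kPreferredBase = 0x00400000;

// Converts a value from the map file's rva+base column to the absolute
// address at run time. realImageBase is the image base saved in the dump.
std::uint32_t ToAbsoluteAddress(std::uint32_t rvaPlusBase,
                                std::uint32_t realImageBase)
{
    return rvaPlusBase - kPreferredBase + realImageBase;
}

// The inverse: converts a return address from the dump to the value that
// would appear in the rva+base column, so it can be looked up directly.
std::uint32_t ToMapAddress(std::uint32_t returnAddress,
                           std::uint32_t realImageBase)
{
    return returnAddress - realImageBase + kPreferredBase;
}
```

For the function in listing 4 (rva+base 015325a0) and an assumed runtime image base of 0x00011000, `ToAbsoluteAddress` yields 0x011435a0. Converting the return addresses once with `ToMapAddress` avoids rewriting every entry of the map file.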
Now we get to the final drawback of map files: Xbox map files do not include static functions! This will sometimes result in incorrect function names. They are usually not completely off; the resulting function name will probably be a function of the same object file, so we are pointed in the right direction. But, we could do even better. We can read Program Databases!
Program Databases store the functions in image base relative addresses, so no problems there. The PDB file holds all the information you could possibly think of. If we try hard enough, we can even double-click a callstack item in the tool and display the function and the exact line of code that matches the allocation.
We can parse PDB files using either the DbgHelp library or the DIA SDK.
Note: The Microsoft DIA SDK is an SDK that parses symbol information from PDB files. I have had a really hard time figuring out how to use the SDK, particularly because there is so little documentation on it. I have asked Microsoft about this, and they told me that version 8.0 of Visual Studio will contain much more comprehensive documentation.
I have stripped and adjusted the dia2dump sample that comes with the SDK. The code from listing 6 will dump absolute function addresses, function names and sizes, based on a given image base address.
const char *msg )
ULONG celt = 0;
len = 0;
argc, char* argv)
Dump(argv, pSource, NULL, GetImageBaseAddress());
Listing 6: Parsing symbol information using the DIA SDK.
The DbgHelp library, on the other hand, is very easy to use, and its documentation is very comprehensive. Listing 7 does almost the same as the sample code from listing 6, but using the DbgHelp library. Keep in mind that if you are not running on Windows XP, you need the latest DDK DLLs to run this code. These can be found at http://www.microsoft.com/ddk/debugging. For more information on DbgHelp, see the DbgHelp documentation.
// Just for this sample, I use a big buffer to store the
BOOL CALLBACK EnumerateSymbolsProc(
strcat(g_BigBuf, "Function: ");
strcat(g_BigBuf, " Address: ");
strcat(g_BigBuf, " Size: ");
SymSetOptions(SYMOPT_UNDNAME | SYMOPT_DEFERRED_LOADS);
if(SymInitialize(hProcess, NULL, FALSE))
int GetPDBFileSize(const char* fileName)
int size = GetFileSize(hFile, NULL);
int fileSize = GetPDBFileSize(pFileName);
hProcess = GetCurrentProcess();
if(argc != 2)
Listing 7: Parsing symbol information using the DbgHelp library.
In CodeWarrior you can let the linker output an XMap file. This file contains a start address, size, and decorated name per function. Parsing it should not be too difficult. Each return address in our stack trace is matched to all address-size ranges and if it is within the correct range, that name is stored. Listing 8 displays a piece of a CodeWarrior Xmap file.
00100230 00000018 .text Foo3() (main.cpp)
00100250 00000020 .text Foo2() (main.cpp)
00100270 00000020 .text Foo1() (main.cpp)
00100290 00000034 .text main (main.cpp)
Listing 8: A few lines of text produced by the CodeWarrior linker for my PS2 Foo project. The first column is the start address of the function and the second column is its size. The third column is the section, followed by the function name and the source file.
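Parsing lines in this layout and matching a return address against the address-size ranges could look roughly like the sketch below. It assumes every line has exactly the five fields shown in listing 8; a real XMap file contains other kinds of lines too, which this sketch simply rejects. `XMapEntry`, `ParseXMapLine` and `FindByAddress` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct XMapEntry {
    std::uint32_t start;  // start address of the function
    std::uint32_t size;   // size of the function in bytes
    std::string   name;   // function name as printed by the linker
};

// Parses one line in the layout shown in listing 8:
//   <start> <size> <section> <name> (<source file>)
// Returns false if the line does not match that layout.
bool ParseXMapLine(const std::string& line, XMapEntry& out)
{
    std::istringstream in(line);
    std::string section, rest;
    if (!(in >> std::hex >> out.start >> out.size >> section))
        return false;
    std::getline(in, rest);
    // Drop the trailing "(source file)" part, if present.
    std::size_t fileStart = rest.rfind(" (");
    if (fileStart != std::string::npos)
        rest.erase(fileStart);
    // Drop the leading spaces in front of the name.
    std::size_t nameStart = rest.find_first_not_of(' ');
    if (nameStart == std::string::npos)
        return false;
    out.name = rest.substr(nameStart);
    return true;
}

// Returns the entry whose [start, start + size) range contains the address,
// or nullptr if no entry matches.
const XMapEntry* FindByAddress(const std::vector<XMapEntry>& entries,
                               std::uint32_t address)
{
    for (const XMapEntry& e : entries)
        if (address >= e.start && address - e.start < e.size)
            return &e;
    return nullptr;
}
```

With the entries of listing 8 loaded, an address such as 0x00100260 resolves to `Foo2()`. The linear search is the direct translation of "matched to all address-size ranges"; sorting the entries by start address would allow a binary search instead.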
PS2 symbol information
CodeWarrior uses debug information in the DWARF 1.1 format (Debug With Arbitrary Record Format). For information on the format, please refer to [Ref 1]. The other PS2 compilers, GCC and ProDG, use the ECOFF/STABS debug format. I have no experience using any of them, but I know that there is source code on the web for reading the DWARF format. There is an executable called DwarfDump and an open source library called DwarfLib. For more information, refer to [Ref 2].
The details of MemAnalyze
In this section I would like to explain how we will process our platform-independent allocation data. I guess you can figure out for yourself how to build a memory layout view, so I will not get into details about that view. The two other views need a little more attention.
The TopX view
More on return addresses
The return addresses we stored in the memory dump are more valuable than you might have thought at first. They do not just point at the function that allocated the memory; they point to the instruction within the function that allocated the memory (figure 2). Using this information, we can distinguish between multiple allocations in a function. Do not be tempted to replace the return addresses with function names unless you store the offset of the instruction along with it.
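One way to keep that offset is to store each call site as "name+offset". The helper below is a hypothetical sketch of that idea; it assumes the start address of the function is already known from the symbol lookup.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a call site as "<function>+0x<offset>", so that two allocations
// made from different instructions in the same function stay distinguishable.
std::string FormatCallSite(const std::string& functionName,
                           std::uint32_t functionStart,
                           std::uint32_t returnAddress)
{
    char buffer[16];  // "+0x" plus at most 8 hex digits plus terminator
    std::snprintf(buffer, sizeof(buffer), "+0x%x",
                  static_cast<unsigned>(returnAddress - functionStart));
    return functionName + buffer;
}
```

Two return addresses inside the same function, for example 0x00100264 and 0x0010026c in a function starting at 0x00100250, then produce two different strings (`+0x14` and `+0x1c`) instead of collapsing into one name.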
Finding the allocators
Let's take a look again at our list of allocated blocks with their callstacks. Let's forget we have a complete callstack per allocation, and first just focus on the return addresses on top of the callstack: the actual calls to new, XmemAlloc or any other allocation function. We simply need to run over our complete list of blocks and find all the different return addresses from callstack level zero. For all these return addresses, we need to accumulate the total size allocated and the number of allocations performed. Doing so, we have an overview of all allocations, and they can be sorted on allocation address, total size allocated, and the number of allocations.
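A minimal sketch of this accumulation step is shown below. The `Block` and `AllocatorStats` types are assumptions for the example and do not reflect the actual dump format.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// One allocated block from the memory dump (hypothetical layout).
struct Block {
    std::uint32_t              size;
    std::vector<std::uint32_t> callstack;  // callstack[0] is the allocating call site
};

struct AllocatorStats {
    std::size_t   count = 0;      // number of allocations from this call site
    std::uint64_t totalSize = 0;  // total bytes allocated from this call site
};

// Accumulates count and total size per return address at callstack level zero.
std::map<std::uint32_t, AllocatorStats> CollectAllocators(
    const std::vector<Block>& blocks)
{
    std::map<std::uint32_t, AllocatorStats> stats;
    for (const Block& b : blocks) {
        if (b.callstack.empty())
            continue;  // no callstack recorded for this block
        AllocatorStats& s = stats[b.callstack[0]];
        ++s.count;
        s.totalSize += b.size;
    }
    return stats;
}
```

The map is already ordered by allocation address. For the other two sort orders, the entries can be copied into a vector and sorted on `totalSize` or `count`. Using the start address of the containing function as the key, instead of the return address itself, gives the per-function totals.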
Figure 2: Foo2 allocates both 'a' and 'b'.
This gives us a great overview of our allocations. However, the return addresses we are looking at are sometimes too deep into system code. It may not be all that interesting to know that D3DAllocContiguousMemory allocated 30 megabytes of memory. It provides us with some information, but we would rather zoom out to see who called D3DAllocContiguousMemory. This way we could see how much memory is spent on vertex buffers or texture memory, for instance.
For a more global view, we first can collapse the data a bit by sorting on the function that allocated the memory instead of the actual instruction that performed the allocation. This will combine all allocations in the scope of a function.
Theoretically, to zoom out even further, we could sort on a different level in the callstack. Instead of using entry zero, we could sort on entry one or entry two. But, this doesn't make much sense, and I am not even sure what good this information would do. If we want a better overview of our allocations, the hierarchy view is much more elegant, as described later.
The Memory leaks view
When to make a memory dump
We have discussed the comparison of multiple memory dumps. Now we need to decide at what point in the game we will make these dumps. We need to find a situation in the game where the memory allocation state of the game is exactly the same, time after time. The application's exit is one of these places. In our case, and I think this will work for many games out there, the menu is another such place. Each time you re-enter the menu after playing the game, the memory state should be exactly the same. Do not be confused by the fact that the menu will have allocated the items at a different location in memory. The number of allocations that have been performed and the size of the allocations should not differ. If they do, we have a memory leak.
There is one exception to this rule, and that is the use of memory managers, such as freelists. Freelists may grow due to memory fragmentation. I can tell you that freelists will grow and it will sometimes seem that fragmentation is the cause. Disable the use of freelists to make sure these are really fragmentation issues and not memory leaks.
Before I make my first memory dump, I usually load the level one time and then go back to the menu. The game is likely to perform a couple of one-time initial global allocations. I do not want these to pop up in my memory report.
Finding the leaks
Now we have to come up with information on the memory dumps that actually makes sense. I will discuss one of the algorithms I have used. Because we may need to compare many thousands of blocks, performance is an issue. I leave it up to you to optimize the algorithm.
We have two lists of say, 10,000 blocks of memory, each with a callstack and an allocation size. First we will delete all the blocks that have the exact same size and callstack that are present in memory dump 1 and memory dump 2 (figure 3). This way, only the differences in both dumps will remain. Naturally, you will need to make copies of both lists first or you will destroy your source data.
Figure 3: Deleting matching memory blocks, leaving the differences.
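The sketch below shows one possible way to compute these differences. It is not the algorithm as I implemented it: instead of deleting matching pairs one by one, it sorts both copies on size and callstack and uses `std::set_difference`, which cancels each block in one dump against at most one equal block in the other. The `Block` and `DumpDiff` types are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <tuple>
#include <vector>

struct Block {
    std::uint32_t              size;
    std::vector<std::uint32_t> callstack;
};

// Orders blocks by size, then by callstack. The address of the block is
// deliberately not part of the comparison.
bool operator<(const Block& a, const Block& b)
{
    return std::tie(a.size, a.callstack) < std::tie(b.size, b.callstack);
}

struct DumpDiff {
    std::vector<Block> onlyInDump1;  // blocks without a match in dump 2
    std::vector<Block> onlyInDump2;  // blocks without a match in dump 1
};

// Both lists are taken by value, so the caller's source data stays intact.
DumpDiff DiffDumps(std::vector<Block> dump1, std::vector<Block> dump2)
{
    std::sort(dump1.begin(), dump1.end());
    std::sort(dump2.begin(), dump2.end());
    DumpDiff diff;
    std::set_difference(dump1.begin(), dump1.end(), dump2.begin(), dump2.end(),
                        std::back_inserter(diff.onlyInDump1));
    std::set_difference(dump2.begin(), dump2.end(), dump1.begin(), dump1.end(),
                        std::back_inserter(diff.onlyInDump2));
    return diff;
}
```

Sorting costs O(n log n) per list, compared with the quadratic cost of searching the second list for every block of the first, which helps when each dump holds many thousands of blocks.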
In our figure, that leaves us with block 2 from memory dump 2. If this was an actual situation, we could mark block 2 as a memory leak and display the function name, size, and callstack. However, it is not always this obvious. If we take a look at figure 4, which represents a possible result of our difference algorithm, we can see that the same callstack has allocated more memory in dump 2 than it did in memory dump 1.
Figure 4: A possible difference in memory compare, showing memory growth.
In this case, this callstack made the same number of allocations, but the size differs. This is a typical freelist situation, where the freelist has grown. This is still quite straightforward. There are a few other situations, and I'd like to point out one in particular. Figure 5 displays a very odd situation.
Figure 5: A possible difference in memory compare, showing not only memory growth or shrinkage, but also a possible leak.
In this scenario it is hard, if not impossible, to come up with a verdict of what is going on. The callstack has not only allocated more (or less!) memory in size, but has also allocated a greater number of items. This seems like both a memory leak, and memory growth or shrinkage. It is even very hard to tell if it would be growth or shrinkage.
To handle all the situations in the resulting difference list, I count the number of blocks and the total size that was allocated, per callstack. For instance, in figure 5, the callstack 0x00001234, 0x00003456 and 0x00004567 has allocated 1 item of 128 bytes in memory dump 1. It has allocated 2 items totaling 1536 bytes in memory dump 2. Listing 9 displays all the different scenarios that I have come up with.
NrBlocksMD1 = CountNrBlocks(list1, CurCallStack);
NrBlocksMD2 = CountNrBlocks(list2, CurCallStack);
assert(!(NrBlocksMD1 == 0 && NrBlocksMD2 == 0)); // They cannot both be zero
TotalSizeMD1 = CountTotalSize(list1, CurCallStack);
TotalSizeMD2 = CountTotalSize(list2, CurCallStack);
Diff = TotalSizeMD2 - TotalSizeMD1;
Listing 9: The different scenarios for our memory dump difference.
Using this code we can iterate over the remaining list of memory dump 1 and compare it to the remaining list of memory dump 2. This time I chose not to remove all the items that were processed, since this can become quite complex to manage. Instead, I have marked all the items that were processed, and skip over them on the next iteration step. After we have compared list 1 to list 2, all the items in list 2 that have not yet been processed are memory leaks! So we need to run over the second list one more time, building a list of all the leaks.
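To make the per-callstack decision concrete, here is a hedged sketch of a classification function. The verdict names and the exact set of cases are an interpretation of the situations discussed around figures 3 to 5, not a copy of the scenarios in listing 9. The inputs are the block counts and total sizes for one callstack in the two difference lists.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Possible outcomes for one callstack (names chosen for this sketch).
enum class Verdict { Leak, Freed, Growth, Shrinkage, SameTotal, Ambiguous };

Verdict Classify(std::size_t nrBlocksMD1, std::uint64_t totalSizeMD1,
                 std::size_t nrBlocksMD2, std::uint64_t totalSizeMD2)
{
    assert(!(nrBlocksMD1 == 0 && nrBlocksMD2 == 0));  // they cannot both be zero

    if (nrBlocksMD1 == 0)
        return Verdict::Leak;       // only present in dump 2 (figure 3)
    if (nrBlocksMD2 == 0)
        return Verdict::Freed;      // only present in dump 1
    if (nrBlocksMD1 == nrBlocksMD2) {
        // Same number of allocations: compare the sizes (figure 4).
        if (totalSizeMD2 > totalSizeMD1) return Verdict::Growth;
        if (totalSizeMD2 < totalSizeMD1) return Verdict::Shrinkage;
        return Verdict::SameTotal;  // sizes shifted between blocks only
    }
    // Different counts and sizes: no clear verdict (figure 5).
    return Verdict::Ambiguous;
}
```

For the callstack in figure 5 (1 block of 128 bytes against 2 blocks totaling 1536 bytes) this returns `Verdict::Ambiguous`, which a tool could present as "possible leak, inspect by hand".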
The hierarchy view
An idea that I have not built yet, but that looks very cool to me, is a sort of hierarchy view. It would look a lot like a traditional profiler view.
Starting off with the return addresses of callstack level zero, we can zoom out to their parents, and on to their parents. Keep in mind that the parent of a return address from our callstack is the function that performed the allocation, and that the parent of a function is again the return address in the next callstack level (figure 6). You can also decide always to collapse allocations within a function.
Figure 6: The parent - child relationship of return addresses and functions. The instruction marked as one is the root and the function marked as six is the leaf.
Listing 10 shows an example of a hierarchy output.
+D3DAllocContiguousMemory() (16KBytes in 16 allocations, 40% of all allocations)
+CTextureManager::CreateTexture() (6KBytes in 6 allocations, 15%)
-CApplication::LoadScreen() (4KBytes in 4 allocations, 10%)
-CCar::Initialize() (2KBytes in 2 allocations, 5%)
+CSpecialEffectMgr::CreateVertexBuffer() (10KBytes in 10 allocations, 25%)
-CDynamicTrailActor::Initialize() (4KBytes in 4 allocations, 10%)
-CParticleManager::CreateEmitter() (4KBytes in 4 allocations, 10%)
(2KBytes in 2 allocations, 5%)
Listing 10. A possible hierarchy view.
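Since this view has not been built, the following is only a sketch of how the underlying tree could be assembled. It assumes the callstacks have already been resolved to function names and that allocations within a function are always collapsed; `Node`, `Allocation` and `BuildHierarchy` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// One node of the hierarchy: totals for a function plus its callers.
struct Node {
    std::uint64_t totalSize = 0;
    std::size_t   count = 0;
    std::map<std::string, std::unique_ptr<Node>> callers;  // next callstack level
};

struct Allocation {
    std::uint32_t            size;
    std::vector<std::string> callstack;  // function names, level zero first
};

// Builds the tree: the root holds the grand total, its children are the
// functions at callstack level zero, their children are their callers, etc.
Node BuildHierarchy(const std::vector<Allocation>& allocations)
{
    Node root;
    for (const Allocation& a : allocations) {
        Node* node = &root;
        node->totalSize += a.size;
        ++node->count;
        for (const std::string& function : a.callstack) {
            std::unique_ptr<Node>& child = node->callers[function];
            if (!child)
                child.reset(new Node());
            node = child.get();
            node->totalSize += a.size;  // every level sees the full allocation
            ++node->count;
        }
    }
    return root;
}
```

The percentage shown per line in listing 10 would then be `100 * node.totalSize / root.totalSize`, and printing the tree depth-first with one level of indentation per callstack level gives the expandable layout.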
The next big step
This version of MemAnalyze uses a memory dump from disk. However, it would be fantastic to expand MemAnalyze to do real-time analysis. I am always interested in what section of what level uses the most memory. We could make a view like Windows' CPU performance window (figure 7).
Figure 7. Real-time analysis of memory statistics. Grabbed from Windows CPU performance window.
We could even track the history of memory mutations and fast forward or rewind our statistics, and do compares on them. Although this sounds very difficult to do, I wonder how much additional work it would cost. We won't even need to worry about intermediate data storage. We just send the allocation data directly to the PC.
On both project Xyanide and Cyclone Circus, we found our fragmentation problems and memory leaks within fifteen minutes after starting the game. We ran MemAnalyze at a regular interval, and it provided us with information on what part of our code allocated less or more memory.
In the end, our PlayStation 2 game, Cyclone Circus, never had more than 160K of lost space caused by fragmentation. Returning from the game to the menu gave us the exact same memory layout, with our heap end at exactly the same position, after each race, even after 120 hours of demo mode racing. So these tools have proven to be very useful. It would be great if the console manufacturers would provide these sorts of tools in the next generation consoles and development tools.
Many thanks to my colleague Tom van Dijck, who deserves all the credit for his PS2 implementation. I would also like to thank the Xbox Developer Support Desk for their professional support.
[Ref 1] Information on the DWARF 1.1 debug format
[Ref 2] Information on the DWARF debugging format and binaries