
Monitoring Your Console's Memory Usage, Part Two

In part 2 of his series on console memory tracking, Jelle van der Beek completes the program and shows how to process the captured data and create graphical views that will help you fix leaks and fragmentation.

Jelle van der Beek, Blogger

April 16, 2004


In part one I discussed the overview of our tool, MemAnalyze, and how to make a memory dump on Xbox and PS2 consoles. In this part, we will read the memory dump, convert return addresses to function names using map files or PDB files, and then process the data to create several views on it. I will also discuss future plans to make MemAnalyze much more powerful.

Converting return addresses to function names


In part one, I chose not to output function names to the memory dump, but only the return addresses. This means that if you are using XbMemdump, you might want to turn off the symbol information and do the conversion yourself, to make it easier to parse the memory dump. When symbol information is enabled in XbMemdump, the function names replace the addresses instead of being added next to them. We can turn off the symbol information simply by not supplying a PDB path on the command line.

Before getting into detail on either map or PDB files, I will share a few more details regarding absolute addresses and image- and section base addresses.

Absolute address

The return addresses we store in our memory dump are all absolute addresses.

Image base address

Image base addresses on PCs work differently than on the Xbox. It took me quite some time to figure this out.

On a PC, you can enter a preferred base address for the image (/BASE linker setting), though the operating system can relocate the image when loading it. The operating system will only relocate the image if it cannot be loaded at the preferred address. This only applies when loading DLLs in the same process, not when loading a single executable. For DLLs, the image base address can be tweaked to avoid DLL conflicts and to gain performance.

With this knowledge in mind, I checked the Xbox project settings. There is still the /FIXED setting in the advanced link options. This option specifies if you always want to load the image at the preferred base address. If the operating system cannot load it at that address, the load fails. So the /FIXED option is there, but the /BASE setting is not. I figured the linker would default to some static value. So, the first thing I did was look up the base address in the MAP file that is produced by the linker (listing 1).


Timestamp is 4048a4d1 (Fri Mar 05 17:03:29 2004)

Preferred load address is 00400000

Start         Length     Name       Class
0001:00000000 00887928H  .textbss   DATA

Listing 1: The preferred load address in the map file.

And there it is! Obviously, the preferred load address is 0x400000. Well, not really. There must be some PC legacy involved here. The Xbox image loader works differently, and the 0x400000 value is not used at all. When starting an Xbox application, multiple modules are loaded into the private virtual address space. Other modules, including the kernel and debugging modules, are also loaded. Unlike the PC platform, there is just one process running at a time, so all modules are loaded into the same address space. The image base address is completely determined by the Xbox image loader, which is also why the /BASE setting is not present in the link options. We can retrieve the image base address as described in part one, by calling DmWalkLoadedModules, in the game. An example of the output of DmWalkLoadedModules is shown in listing 2.

--- List of all loaded modules ----
Name: xboxkrnl.exe
BaseAddress:80010000
Size: 1268224
Name: xbdm.dll
BaseAddress:B0011000
Size: 401056
Name: vx.dxt
BaseAddress:B00B2000
Size: 101184
Name: XboxProject.exe
BaseAddress:00010CE0
Size: 285376
--- End of list ----

Listing 2: Output of all loaded modules, while running my test project 'XboxProject'.

In listing 2, we can also see that the system modules are loaded in the space above 2GB--the shared address space. Our own module is loaded near the beginning of the virtual address space. (The first 0x10000 bytes of the virtual address space are reserved by the operating system.)

Section address

Each image consists of a set of sections. These are both DATA and CODE sections. The .text section will contain all your game code. A map file displays a list of all the sections in the image (listing 3).

0001:00000000 00887928H  .textbss     DATA
0002:00000000 00e2d232H  .text        CODE
0002:00e2d240 00001eceH  .text$x      CODE
0002:00e2f110 000e6d24H  .text$yc     CODE
0002:00f15e40 00039da8H  .text$yd     CODE
0003:00000000 0000ce65H  XMV          CODE
0003:0000ce68 00001f03H  XMV_RD       CODE
0003:0000ed70 0002833cH  XMV_RW       CODE
0003:000370b0 00002baaH  XMV_URW      CODE
0004:00000000 0002dc73H  XACTENG      CODE
0004:0002dc78 000098b6H  XACTENG_RD   CODE
0004:00037530 000013dfH  XACTENG_RW   CODE
0004:00038910 000014e7H  XACTENG_URW  CODE

Listing 3: sections from the map file.

Section addresses are absolute addresses, 32-byte aligned. Figure 1 shows the layout of the virtual memory if a program is loaded into memory.

Note: the image base address of the game module is never exactly 0x10000, but a few hundred or a few thousand bytes further in memory (see Listing 2). (This block is not displayed in Figure 1; I have not been able to find out what data resides there.) Also, the first section never starts at the image base address itself. In Figure 1, I have displayed the space in between as "Image information". Most likely, this is where the section headers reside, as the sections themselves are contiguous (mind the section alignment though).

Map files

Map files store their functions in relative virtual addresses. The first question that pops up is: "Relative to what?" Well, map files store their functions along with two relative addresses:

  • Section-relative.

  • Relative to the image base (the column header says rva+base, but we will see why this is incorrect in a moment).

First, we will discuss section-relative conversion.

If we take a look at a line of text from a map file (listing 4), column 1 displays the section index and the offset into the section. Column 2 displays the function's decorated name and column 3 displays the offset of the function, relative to the image base. The rest is of no use to us.

0002:008bfb00 ??$Write@V?$TSharedPtr@UIWorldAttachmentProvider@@@GFW@@@TPointerConverter@GFW@@SAXAAUIPropertySaver@1@PBDABV?$TSharedPtr@UIWorldAttachmentProvider@@@1@@Z 015325a0 f i _WaveSpawner.obj

Listing 4: A line of text from the map file.

To compute the absolute address of the function displayed in listing 4, we need the start address of the section. The section index is in front of the section-relative address. In listing 4, this is 0002. Unfortunately, the section base addresses are not present in the map file. Don't be thrown off by the list at the top of the map file--it does not display the section base addresses. We need to parse the game's executable using ImageBld to obtain section information. We dumped our Xyanide section information on the command line by typing:

ImageBld /dump XyanideD.xbe >XyanideSectionDump.txt

Now we need to parse the section dump (listing 5), find the start address, and add it to the offset of the function. This gives us the absolute start address of the function, but we stored the return addresses! So we first have to compute the size of the function. The map file displays all the functions in order of appearance in memory, so we can simply subtract the address of our function from the address of the next function in the map file.

SECTION HEADER #2
   .text
  884140 virtual address
  F21C18 virtual size
    2000 file pointer to raw data
  F21C18 size of raw data
000107D2 head shared page reference count address
000107D4 tail shared page reference count address
       6 flags
         Preload
         Executable

Listing 5: Layout of a section header in the dump produced by the ImageBld.exe tool. We need to parse the 'virtual address' field.

We have now parsed the section dump from ImageBld, but we could also have used a function from the debug library on the Xbox itself: DmWalkModuleSections. We can save the module section addresses in the memory dump from within the game, just as we did by saving the image base.
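The section-relative lookup described above can be sketched roughly as follows. This is only an illustration, not MemAnalyze's actual code: the `MapSymbol` and `ResolvedSymbol` structures and both function names are hypothetical, and the section base addresses are assumed to have been read already, either from the ImageBld dump or through DmWalkModuleSections. Instead of computing each function's size explicitly, the sketch keeps the start addresses sorted and picks the last function that starts at or before the return address, which gives the same answer as long as the map file lists the functions back to back.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// One function line from the map file, e.g. "0002:008bfb00 <decorated name> ...".
struct MapSymbol {
    std::uint32_t sectionIndex;   // 2 for "0002:..."
    std::uint32_t sectionOffset;  // 0x008bfb00
    std::string   decoratedName;
};

// The same function with its absolute start address.
struct ResolvedSymbol {
    std::uint32_t start;
    std::string   name;
};

// sectionBases[i] holds the virtual address of section i (index 0 is unused,
// because map file section indices start at 1).
std::vector<ResolvedSymbol> ResolveSymbols(const std::vector<MapSymbol>& symbols,
                                           const std::vector<std::uint32_t>& sectionBases)
{
    std::vector<ResolvedSymbol> resolved;
    for (const MapSymbol& s : symbols) {
        if (s.sectionIndex == 0 || s.sectionIndex >= sectionBases.size())
            continue;  // unknown section, skip
        resolved.push_back({sectionBases[s.sectionIndex] + s.sectionOffset, s.decoratedName});
    }
    std::sort(resolved.begin(), resolved.end(),
              [](const ResolvedSymbol& a, const ResolvedSymbol& b) { return a.start < b.start; });
    return resolved;
}

// Returns the function with the greatest start address <= returnAddress,
// or nullptr if the address lies before the first known function.
const ResolvedSymbol* FindFunction(const std::vector<ResolvedSymbol>& sorted,
                                   std::uint32_t returnAddress)
{
    auto it = std::upper_bound(sorted.begin(), sorted.end(), returnAddress,
                               [](std::uint32_t addr, const ResolvedSymbol& s) { return addr < s.start; });
    if (it == sorted.begin())
        return nullptr;
    return &*(it - 1);
}
```

Sorting once and using a binary search keeps each lookup cheap, which matters when a dump contains thousands of callstacks.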

To convert the decorated function name to a readable name, use the helper function UnDecorateSymbolName from the DbgHelp library.

Now we know how to convert a function address to a function name. There are two drawbacks to using section-relative conversion if you are using ImageBld to parse the executable:

  • If the image base address is relocated by the operating system, the section base addresses are incorrect, because they are provided as absolute addresses. (This code assumes that the default image base is being used.) This has not happened to me yet, but in theory, it is possible.

  • We need to parse two files to find the function name.

By using DmWalkModuleSections, you will only have to parse the MAP file, and it has the advantage that the kernel knows if the image base has been relocated, so the correct absolute addresses are written to file.

Another approach is to use the third column of the map file. This is the image-base relative conversion. The header of the column says this is the rva+base address, which would make it the absolute address of the function. Well, that is simply not true. The column is computed with the preferred base address of 0x400000, and we've just seen that this value is not used by the Xbox loader. However, we can use it to our advantage! By simply subtracting 0x400000 from this value, we get the address relative to the image base. We can work with image-base relative values, because they are independent of the location of the image in memory. So, regardless of whether the OS has relocated the base address, we can find our function name. We only need the runtime image base address, which we already saved in the memory dump in part one. Thus, this is how you convert the rva+base column to an absolute address:

AbsFunctionAddress = RvaBase - 0x400000 + RealImageBase
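As a small sketch, the conversion can be written as a single function. The function and constant names are hypothetical. The example values are taken from listing 4 (rva+base 015325a0) and listing 2 (image base 00010CE0) purely for illustration; they come from two different executables.

```cpp
#include <cstdint>

// Preferred load address that the linker writes into the Xbox map file.
// The Xbox loader does not use it, but the rva+base column is based on it.
constexpr std::uint32_t kPreferredLoadAddress = 0x00400000;

// Converts a value from the map file's rva+base column into the absolute
// address of the function, given the image base stored in the memory dump.
constexpr std::uint32_t RvaBaseToAbsolute(std::uint32_t rvaBase,
                                          std::uint32_t realImageBase)
{
    return rvaBase - kPreferredLoadAddress + realImageBase;
}
```

With these inputs, `RvaBaseToAbsolute(0x015325a0, 0x00010CE0)` yields 0x01143280: the function lies 0x011325a0 bytes past the image base, wherever the loader placed the image.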

Now we get to the final drawback of map files: Xbox map files do not include static functions! This will sometimes result in incorrect function names. They are usually not completely off; the resulting function name will probably be a function of the same object file, so we are pointed in the right direction. But, we could do even better. We can read Program Databases!

PDB files

Program Databases store the functions in image-base relative addresses, so no problems there. The PDB file holds all the information you could possibly think of. If we try hard enough, we can even double-click a callstack item in the tool and display the function and the exact line of code that matches the allocation.

We can parse PDB files using either the DbgHelp library or the DIA SDK.

Note: The Microsoft DIA SDK is an SDK that parses symbol information from PDB files. I have had a really hard time figuring out how to use the SDK, particularly because there is so little documentation on it. I have asked Microsoft about this, and they told me that version 8.0 of Visual Studio will contain much more comprehensive documentation.

I have stripped and adjusted the dia2dump sample that comes with the SDK. The code from listing 6 will dump absolute function addresses, function names and sizes, based on a given image base address.

// stdafx.h is assumed to pull in <stdio.h>, <stdlib.h>, <atlbase.h> and
// dia2.h, as it does in the dia2dump sample.
#include "stdafx.h"
#include "diacreate.h"
#include "cvconst.h"

CComPtr<IDiaSession> psession;
CComPtr<IDiaSymbol> pglobal;

void Fatal(const char *msg)
{
    // Print through "%s" so a '%' in the message is not read as a format.
    printf("%s\n", msg);
    exit(-1);
}

void Dump(char* szFilename,
          IDiaDataSource* pSource,
          wchar_t* szLookup,
          DWORD ImageBaseAddress)
{
    HRESULT hr;
    wchar_t wszFilename[_MAX_PATH];
    mbstowcs(wszFilename, szFilename,
             sizeof(wszFilename) / sizeof(wszFilename[0]));

    // Try to open the file as a PDB first, then as an executable.
    if(FAILED(pSource->loadDataFromPdb(wszFilename))) {
        if(FAILED(pSource->loadDataForExe(wszFilename, NULL, NULL))) {
            Fatal("loadDataFromPdb/Exe");
        }
    }

    if(FAILED(pSource->openSession(&psession))) {
        Fatal("openSession");
    }

    // The session needs the load address before get_virtualAddress can
    // return absolute addresses; without it the image base is ignored.
    if(FAILED(psession->put_loadAddress(ImageBaseAddress))) {
        Fatal("put_loadAddress");
    }

    if(FAILED(psession->get_globalScope(&pglobal))) {
        Fatal("get_globalScope");
    }
    DWORD id = 0;
    pglobal->get_symIndexId(&id);
    if(id == 0) {
        Fatal("get_indexId");
    }

    ULONG celt = 0;

    CComPtr<IDiaEnumSymbols> pEnum;
    CComPtr<IDiaSymbol> pSymbol;

    // Enumerate all functions in the global scope.
    if(FAILED(pglobal->findChildren(SymTagFunction, NULL,
                                    nsfCaseInsensitive|nsfUndecoratedName,
                                    &pEnum))) {
        Fatal("findChildren");
    }
    while(SUCCEEDED(hr = pEnum->Next(1, &pSymbol, &celt)) && celt == 1) {
        BSTR name;
        if(pSymbol->get_name(&name) != S_OK) {
            Fatal("get_name");
        }
        printf("Function: %ws\n", name);
        SysFreeString(name);

        ULONGLONG address = 0;
        pSymbol->get_virtualAddress(&address);
        if(address == 0) {
            printf(" could not get address!\n");
        }
        // Xbox addresses are 32 bits wide, so the cast loses nothing.
        printf(" Address: %08lX\n", (unsigned long)address);

        ULONGLONG len = 0;
        pSymbol->get_length(&len);
        if(len == 0) {
            printf(" could not get length!\n");
        }
        printf(" Length: %lu\n", (unsigned long)len);

        // Release the symbol before the next call to Next().
        pSymbol = 0;
    }
}

DWORD GetImageBaseAddress()
{
    // I've hardcoded the image base address here. Replace
    // this by proper code that reads the image base address
    // from the memory dump.
    return 0x10CE0;
}

int main(int argc, char* argv[])
{
    if(argc < 2) {
        printf("usage: %s <pdb-filename>\n", argv[0]);
        return -1;
    }

    // Initialize the Component Object Model library
    HRESULT hr;
    hr = CoInitialize(NULL);
    if(FAILED(hr)) {
        Fatal("CoInitialize failed");
    }

    // Obtain access to the provider
    CComPtr<IDiaDataSource> pSource;
    hr = CoCreateInstance(CLSID_DiaSource,
                          NULL,
                          CLSCTX_INPROC_SERVER,
                          __uuidof(IDiaDataSource),
                          (void **) &pSource);

    if(FAILED(hr)) {
        Fatal("Could not CoCreate CLSID_DiaSource. Register msdia71.dll.");
    }

    Dump(argv[1], pSource, NULL, GetImageBaseAddress());

    pglobal = 0;
    psession = 0;
    pSource = 0;

    return 0;
}

Listing 6: Parsing symbol information using the DIA SDK.

The DbgHelp library, on the other hand, is very easy to use, and its documentation is very comprehensive. Listing 7 does almost the same as the sample code from listing 6, but using the DbgHelp library. Keep in mind that if you are not running on Windows XP, you need the latest DDK DLLs to run this code. These can be found at http://www.microsoft.com/ddk/debugging. For more information on DbgHelp, see the DbgHelp documentation.


#include <windows.h>
#include <dbghelp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Just for this sample, I use a big buffer to store the
// text in (no testing for boundaries!)
char g_BigBuf[1024*1024*4] = { '\0' };

DWORD64 GetImageBaseAddress()
{
    // Again, hardcoded image base address. Rewrite this to
    // proper image base code.
    return 0x10CE0;
}

BOOL CALLBACK EnumerateSymbolsProc(PSYMBOL_INFO pSymInfo,
                                   ULONG SymbolSize,
                                   PVOID UserContext)
{
    // Test the flag with '&'; '|' would be true for every symbol.
    if(pSymInfo->Flags & SYMFLAG_FUNCTION) {
        char intbuf[64];

        strcat(g_BigBuf, "Function: ");
        strcat(g_BigBuf, pSymInfo->Name);

        // Address is a 64-bit value, so use the 64-bit conversion.
        strcat(g_BigBuf, " Address: ");
        _ui64toa(pSymInfo->Address, intbuf, 16);
        strcat(g_BigBuf, intbuf);

        strcat(g_BigBuf, " Size: ");
        _itoa(SymbolSize, intbuf, 10);
        strcat(g_BigBuf, intbuf);

        strcat(g_BigBuf, "\n");
    }

    return TRUE;
}

bool SetupDbgHelp(HANDLE hProcess)
{
    bool bResult = false;

    if(SymInitialize(hProcess, NULL, FALSE)) {
        bResult = true;
    }

    return bResult;
}

int GetPDBFileSize(const char* fileName)
{
    HANDLE hFile = CreateFile(fileName,
                              GENERIC_READ,
                              0,
                              NULL,
                              OPEN_EXISTING,
                              FILE_ATTRIBUTE_NORMAL,
                              NULL);
    if(hFile == INVALID_HANDLE_VALUE) {
        return 0;
    }

    int size = GetFileSize(hFile, NULL);
    CloseHandle(hFile);
    return size;
}

bool OpenPDB(HANDLE hProcess,
             const char* pFileName)
{
    bool bResult = false;
    DWORD64 dwBaseAddr = GetImageBaseAddress();

    int fileSize = GetPDBFileSize(pFileName);

    if(SymLoadModule64(hProcess, NULL, pFileName, NULL,
                       dwBaseAddr, fileSize)) {
        if(SymEnumSymbols(hProcess, dwBaseAddr, "",
                          EnumerateSymbolsProc, NULL)) {
            // Print through "%s"; symbol names may contain '%'.
            printf("%s", g_BigBuf);
            bResult = true;
        }
        SymUnloadModule64(hProcess, dwBaseAddr);
    }

    return bResult;
}

bool ParsePDB(const char* pFileName)
{
    HANDLE hProcess;
    bool bResult = false;

    hProcess = GetCurrentProcess();

    if(SetupDbgHelp(hProcess)) {
        if(OpenPDB(hProcess, pFileName)) {
            bResult = true;
        }
        // Release the resources taken by SymInitialize.
        SymCleanup(hProcess);
    }

    return bResult;
}

int main(int argc,
         char* argv[])
{
    int retCode = -1;

    if(argc != 2) {
        printf("Usage: ParsePDB_DbgHelp <pdb-filename>\n");
    }
    else {
        if(ParsePDB(argv[1])) {
            retCode = 0;
        }
        else {
            DWORD error = GetLastError();
            printf("\nDbgHelp returned error : %lu\n", error);
        }
    }

    return retCode;
}

Listing 7: Parsing symbol information using the DbgHelp library.


PS2 map files

In CodeWarrior you can let the linker output an XMap file. This file contains a start address, size, and decorated name per function. Parsing it should not be too difficult. Each return address in our stack trace is matched to all address-size ranges and if it is within the correct range, that name is stored. Listing 8 displays a piece of a CodeWarrior Xmap file.

00100230 00000018 .text Foo3() (main.cpp)
00100250 00000020 .text Foo2() (main.cpp)
00100270 00000020 .text Foo1() (main.cpp)
00100290 00000034 .text main (main.cpp)

Listing 8: A few lines of text that were produced by the CodeWarrior linker for my PS2 Foo project. The first column represents the start address of the function; the second column displays the size. The third column displays the section, followed by the function name and the source file.
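Parsing such lines and matching a return address against the address-size ranges could look roughly like this. It is a sketch that assumes every function line has exactly the shape shown in listing 8; a real XMap file contains other kinds of lines that would need to be skipped. `XMapEntry`, `ParseXMapLine` and `FindByAddress` are hypothetical names.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct XMapEntry {
    std::uint32_t start;  // start address of the function
    std::uint32_t size;   // size of the function in bytes
    std::string   name;   // function name, without the "(source file)" suffix
};

// Parses a line such as "00100230 00000018 .text Foo3() (main.cpp)".
// Returns false if the line does not have that shape.
bool ParseXMapLine(const std::string& line, XMapEntry& out)
{
    unsigned int start = 0, size = 0;
    char section[64] = {};
    int consumed = 0;
    if (std::sscanf(line.c_str(), "%x %x %63s %n", &start, &size, section, &consumed) != 3)
        return false;

    // Whatever follows the section name is "<function name> (<source file>)".
    std::string rest = line.substr(static_cast<std::size_t>(consumed));
    std::string::size_type sourcePos = rest.rfind(" (");
    if (sourcePos != std::string::npos)
        rest.erase(sourcePos);
    if (rest.empty())
        return false;

    out.start = start;
    out.size = size;
    out.name = rest;
    return true;
}

// Returns the entry whose [start, start + size) range contains the address,
// or nullptr if no function matches.
const XMapEntry* FindByAddress(const std::vector<XMapEntry>& entries, std::uint32_t address)
{
    for (const XMapEntry& e : entries) {
        if (address >= e.start && address - e.start < e.size)
            return &e;
    }
    return nullptr;
}
```

The linear scan mirrors the description above; sorting the entries by start address and using a binary search would be an obvious optimization for large files.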

PS2 symbol information

CodeWarrior uses debug information in the DWARF 1.1 format (Debug With Arbitrary Record Format). For information on the format, please refer to [Ref 1]. The other PS2 compilers, GCC and ProDG, use the ECOFF/STABS debug format. I have no experience using any of them, but I know that there is source code on the web for reading the DWARF format. There is an executable called DwarfDump and an open source library called DwarfLib. For more information, refer to [Ref 2].

The details of MemAnalyze

In this section I would like to explain how we will process our platform-independent allocation data. I guess you can figure out for yourself how to build a memory layout view, so I will not go into detail about that view. The two other views need a little more attention.

The TopX view

More on return addresses

The return addresses we stored in the memory dump are more valuable than you might have thought at first. They do not just point at the function that allocated the memory; they point to the instruction within the function that allocated the memory (figure 2). Using this information, we can distinguish between multiple allocations in a function. Do not be tempted to replace the return addresses with function names unless you store the offset of the instruction along with it.

Finding the allocators

Let's take a look again at our list of allocated blocks with their callstacks. Let's forget we have a complete callstack per allocation, and first just focus on the return addresses on top of the callstack: the actual calls to new, XmemAlloc or any other allocation function. We simply need to run over our complete list of blocks and find all the different return addresses from callstack level zero. For all these return addresses, we need to accumulate the total size allocated and the number of allocations performed. Doing so, we have an overview of all allocations, and they can be sorted on allocation address, total size allocated, and the number of allocations.
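The accumulation step can be sketched as below. The `Block` layout is an assumption about how the parsed memory dump is held in memory, and `AllocatorStats` and `BuildTopX` are hypothetical names; the result is sorted on total size here, but the other sort orders only differ in the comparison.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// One allocated block from the memory dump (assumed layout).
struct Block {
    std::uint32_t address;                 // address of the allocation
    std::uint32_t size;                    // size of the allocation in bytes
    std::vector<std::uint32_t> callstack;  // return addresses, level zero first
};

// Totals for one level-zero return address.
struct AllocatorStats {
    std::uint32_t returnAddress;
    std::uint64_t totalSize;
    std::size_t   count;
};

// Groups all blocks by the return address at callstack level zero and
// returns the groups sorted on total allocated size, largest first.
std::vector<AllocatorStats> BuildTopX(const std::vector<Block>& blocks)
{
    std::map<std::uint32_t, AllocatorStats> byAddress;
    for (const Block& b : blocks) {
        if (b.callstack.empty())
            continue;  // no callstack recorded for this block
        AllocatorStats& stats = byAddress[b.callstack[0]];  // zero-initialized on first use
        stats.returnAddress = b.callstack[0];
        stats.totalSize += b.size;
        ++stats.count;
    }

    std::vector<AllocatorStats> result;
    for (const auto& entry : byAddress)
        result.push_back(entry.second);
    std::sort(result.begin(), result.end(),
              [](const AllocatorStats& a, const AllocatorStats& b) { return a.totalSize > b.totalSize; });
    return result;
}
```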

This gives us a great overview of our allocations. However, the return addresses we are looking at are sometimes too deep into system code. It may not be all that interesting to know that D3DAllocContiguousMemory allocated 30 megabytes of memory. It provides us with some information, but we would rather zoom out to see who called D3DAllocContiguousMemory. This way we could see how much memory is spent on vertex buffers or texture memory, for instance.

Zooming out

For a more global view, we first can collapse the data a bit by sorting on the function that allocated the memory instead of the actual instruction that performed the allocation. This will combine all allocations in the scope of a function.
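One way to collapse per-instruction totals into per-function totals is sketched here. It assumes a sorted list of function start addresses is available, for example from the map file or PDB; `SiteStats` and `CollapseByFunction` are hypothetical names. Return addresses that lie before the first known function are collected under key 0.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Totals for one allocating instruction (one level-zero return address).
struct SiteStats {
    std::uint32_t returnAddress;
    std::uint64_t totalSize;
    std::size_t   count;
};

// Merges the per-instruction totals of all call sites that fall inside the
// same function. functionStarts must be sorted in ascending order. The map
// key (and the returnAddress field of each value) is the function start.
std::map<std::uint32_t, SiteStats> CollapseByFunction(const std::vector<SiteStats>& sites,
                                                      const std::vector<std::uint32_t>& functionStarts)
{
    std::map<std::uint32_t, SiteStats> perFunction;
    for (const SiteStats& site : sites) {
        // Find the last function that starts at or before the return address.
        auto it = std::upper_bound(functionStarts.begin(), functionStarts.end(), site.returnAddress);
        std::uint32_t functionStart = (it == functionStarts.begin()) ? 0u : *(it - 1);

        SiteStats& total = perFunction[functionStart];  // zero-initialized on first use
        total.returnAddress = functionStart;
        total.totalSize += site.totalSize;
        total.count += site.count;
    }
    return perFunction;
}
```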

Theoretically, to zoom out even further, we could sort on a different level in the callstack. Instead of using entry zero, we could sort on entry one or entry two. But, this doesn't make much sense, and I am not even sure what good this information would do. If we want a better overview of our allocations, the hierarchy view is much more elegant, as described later.

The Memory leaks view

When to make a memory dump

We have discussed the comparison of multiple memory dumps. Now we need to decide at what point in the game we will make these dumps. We need to find a situation in the game where the memory allocation state of the game is exactly the same, time after time. The application's exit is one of these places. In our case, and I think this will work for many games out there, the menu is another such place. Each time you re-enter the menu after playing the game, the memory state should be exactly the same. Do not be confused by the fact that the menu will have allocated the items at a different location in memory. The number of allocations that have been performed and the size of the allocations should not differ. If they do, we have a memory leak.

There is one exception to this rule: memory managers such as freelists. Freelists may grow due to memory fragmentation, so a dump can show more allocated memory even though nothing has leaked. In my experience freelists will grow, and fragmentation will sometimes seem to be the cause. Disable the freelists to verify that you are really looking at fragmentation issues and not at memory leaks.

Before I make my first memory dump, I usually load the level one time and then go back to the menu. The game is likely to perform a couple of one-time initial global allocations. I do not want these to pop up in my memory report.

Finding the leaks

Now we have to come up with information on the memory dumps that actually makes sense. I will discuss one of the algorithms I have used. Because we may need to compare many thousands of blocks, performance is an issue. I leave it up to you to optimize the algorithm.

We have two lists of say, 10,000 blocks of memory, each with a callstack and an allocation size. First we will delete all the blocks that have the exact same size and callstack that are present in memory dump 1 and memory dump 2 (figure 3). This way, only the differences in both dumps will remain. Naturally, you will need to make copies of both lists first or you will destroy your source data.
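A straightforward way to do this removal is sketched below. It is not the algorithm from MemAnalyze itself, just one possible implementation: blocks are counted per (size, callstack) pair, so a block is only removed when it has a partner in the other dump. The `Block` layout and the function name are assumptions; the caller passes in copies of both lists, as recommended above.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// One allocated block; the address is left out because it may differ
// between dumps without indicating a leak.
struct Block {
    std::uint32_t size;
    std::vector<std::uint32_t> callstack;
};

using BlockKey = std::pair<std::uint32_t, std::vector<std::uint32_t>>;

// Removes every block that occurs in both dumps with the same size and
// callstack. Works in place, so call it on copies of the source lists.
void RemoveCommonBlocks(std::vector<Block>& dump1, std::vector<Block>& dump2)
{
    // Count the blocks of dump 1 per (size, callstack).
    std::map<BlockKey, int> unmatched;
    for (const Block& b : dump1)
        ++unmatched[BlockKey(b.size, b.callstack)];

    // A block in dump 2 with a partner in dump 1 is dropped, and the
    // partner is used up; everything else is a difference.
    std::vector<Block> rest2;
    for (const Block& b : dump2) {
        auto it = unmatched.find(BlockKey(b.size, b.callstack));
        if (it != unmatched.end() && it->second > 0)
            --it->second;
        else
            rest2.push_back(b);
    }

    // Whatever is still counted in 'unmatched' had no partner in dump 2.
    std::vector<Block> rest1;
    for (const Block& b : dump1) {
        int& remaining = unmatched[BlockKey(b.size, b.callstack)];
        if (remaining > 0) {
            --remaining;
            rest1.push_back(b);
        }
    }

    dump1.swap(rest1);
    dump2.swap(rest2);
}
```

Using a map keeps the work close to linear in the number of blocks, instead of comparing every block in one dump with every block in the other.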

In our figure, that leaves us with block 2 from memory dump 2. If this was an actual situation, we could mark block 2 as a memory leak and display the function name, size, and callstack. However, it is not always this obvious. If we take a look at figure 4, which represents a possible result of our difference algorithm, we can see that the same callstack has allocated more memory in dump 2 than it did in memory dump 1.

In this case, this callstack allocated the same number of allocations, but the size differs. This is a typical freelist situation, where the freelist has grown. This is still quite straightforward. There are a few other situations, and I'd like to point out one in particular. Figure 5 displays a very odd situation.

In this scenario it is hard, if not impossible, to come up with a verdict of what is going on. The callstack has not only allocated more (or less!) memory in size, but has also allocated a greater number of items. This seems like both a memory leak, and memory growth or shrinkage. It is even very hard to tell if it would be growth or shrinkage.

To handle all the situations in the resulting difference list, I count the number of blocks and the total size that was allocated, per callstack. For instance, in figure 5, the callstack 0x00001234, 0x00003456 and 0x00004567 has allocated 1 item totaling 128 bytes in memory dump 1. It has allocated 2 items totaling 1536 bytes in memory dump 2. Listing 9 displays all the different scenarios that I have come up with.

NrBlocksMD1 = CountNrBlocks(list1, CurCallStack);
NrBlocksMD2 = CountNrBlocks(list2, CurCallStack);
// They cannot both be zero
assert(!(NrBlocksMD1 == 0 && NrBlocksMD2 == 0));
TotalSizeMD1 = CountTotalSize(list1, CurCallStack);
TotalSizeMD2 = CountTotalSize(list2, CurCallStack);
Diff = TotalSizeMD2 - TotalSizeMD1;

if(NrBlocksMD1 == NrBlocksMD2)
{
    if(Diff > 0)
    {
        // Dump2 allocated more memory (growth) in same
        // number of allocations
    }
    else if(Diff < 0)
    {
        // Dump2 allocated less memory (shrank) in same
        // number of allocations
    }
    else
    {
        // Same total size in the same number of allocations;
        // only the individual block sizes differ
    }
}
else
{
    // One of them has to be zero, else we have a very odd
    // situation (as in figure 5)
    if(NrBlocksMD1 != 0 && NrBlocksMD2 != 0)
    {
        if(NrBlocksMD2 > NrBlocksMD1)
        {
            // Dump2 Leak and grow/shrank
        }
        else
        {
            // Dump1 Leak and grow/shrank
        }
    }
    else if(NrBlocksMD1 == 0)
    {
        // Dump 2 has leaked
    }
    else
    {
        // Dump 1 has leaked (weird)
    }
}

Listing 9: The different scenarios for our memory dump difference.

Using this code we can iterate over the remaining list of memory dump 1 and compare it to the remaining list of memory dump 2. This time I chose not to remove all the items that were processed, since this can become quite complex to manage. Instead, I have marked all the items that were processed, and skip over them on the next iteration step. After we have compared list 1 to list 2, all the items in list 2 that have not yet been processed are memory leaks! So we need to run over the second list one more time, building a list of all the leaks.

The hierarchy view

An idea that I have not built yet, but would look very cool to me, is a sort of hierarchy view. It looks a lot like a traditional profiler view.

Starting off with the return addresses of callstack level zero, we can zoom out to their parents, and on to their parents. Keep in mind that the parent of a return address from our callstack is the function that performed the allocation, and that the parent of a function is again the return address in the next callstack level (figure 6). You can also decide always to collapse allocations within a function.

Listing 10 shows an example of a hierarchy output.

+D3DAllocContiguousMemory() (16KBytes in 16 allocations, 40% of all allocations)
    +CTextureManager::CreateTexture() (6KBytes in 6 allocations, 15%)
        -CApplication::LoadScreen() (4KBytes in 4 allocations, 10%)
        -CCar::Initialize() (2KBytes in 2 allocations, 5%)
    +CSpecialEffectMgr::CreateVertexBuffer() (10KBytes in 10 allocations, 25%)
        -CDynamicTrailActor::Initialize() (4KBytes in 4 allocations, 10%)
        -CParticleManager::CreateEmitter() (4KBytes in 4 allocations, 10%)
        -Coverlay::CreateItem() (2KBytes in 2 allocations, 5%)

Listing 10. A possible hierarchy view.
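Since this view has not been built, the following is only a sketch of how the underlying tree could be assembled. It assumes the callstacks have already been converted to function names and that allocations within a function are always collapsed; `Node`, `Allocation`, `AddAllocation` and `Dump` are hypothetical names. Each allocation walks from the allocating function outward to its callers and adds its size to every node on the way.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <ostream>
#include <string>
#include <vector>

// One node of the hierarchy: a function plus the totals of all allocations
// whose callstack passes through it at this position.
struct Node {
    std::uint64_t totalSize = 0;
    std::size_t   count = 0;
    std::map<std::string, std::unique_ptr<Node>> callers;  // keyed by function name
};

// One allocation with its callstack as function names, allocator first.
struct Allocation {
    std::uint32_t size;
    std::vector<std::string> callstack;
};

// Adds the allocation to the root and to every node along its callstack.
void AddAllocation(Node& root, const Allocation& a)
{
    Node* node = &root;
    node->totalSize += a.size;
    ++node->count;
    for (const std::string& function : a.callstack) {
        std::unique_ptr<Node>& child = node->callers[function];
        if (!child)
            child.reset(new Node());
        node = child.get();
        node->totalSize += a.size;
        ++node->count;
    }
}

// Writes the tree as indented text, one function per line.
void Dump(const Node& node, const std::string& name, int depth, std::ostream& out)
{
    out << std::string(static_cast<std::size_t>(depth) * 2, ' ') << name
        << " (" << node.totalSize << " bytes in " << node.count << " allocations)\n";
    for (const auto& entry : node.callers)
        Dump(*entry.second, entry.first, depth + 1, out);
}
```

Percentages such as those in listing 10 follow from dividing a node's total size by the total size stored in the root.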

The next big step

This version of MemAnalyze uses a memory dump from disk. However, it would be fantastic to expand MemAnalyze to do real-time analysis. I am always interested in what section of what level uses the most memory. We could make a view like Windows' CPU performance window (figure 7).

We could even track the history of memory mutations and fast forward or rewind our statistics, and do compares on them. Although this sounds very difficult to do, I wonder how much additional work it would cost. We won't even need to worry about intermediate data storage. We just send the allocation data directly to the PC.

Wrap up

On both Xyanide and Cyclone Circus, we found our fragmentation problems and memory leaks within fifteen minutes after starting the game. We ran MemAnalyze at a regular interval, and it provided us with information on what part of our code allocated less or more memory.

In the end, our Playstation 2 game, Cyclone Circus, never had more than 160K of lost space caused by fragmentation. Returning from the game to the menu gave us the exact same memory layout, with our heap end at exactly the same position, after each race--even after 120 hours of demo mode racing. So these tools have proven to be very useful. It would be great if the console manufacturers would provide these sorts of tools in the next generation consoles and development tools.


Many thanks to my colleague Tom van Dijck, who deserves all the credit for his PS2 implementation. I would also like to thank the Xbox Developer Support Desk for their professional support.


[1] Information on the Dwarf 1.1 debug format

[2] Information on Dwarf debugging format and binaries




About the Author(s)

Jelle van der Beek


Jelle van der Beek has worked in the games industry since 1997. He has developed games for PC, PS2, PS3 and Xbox. He is currently lead programmer for W!Games, a young Dutch company located in Amsterdam, where he is working towards the launch of a Wii title. Jelle likes to get feedback about his articles, so send any messages to [email protected].
