[Continuing his two-part article on data alignment, originally printed in Game Developer magazine, game programming veteran Llopis explains how to align objects on the heap, and use this effectively for game coding.]
Last month we looked at data alignment, how to align static variables, and how to allocate aligned memory on the heap through a simple custom allocator. It's a good start, but we need more than that to be able to use alignment effectively in a game.
This month, I'll wrap up this topic by looking at how to allocate any object on the heap with a given alignment and explore allocations on the stack.
Be warned that this is going to take some digging into the dark, dusty corners of the C++ language. So sit comfortably by the fireplace, grab your trusty copy of the C++ Standard, and let's get started.
Aligning Objects on the Heap
The main problem with aligned_malloc is that it simply allocates a block of data, and we often want to allocate full instances of classes or structs with a particular alignment.
In a pinch, we can turn to the underused and unloved placement new. Placement new works just like the regular new you're used to, but it takes an extra parameter with the memory address where the object will be instantiated. Placement new won't allocate the memory for you-it will just create an object of the right type where you tell it to and call its constructor.
Combining placement new and aligned_malloc, you could create an aligned waypoint object on the heap. See Listing 1 below.
Listing 1: Aligned Waypoint Object Created on the Heap
void* buffer = aligned_malloc(sizeof(Waypoint), 16);
Waypoint* waypoint = new (buffer) Waypoint();
It works, but that's some ugly code, not the kind of thing you want to see all over your codebase. Not only are we responsible for keeping around the aligned pointer so we can free it ourselves, but we have to call the object's destructor by hand.
A cleaner way to do it would be to create custom operator new and operator delete functions for the class. Those operators would be responsible for doing all the bookkeeping behind the scenes: hanging onto the pointer, calling the destructor, and freeing the memory with the correct function. If we implement this approach with our Waypoint class, Listing 1 would be reduced to:
Waypoint* waypoint = new (aligned_malloc(sizeof(Waypoint), 16))) Waypoint();
That code is better, but we can simplify it further. Instead of passing the pre-allocated memory, we can simply pass the desired alignment, and let the custom operator new call aligned_malloc.
To free memory correctly, the corresponding operator delete also needs to call aligned_free. The allocation code is now:
Waypoint* waypoint = new (16) Waypoint();
That's much cleaner!
As we saw last month, memory allocated with aligned_malloc needs to be freed with aligned_free. Attempting to use the wrong free function on a block of memory will result in a heap corruption (and possibly some really hard to track down bugs).
That means that operator delete needs to know how the object was allocated and decide which free function to call. Since operator delete can't easily access the contents of the object it's creating, we would have to allocate extra space and set some flags around the memory that was just allocated, indicating which function was used.
Alternatively, a simple solution is to overwrite the default operator new and call aligned_malloc from it as well, but without any alignment. That way all memory is allocated through aligned_malloc and operator delete can safely call aligned_free for all objects.
Since creating all those operators is just a bunch of busy work, we can wrap them in a macro called DYNAMIC_ALIGNMENT. The advantage of the macro over a base class implementing those functions is that we can use the macro in any class or structure without requiring any inheritance or changing the data layout in any way.
Since the classes we want to align are often fairly small, low-level ones, that's a particularly important consideration. (See the source code for the full implementation).
Finding the Alignment
You still need to be very careful with the alignment of member variables. If you remember from last month, the C runtime will align a statically allocated structure based on the alignment properties of its member variables. However, if we allocate it on the heap, it's up to us to ensure its correct alignment.
When you dynamically allocate an object of the Waypoint class without specifying an alignment, the matrix member variable is not guaranteed to be on a 16-byte boundary. To make sure the matrix is aligned correctly, you need to remember to allocate the whole Waypoint object on a 16-byte boundary.
Having duplicate information in multiple places in the code is a common source of errors, so if we've already tagged a structure with a particular alignment for static allocation, it would be ideal to use it as the default alignment on the heap.
We can do that by querying the required static alignment for the data type we're about to allocate. Visual Studio provides the __alignof() operator and gcc has the equivalent __alignof__(), which returns the static alignment for any data type.
We use this alignment as the default value when an object tagged with the macro DYNAMIC_ALIGNMENT is created and no other alignment has been specified. (See the source code for more details.)
The Array Problem
At first glance, it seems that dealing with aligned arrays is just a matter of implementing operator new to return an aligned block of memory. That's almost true, but with a few nasty details thrown in.
In the abstract, an array is a data structure holding a group of elements that can be accessed by its index. There are many different ways in which such a data structure could be implemented, from very complex and abstract, to very simple and barebones ones.
Some programming languages take the high route and implement arrays as a full-blow data structure, hiding the implementation details and providing extra checks to make sure that no element outside the current bounds is accessed. As you can imagine, that's not how they're implemented in C.
That's not a bad thing, though. By keeping to a minimalistic implementation, C provides top performance at the expense of some flexibility and safety checks. That mentality is a result of the hardware available when C was created, back when 10MHz was a state-of-the-art mainframe. It also means that even in this age of machines with gigaflops to spare, we can choose to make maximum use of the hardware resources.
C's philosophy has always been that you only pay for what you use, and if you want something more complex, you can always build it on top yourself (or grab someone else's implementation like std::vector).
In C, an array is a memory block that contains all elements stored contiguously in sequential order. That means you can access a particular element through its index by adding the index times the size of the element to the beginning of the array. What's more, if the index you're accessing is a constant, the calculation can even be done at compile time so there's virtually no overhead at runtime.
That's all very clever and efficient, until we have to deal with alignment. Array elements are laid out sequentially, without any empty space in between. If the struct in the array is a size that is a multiple of the alignment we want, then we're in luck. We can just align the beginning of the array and everything will work fine. But if that struct has a different size, then no matter how we align the array, some of the elements are going to have the wrong alignment.
There are a variety of ways to deal with array alignments, some more complex than others. However, there is one method that is easy, fast, and has no overhead: Set the desired static alignment for the structure.
That will force the compiler to pad the structure to a multiple of the alignment you requested. All elements in the array will still be sequentially laid out, but they will start at the correct alignment as long as we align the beginning of the array correctly with aligned_malloc.
The only downside of this approach is that it can lead to wasted space, because it will align and pad all structures of that type in the game, not just the ones that need to be aligned in the array. Similarly, if the same structure needs to be aligned on one boundary for one array, and on a different boundary in another array, you need to pad it to the least common multiple of the two alignments, meaning even more wasted space.
To be fair, that's a very uncommon situation and the wasted space is probably insignificant. And if space really is a problem, you could always create a new struct that contains the original struct plus some padding and use that in the array. Problem solved.
The last "gotcha" with aligned dynamically-allocated arrays is that sometimes aligning the memory in operator new might not work at all. Yes, you heard that right. For our approach to work, the C runtime must place the array at the beginning of the block we took so many pains to align correctly.
Unfortunately, the C++ standard allows the runtime to pass a larger size parameter to operator new and rearrange things a bit. We're back in the realm of platform-dependent behavior, but fortunately it's something that only seems to happen when creating arrays of objects with virtual functions.
It's probably a bad idea to try to align data when vtables are involved anyway, so stick to simple structures and everything will be fine.
Stacking Things Up
The last allocation type we need to deal with is the stack. Even though the stack is one of the easiest memory allocation types to understand and work with, it's the one where ensuring alignment can be the hardest.
Since we have no control over where exactly in the stack our variables are allocated, you would think that the C runtime could help with alignment issues. The good news is that some recent versions of compilers do (like Visual Studio 2005 and later versions) by honoring the same alignment attributes we saw in the static allocation section.
The bad news is that a lot of compilers won't respect those rules (like Visual Studio 2003). The even worse news is that they not only won't follow the alignment rules, but also will remain totally quiet and not print a single warning.
But wait! Things get even more complicated. In addition to local variables, the stack may also hold some of the parameters passed to functions and the value returned from functions as well (unless they can be optimized by using registers instead).
How the registers are used, and how exactly the stack is laid out is called the application binary interface (ABI) and is described in a standard for each platform. Using the ABI, different programs and libraries can interoperate correctly and call into each other's functions.
Some platforms, like Windows x64 and PowerPCs, allow for stack parameters to be aligned correctly, but the ABI used in Win32 does not support parameter alignment. The only sure way to know how something will be aligned is to wade through the arcane documents describing the ABI for your specific platform.
If you can't rely on your target platforms to correctly align data and parameters on the stack, you can avoid the stack completely when you need to use aligned data and dynamically allocate data on the heap instead.
It seems like a step backward to trade stack allocation for heap allocation, since heap allocation is much more complex, potentially expensive, and can lead to many problems of leaks and memory fragmentation.
A good compromise is to use a heap-based, fixed-size memory pool that implements correct alignment but behaves like a stack-data gets allocated and removed from the top.
That way you get the benefits of a stack in fast performance, no fragmentation, and fixed bounds, but you can also enforce the correct alignment on all the variables.
To be on the safe side, you might want to prevent objects that require a particular alignment from being created on the stack at all.
A simple way to accomplish that is to declare a private destructor and require programmers to call a Destroy method instead of using the destructor directly. If anybody attempts to create one of those objects on the stack, the compiler will catch it right away.
Another solution is to assert in the constructor that the following pointer is aligned correctly: assert(IsAligned(this, 16));. But it's not an ideal solution because it's a runtime check instead of a compile-time check.
To make matters worse, it might not be triggered for a while because the object was accidentally allocated on the correct boundary, but it's much better than getting memory corruptions or exceptions when trying to use the misaligned data.
With these techniques in hand, you should be able to control where your data is placed. Maybe it will instill in you a healthy fear for the intricate details of C++, but it will also allow you to take full advantage of that hardware with those alignment restrictions, and hopefully bump your game into a solid 60fps.
The author thanks Jim Tilander for his feedback on some of the technical details.