Each new console generation has significantly more memory than the last. It’s tempting to think that in the next generation memory won’t matter. Whilst it’s true that memory doesn’t matter as much as it did in the 8bit days, I’ve yet to work on a game that didn’t have memory problems at some point during development.
The level of memory management I will discuss in this blog is at the new/delete level. By default this is handled by the standard malloc/free, but this can be overridden with a custom malloc replacement, such as dlmalloc, Hoard allocator, jemalloc or my own VMem allocator. Whilst all of these allocators have different performance characteristics, they all have some things in common, and it’s useful to have a working knowledge of what they are doing under the hood. Whatever you choose in your game, whether you stay with malloc/free or use a malloc replacement, if you are programming in C++ there are certain things that you should know about memory management.
A memory manager usually consists of at least 3 types of heaps:
- Fixed Size Heap – used for small allocations
- Coalesce Heap – used for larger allocations
- Large Heap – usually handled directly by the system
Figure 1: A conceptual view of a simple memory manager
Fixed Size Heap
The fixed size heap consists of a number of fixed size allocators (or FSAs for short). They are called fixed size because each allocator only handles allocations of a specific size. So, for example, there may be separate FSAs for each allocation size from 0 – 256 bytes. When you ask for an allocation the memory manager will check if it is in this size range; if it is, it will select the relevant FSA and use that to allocate the memory.
The number of FSAs in a memory manager varies dramatically. Some memory managers only have FSAs from 0 – 128, others from 0 – 512, and some don’t have them at all. There are diminishing returns of going over 512 simply because there are usually not that many allocations of each size and the FSAs become wasteful of memory. However, if you know that you are making many allocations all of the same size and you suspect they are not going through an FSA, it is probably a good idea to route them through your own allocator.
Figure 2: FSA sizes. Note that the spread might not be even (and on 64bit systems each FSA size will be a multiple of 16 to ensure alignment).
The benefits of FSA heaps are twofold. They are fast and they eliminate spatial fragmentation. I will discuss fragmentation in more detail later, so for now let’s focus on the speed. The way an FSA works is that it allocates memory from the system in large blocks at a time, for example one 4K system page. It then splits that block up into equal sized slots. These slots are linked into a free list. When this allocator is asked to allocate memory, all it has to do is to take one of those slots off a free list. When freeing it simply pushes the allocation back onto the free list.
Figure 3: Example of an FSA page
You may be thinking this is oversimplified, and you’d be right. Finding out where an allocation came from is a non-trivial problem. It’s too slow to ask each allocator if it allocated the allocation, so memory managers employ various techniques to get around this, such as storing information in each allocation, or comparing addresses against reserved page ranges. Freeing memory is often much slower than allocating memory for this reason; one of the differentiating factors between allocators is how they solve this problem. This is too detailed to go into here, but the take-away fact is that freeing memory is often slower than allocating it.
FSAs work best when there are many high frequency, short-lived allocations. In a large game that makes use of stl it isn’t uncommon to have around 40,000 small allocations per second. FSAs also work fine for longer-lived allocations as long as they are allocated before anything else. Mixing short- and long-lived allocations will lead to temporal fragmentation. FSAs are essential because they are very fast and eliminate spatial fragmentation.
Coalesce heaps consist of one of more coalesce allocators. The coalesce heap will handle all allocation sizes that are not handled by the FSA heap. A coalesce allocator is very different from an FSA. For a start, a single allocator can return allocations of different sizes. A coalesce allocator will first request a large region of memory from the system, say 32MB. The allocator keeps track of the allocated and free ranges of memory in this region. When an allocation is requested, it searches for the best free range to use. If it finds a free range that fits the allocation exactly, it marks the entire range as allocated. If not, it splits the free range into an allocated range and free range. A big differentiating factor between memory managers is how they decide on the best free range to use and this has a big impact on how well they limit fragmentation.
The coalesce allocator gets its name from what it does when it frees allocations. When an allocation is freed it marks the range as free. If the next consecutive range is already free, then these two ranges are merged, or coalesced, into one large range. Similarly, if the previous range is free it coalesces with that range. This coalescing is important because it prevents the free space from fragmenting, and means that it is less likely that large allocations will fail when there is memory pressure. This coalescing can happen immediately or can be delayed, but this is an implementation detail; there are benefits and drawbacks to both.
Figure 4: Example of a coalesce range and what happens when a range is freed
Some allocators will use coalesce heaps for large allocations, whilst others simply fall back to the system allocation functions (e.g. VirtualAlloc/HeapAlloc). Because there are usually fewer large allocations, and because they will not fragment memory, large allocations are not a concern.
Memory fragmentation is an important consideration for all games, especially those on a platform with fixed memory. Fragmentation is basically the measure of unused memory that a memory manager has committed, that for various reasons probably won’t be re-used. Over time, the memory used becomes increasingly spread out, and the same amount of allocated memory takes up more and more system memory because of all the wastage. You may find that even though you have enough free memory to make an allocation, there isn’t a single free block of memory that is big enough to hold that allocation, and so the allocation fails. If you are not aware of fragmentation, the first you might know of it is when your game fails a long soak test a few weeks before shipping; certainly not the best time to find this out. It is generally accepted that most memory managers will plateau at 10% loss to fragmentation, but I’ve seen plenty of instances where fragmentation was significantly worse than this.
There are two types of fragmentation: spatial and temporal.
Spatial fragmentation happens in coalesce heaps. If the free space that the coalesce heap decided to use for an allocation isn’t exactly the right size, there will be an offcut. If this offcut is small enough it will be referred to as a fragment, and the important thing to realise is that unless the allocation is freed and the fragment is coalesced, the fragment won’t be re-used.
Figure 5: Spatial fragmentation
Imagine many thousands of allocs and frees and you can easily see how lots of allocation sizes might not quite fit and you are left with many small fragments of memory. Because allocations are usually freed in a random pattern, this fragmentation usually increases over time. In theory these fragments could be re-used because the allocation that caused the fragment could be freed, the fragment would be coalesced back into the free range and an allocation of the perfect size could come along and use the entire range. However, as we know, entropy is a fundamental rule of reality and this rarely happens.
It’s tempting to say that we should fill in these fragments using the small allocations instead of using FSAs, but consider how memory would look if we did that, and then freed the larger allocations. Memory would become peppered with small allocations which would prevent us from making larger allocations. Coalesce heaps fragment easily, and combining small and large allocations is a bad idea.
It’s easy to see how FSAs eliminate spatial fragmentation. Because each allocator only deals with allocations of one size, no fragments are ever created. You might be wondering why we don’t just use FSAs for everything; as mentioned earlier there are diminishing returns as the allocation sizes become larger, because there are fewer types of allocation of each size.
Temporal fragmentation affects both FSAs and coalesce allocators. It is caused by the mixing of short-lived allocations with long-lived allocations. A simple example illustrates this: imagine a slew of small short-lived allocations, allocated by one of the FSAs. The FSA allocates a new page from the system, chops it up and returns the allocation slots. However, in the middle of these allocations is one allocation that is long-lived. When all of the short-lived allocations are freed, the long-lived allocation remains, meaning that the page can’t be released back to the system. It will be left with a mostly empty page of memory that might not be re-used.
Figure 6: Temporal fragmentation
Memory managers can’t do much about temporal fragmentation because they have no knowledge of how long each allocation will live for. This means it’s down to us to limit this in the best we can. The biggest culprit of temporal fragmentation is strings. Strings are usually small enough to go into the FSA heap, they are often long-lived and they are often created at random times throughout the game’s lifetime. For example, a level load will often result in a spike of memory. If a long-lived string happens to be allocated at the peak, you may find that the memory doesn’t return to its previous level. The easiest way around this is to route all string allocations through their own allocator.
Armed with this knowledge about the internals of a memory manager, we can come up with some guidelines on best practices.
- Different sized allocations go through different heaps and so have different considerations. Small allocations go through the quick low fragmentation FSA heaps, larger allocations go through slower coalesce heaps. Small allocations are assumed to be high frequency and short-lived. Larger allocations are assumed to be low frequency and long- lived.
- Small allocations are very fast and there is safety in numbers. Some programmers are so scared of allocating any memory they will use custom allocators for everything, not realising that a small allocation is simply popping a value off a free list. Making all small allocations go through a single general purpose allocator is usually more efficient than having lots of fixed heaps. Only use your own allocator if you are doing lots of allocations of the same size and lifetime, such as in a particle system.
- Avoid large short-lived allocations. The coalesce heap will fragment very quickly with short- lived allocations. Use a custom allocator or scratch area of memory for these.
- Freeing is much slower than allocating. Keep this in mind when deciding when to free memory.
- Try not to mix long- and short-lived allocations. This is often easier said than done, but a simple solution is to use a custom allocator for small allocations that you know are long-lived or never get freed. Similarly, use a separate allocator for larger, more frequent allocations that you know are not going to stick around.
- Put a custom allocator behind your string class. Strings are almost always the biggest cause of all types of fragmentation.
- Coalesce heaps suffer from both spatial and temporal fragmentation. Coalesce heaps are used for larger allocations, and it is expected that the allocation frequency will be lower. Coalesce heaps can fragment very badly. Allocating from coalesce heaps continuously every frame is probably a bad idea. Coalesce heaps are great for packing together varying sizes of allocations, but are poor when allocation frequency is high. Ideally, most large allocations would happen during level load, and would not be touched again while the game is running.
It is better to do certain things up front, such as putting custom allocators behind your string class and particle systems. There are some things that should always be avoided, such as high frequency/short-lived large allocations. Other than that, it is best to periodically profile your game to see what needs attention. A good profiler will be able to tell you where your memory is going, the time spent in the allocators, allocator overhead, and loss due to fragmentation.
Optimising memory usage is a difficult task. It’s good to have general tips in mind whilst programming, but it’s also useful to have a better understanding of memory managers for profiling and fixing memory related problems. I hope this blog has shed some light on that black box of memory allocation.