Featured Blog

Custom Vector Allocation

In my posts about 'rolling your own' vector class I talked about the general approach and some specific performance tweaks, but so far haven't talked about custom memory allocation. I'll come back now and look at the approach we took for this side things.

(Number 6 in a series of posts about Vectors and Vector based containers.)

A few posts back I talked about the idea of 'rolling your own' STL-style vector class, based my experiences with this at PathEngine.

In that original post and these two follow-ups I talked about the general approach and also some specific performance tweaks that actually helped in practice for our vector use cases.

I haven't talked about custom memory allocation yet, however. This is something that's been cited in a number of places as a key reason for switching away from std::vector so I'll come back now and look at the approach we took for this (which is pretty simple, but nonstandard, and also pre C++11), and assess some of the implications of using this kind of non-standard approach.

I approach this from the point of view of a custom vector implementation, but I'll be talking about some issues with memory customisation that also apply more generally.

Why custom allocation?

In many situations it's fine for vectors (and other containers) to just use the same default memory allocation method as the rest of your code, and this is definitely the simplest approach.

(The example vector code I posted previously used malloc() and free(), but works equally well with global operator new and delete.)

But vectors can do a lot of memory allocation, and memory allocation can be expensive, and it's not uncommon for memory allocation operations to turn up in profiling as the most significant cost of vector based code. Custom memory allocation approaches can help resolve this.

And some other good reasons for hooking into and customising allocations can be the need to avoid memory fragmentation or to track memory statistics.

For these reasons generalised memory customisation is an important customer requirement for our SDK code in general, and then by extension for the vector containers used by this code.

Custom allocation in std::vector

The STL provides a mechanism for hooking into the container allocation calls (such as vector buffer allocations) through allocators, with vector constructors accepting an allocator argument for this purpose.

I won't attempt a general introduction to STL allocators, but there's a load of material about this on the web. See, for example, this article on Dr Dobbs, which includes some example use cases for allocators. (Bear in mind that this is pre C++11, however. I didn't see any similarly targeted overview posts for using allocators post C++11.)

A non-standard approach

We actually added the possibility to customise memory allocation in our vectors some time after switching to a custom vector implementation. (This was around mid-2012. Before that PathEngine's memory customisation hooks worked by overriding global new and delete, and required dll linkage if you wanted to manage PathEngine memory allocations separately from allocations in the main game code.)

We've generally tried to keep our custom vector as similar as possible to std::vector, in order to avoid issues with unexpected behaviour (since a lot of people know how std::vector works), and to ensure that code can be easily switched between std::vector and our custom vector. When it came to memory allocation, however, we chose a significantly different (and definitely non-standard) approach, because in practice a lot of vector code doesn't actually use allocators (or else just sets allocators in a constructor), because we already had a custom vector class in place, and because I just don't like STL allocators!

Other game developers

A lot of other game developers have a similar opinion of STL allocators, and for many this is actually then also a key factor in a decision to switch to custom container classes.

For example, issues with the design of STL allocators are quoted as one of the main reasons for the creation of the EASTL, a set of STL replacement classes, by Electronic Arts. From the EASTL paper:

Among game developers the most fundamental weakness is the std allocator design, and it is this weakness that was the largest contributing factor to the creation of EASTL.

And I've heard similar things from other developers. For example, in this blog post about the Bitsquid approach to allocators Niklas Frykholm says:

If it weren't for the allocator interface I could almost use STL. Almost.

Let's have a look at some of the reasons for this distaste!

Problems with STL allocators

We'll look at the situation prior to C++11, first of all, and the historical basis for switching to an alternative mechanism.

A lot of problems with STL allocators come out of confusion in the initial design. According to Alexander Stepanov (primary designer and implementer of the STL) the custom allocator mechanism was invented to deal with a specific issue with Intel memory architecture. (Do you remember near and far pointers? If not, consider yourself lucky I guess!) From this interview with Alexander:

Question: How did allocators come into STL? What do you think of them?

Answer: I invented allocators to deal with Intel's memory architecture. They are not such a bad ideas in theory - having a layer that encapsulates all memory stuff: pointers, references, ptrdiff_t, size_t. Unfortunately they cannot work in practice.

And it seems like this original design intention was also only partially executed. From the wikipedia entry for allocators:

They were originally intended as a means to make the library more flexible and independent of the underlying memory model, allowing programmers to utilize custom pointer and reference types with the library. However, in the process of adopting STL into the C++ standard, the C++ standardization committee realized that a complete abstraction of the memory model would incur unacceptable performance penalties. To remedy this, the requirements of allocators were made more restrictive. As a result, the level of customization provided by allocators is more limited than was originally envisioned by Stepanov.

and, further down:

While Stepanov had originally intended allocators to completely encapsulate the memory model, the standards committee realized that this approach would lead to unacceptable efficiency degradations. To remedy this, additional wording was added to the allocator requirements. In particular, container implementations may assume that the allocator's type definitions for pointers and related integral types are equivalent to those provided by the default allocator, and that all instances of a given allocator type always compare equal, effectively contradicting the original design goals for allocators and limiting the usefulness of allocators that carry state.

Some of the key problems with STL allocators (historically) are then:

  • Unnecessary complexity, with some boiler plate stuff required for features that are not actually used
  • A limitation that allocators cannot have internal state ('all instances of a given allocator type are required to be interchangeable and always compare equal to each other')
  • The fact the allocator type is included in container type (with changes to allocator type changing the type of the container)

There are some changes to this situation with C++11, as we'll see below, but this certainly helps explain why a lot of people have chosen to avoid the STL allocator mechanism, historically!

Virtual allocator interface

So we decided to avoid STL allocators, and use a non-standard approach.

The approach we use is based on a virtual allocator interface, and avoids the need to specify allocator type as a template parameter.

This is quite similar to the setup for allocators in the BitSquid engine, as described by Niklas here (as linked above, it's probably worth reading that post if you didn't see this already, as I'll try to avoid repeating the various points he discussed there).

A basic allocator interface can then be defined as follows:

class iAllocator
    virtual ~iAllocator() {}
    virtual void* allocate(tUnsigned32 size) = 0;
    virtual void deallocate(void* ptr) = 0;
// helper
    template <class T> void
    allocate_Array(tUnsigned32 arraySize, T*& result)
        result = static_cast<T*>(allocate(sizeof(T) * arraySize));

The allocate_Array() method is for convenience, concrete allocator objects just need to implement allocate() and free().

We can store a pointer to iAllocator in our vector, and replace the direct calls to malloc() and free() with virtual function calls, as follows:

    static T*
    allocate(size_type size)
        T* allocated;
        _allocator->allocate_Array(size, allocated);
        return allocated;
    reallocate(size_type newCapacity)
        T* newData;
        _allocator->allocate_Array(newCapacity, newData);
        copyRange(_data, _data + _size, newData);
        deleteRange(_data, _data + _size);
        _data = newData;
        _capacity = newCapacity;

These virtual function calls potentially add some overhead to allocation and deallocation. It's worth being quite careful about this kind of virtual function call overhead, but in practice it seems that the overhead is not significant here. Virtual function call overhead is often all about cache misses and, perhaps because there are often just a small number of actual allocator instance active, with allocations tending to be grouped by allocator, this just isn't such an issue here.

We use a simple raw pointer for the allocator reference. Maybe a smart pointer type could be used (for better modern C++ style and to increase safety), but we usually want to control allocator lifetime quite explicitly, so we're basically just careful about this.

Allocators can be passed in to each vector constructor, or if omitted will default to a 'global allocator' (which adds a bit of extra linkage to our vector header):

    cVector(size_type size, const T& fillWith,
        iAllocator& allocator = GlobalAllocator()
        _data = 0;
        _allocator = &allocator;
        _size = size;
        _capacity = size;
            _allocator->allocate_Array(_capacity, _data);
            constructRange(_data, _data + size, fillWith);

Here's an example concrete allocator implementation:

class cMallocAllocator : public iAllocator
    allocate(tUnsigned32 size)
        return malloc(static_cast<size_t>(size));
    deallocate(void* ptr)

(Note that you normally can call malloc() with zero size, but this is something that we disallow for PathEngine allocators.)

And this can be passed in to vector construction as follows:

    cMallocAllocator allocator;
    cVector<int> v(10, 0, allocator);

Swapping vectors

That's pretty much it, but there's one tricky case to look out for.

Specifically, what should happen in our vector swap() method? Let's take a small diversion to see why there might be a problem.

Consider some code that takes a non-const reference to vector, and 'swaps a vector out' as a way of returning a set of values in the vector without the need to heap allocate the vector object itself:

class cVectorBuilder
    cVector<int> _v;
    //.... construction and other building methods
    void takeResult(cVector<int>& result); // swaps _v into result

So this code doesn't care about allocators, and just wants to work with a vector of a given type. And maybe there is some other code that uses this, as follows:

void BuildData(/*some input params*/, cVector& result)
  //.... construct a cVectorBuilder and call a bunch of build methods

Now there's no indication that there's going to be a swap() involved, but the result vector will end up using the global allocator, and this can potentially cause some surprises in the calling code:

   cVector v(someSpecialAllocator);
   BuildData(/*input params*/, v);
   // lost our allocator assignment!
   // v now uses the global allocator

Nobody's really doing anything wrong here (although this isn't really the modern C++ way to do things). This is really a fundamental problem arising from the possibility to swap vectors with different allocators, and there are other situations where this can come up.

You can find some discussion about the possibilities for implementing vector swap with 'unequal allocators' here. We basically choose option 1, which is to simply declare it illegal to call swap with vectors with different allocators. So we just add an assert in our vector swap method that the two allocator pointers are equal.

In our case this works out fine, since this doesn't happen so much in practice, because cases where this does happen are caught directly by the assertion, and because it's generally straightforward to modify the relevant code paths to resolve the issue.

Comparison with std::vector, is this necessary/better??

Ok, so I've outlined the approach we take for custom allocation in our vector class.

This all works out quite nicely for us. It's straightforward to implement and to use, and consistent with the custom allocators we use more generally in PathEngine. And we already had our custom vector in place when we came to implement this, so this wasn't part of the decision about whether or not to switch to a custom vector implementation. But it's interesting, nevertheless, to compare this approach with the standard allocator mechanism provided by std::vector.

My original 'roll-your-own vector' blog post was quite controversial. There were a lot of responses strongly against the idea of implementing a custom vector, but a lot of other responses (often from the game development industry side) saying something like 'yes, we do that, but we do some detail differently', and I know that this kind of customisation is not uncommon in the industry.

These two different viewpoints makes it worthwhile to explore this question in a bit more detail, then, I think.

I already discussed the potential pitfalls of switching to a custom vector implementation in the original 'roll-your-own vector' blog post, so lets look at the potential benefits of switching to a custom allocator mechanism.

Broadly speaking, this comes down to three key points:

  • Interface complexity
  • Stateful allocator support
  • Possibilities for further customisation and memory optimisation

Interface complexity

If we look at an example allocator implementation for each setup we can see that there's a significant difference in the amount of code required. The following code is taken from my previous post, and was used to fill allocated memory with non zero values, to check for zero initialisation:

// STL allocator version
template <class T>
class cNonZeroedAllocator
    typedef T value_type;
    typedef value_type* pointer;
    typedef const value_type* const_pointer;

Latest Jobs

Sucker Punch Productions

Bellevue, Washington
Combat Designer

Xbox Graphics

Redmond, Washington
Senior Software Engineer: GPU Compilers

Insomniac Games

Burbank, California
Systems Designer

Deep Silver Volition

Champaign, Illinois
Senior Environment Artist
More Jobs   


Register for a
Subscribe to
Follow us

Game Developer Account

Game Developer Newsletter


Register for a

Game Developer Account

Gain full access to resources (events, white paper, webinars, reports, etc)
Single sign-on to all Informa products

Subscribe to

Game Developer Newsletter

Get daily Game Developer top stories every morning straight into your inbox

Follow us


Follow us @gamedevdotcom to stay up-to-date with the latest news & insider information about events & more