Lightweight variable-size memory block allocator
This post presents an implementation of the lightweight variable-size memory block allocator that is used in Diligent Engine 2.0.
This post describes a lightweight class that handles free memory block management to accommodate variable-size allocation requests. It is used by Diligent Engine to handle descriptor heap allocations, which will be covered in subsequent posts. The class keeps track of free blocks of memory, and facilitates allocation by finding the smallest block that provides enough space. Deallocated chunks of memory are merged with the existing free blocks. Runtime complexity is logarithmic with respect to the number of blocks for both allocation and deallocation.
The class keeps track of free blocks only and does not record allocation sizes. It uses two ordered maps to facilitate operations. The first map keeps blocks sorted by their offsets. The second multimap keeps blocks sorted by their sizes. The elements of the two maps reference each other, which enables efficient block insertion, removal and merging. Note that since we need ordered access, all operations with the maps (search/insert/delete) take logarithmic time on the number of blocks. The two maps are defined as shown below:
typedef size_t OffsetType; struct FreeBlockInfo; // Type of the map that keeps memory blocks sorted by their offsets using TFreeBlocksByOffsetMap = std::map<OffsetType, FreeBlockInfo>; // Type of the map that keeps memory blocks sorted by their sizes using TFreeBlocksBySizeMap = std::multimap<OffsetType, TFreeBlocksByOffsetMap::iterator>; struct FreeBlockInfo { // Block size (no reserved space for the size of allocation) OffsetType Size; // Iterator referencing this block in the multimap sorted by the block size TFreeBlocksBySizeMap::iterator OrderBySizeIt; FreeBlockInfo(OffsetType _Size) : Size(_Size){} }; TFreeBlocksByOffsetMap m_FreeBlocksByOffset; TFreeBlocksBySizeMap m_FreeBlocksBySize;
Note that the second map is a multimap because there may be several blocks of the same size. To the contrary, there could be only single block at the specific offset. An example for the case of four free blocks is given in the figure below:
Note that the Size member of the FreeBlockInfo structure is redundant as it is equal to OrderBySizeIt->first. It however makes implementation more readable and clear.
Allocation
To allocate a new block, we use the second map to find the smallest block that is large enough to encompass the requested size. multimap::lower_bound standard function does exactly what we need:
auto SmallestBlockItIt = m_FreeBlocksBySize.lower_bound(Size); if(SmallestBlockItIt == m_FreeBlocksBySize.end()) return InvalidOffset; auto SmallestBlockIt = SmallestBlockItIt->second;
This operation takes logarithmic time on the size of the container. Offset to the beginning of the requested allocation will then be given by the beginning of this block:
auto Offset = SmallestBlockIt->first;
The offset of the new free block will be given by adding requested size to the original offset, and the new size will be given by subtracting the requested memory size from the original block size:
auto NewOffset = Offset + Size; auto NewSize = SmallestBlockIt->second.Size - Size;
The final step is to update the maps. Since both block size and offset have been changed, we have to first remove original elements from the containers, and if the new block is not empty, insert new elements into the maps:
m_FreeBlocksBySize.erase(SmallestBlockItIt); m_FreeBlocksByOffset.erase(SmallestBlockIt); if (NewSize > 0) AddNewBlock(NewOffset, NewSize);
AddNewBlock() is a helper function that inserts a new block of given size at the specified offset:
void AddNewBlock(OffsetType Offset, OffsetType Size) { auto NewBlockIt = m_FreeBlocksByOffset.emplace(Offset, Size); auto OrderIt = m_FreeBlocksBySize.emplace(Size, NewBlockIt.first); NewBlockIt.first->second.OrderBySizeIt = OrderIt; }
Both insertion and deletion of an element into an ordered map take logarithmic time, so the total method complexity is O(log n).
As an example, if we request the block of size 20 bytes, the system will return offset 32 and the structure of the blocks after the allocation will look like shown in the figure below:
The full function source code is given in the following listing:
OffsetType Allocate(OffsetType Size) { if(m_FreeSize < Size) return InvalidOffset; // Get the first block that is large enough to encompass Size bytes auto SmallestBlockItIt = m_FreeBlocksBySize.lower_bound(Size); if(SmallestBlockItIt == m_FreeBlocksBySize.end()) return InvalidOffset; auto SmallestBlockIt = SmallestBlockItIt->second; auto Offset = SmallestBlockIt->first; auto NewOffset = Offset + Size; auto NewSize = SmallestBlockIt->second.Size - Size; m_FreeBlocksBySize.erase(SmallestBlockItIt); m_FreeBlocksByOffset.erase(SmallestBlockIt); if (NewSize > 0) AddNewBlock(NewOffset, NewSize); m_FreeSize -= Size; return Offset; }
Deallocation
Deallocation is little more involved as there are more cases we need to handle. The first step is to understand where the new block should be inserted. For that purpose, we can use the first map that keeps memory blocks sorted by the offsets. Since we know the offset of the block to be deallocated, we can use map::upper_bound to find a block that goes right after it. Using the obtained interator, we can then find preceding free block:
// Find the first element whose offset is greater than the specified offset auto NextBlockIt = m_FreeBlocksByOffset.upper_bound(Offset); auto PrevBlockIt = NextBlockIt; if(PrevBlockIt != m_FreeBlocksByOffset.begin()) --PrevBlockIt; else PrevBlockIt = m_FreeBlocksByOffset.end();
Next we need to handle four possible cases:
The block being released is not adjacent to any other free block. In this case we simply add the block as a new free block.
The block is adjacent to the previous block. The two blocks need to be merged.
The block is adjacent to the next block. The two blocks need to be merged.
The block is adjacent to both previous and next blocks. All three blocks need to be merged.
The full function listing is given below:
void Free(OffsetType Offset, OffsetType Size) { // Find the first element whose offset is greater than the specified offset auto NextBlockIt = m_FreeBlocksByOffset.upper_bound(Offset); auto PrevBlockIt = NextBlockIt; if(PrevBlockIt != m_FreeBlocksByOffset.begin()) --PrevBlockIt; else PrevBlockIt = m_FreeBlocksByOffset.end(); OffsetType NewSize, NewOffset; if(PrevBlockIt != m_FreeBlocksByOffset.end() && Offset == PrevBlockIt->first + PrevBlockIt->second.Size) { // PrevBlock.Offset Offset // | | // |<-----PrevBlock.Size----->|<------Size-------->| // NewSize = PrevBlockIt->second.Size + Size; NewOffset = PrevBlockIt->first; if (NextBlockIt != m_FreeBlocksByOffset.end() && Offset + Size == NextBlockIt->first) { // PrevBlock.Offset Offset NextBlock.Offset // | | | // |<-----PrevBlock.Size----->|<------Size-------->|<-----NextBlock.Size----->| // NewSize += NextBlockIt->second.Size; m_FreeBlocksBySize.erase(PrevBlockIt->second.OrderBySizeIt); m_FreeBlocksBySize.erase(NextBlockIt->second.OrderBySizeIt); // Delete the range of two blocks ++NextBlockIt; m_FreeBlocks.erase(PrevBlockIt, NextBlockIt); } else { // PrevBlock.Offset Offset NextBlock.Offset // | | | // |<-----PrevBlock.Size----->|<------Size-------->| ~ ~ ~ |<-----NextBlock.Size----->| // m_FreeBlocksBySize.erase(PrevBlockIt->second.OrderBySizeIt); m_FreeBlocksByOffset.erase(PrevBlockIt); } } else if (NextBlockIt != m_FreeBlocksByOffset.end() && Offset + Size == NextBlockIt->first) { // PrevBlock.Offset Offset NextBlock.Offset // | | | // |<-----PrevBlock.Size----->| ~ ~ ~ |<------Size-------->|<-----NextBlock.Size----->| // NewSize = Size + NextBlockIt->second.Size; NewOffset = Offset; m_FreeBlocksBySize.erase(NextBlockIt->second.OrderBySizeIt); m_FreeBlocksByOffset.erase(NextBlockIt); } else { // PrevBlock.Offset Offset NextBlock.Offset // | | | // |<-----PrevBlock.Size----->| ~ ~ ~ |<------Size-------->| ~ ~ ~ |<-----NextBlock.Size----->| // NewSize = Size; NewOffset = Offset; } AddNewBlock(NewOffset, NewSize); m_FreeSize += Size; }
The run-time complexity of deallocation routine is logarithmic with respect to the number of free blocks. The figure below shows an example of deallocating a block of size 8 at offset 56, that results in a merge of the block with the previous and next free blocks:
Variable Size GPU Allocations Manager
When managing GPU memory, deallocations can only be performed when the memory is no longer accessed by the GPU. A simple extension of the basic allocator allows deferring block deallocation until the time when it is safe to do so. The class provides new Free() method that takes frame number along with the block offset and size. The method adds the block attributes in the deletion queue, and releases all the stale allocations at once at the end of each frame:
class VariableSizeGPUAllocationsManager : public VariableSizeAllocationsManager { private: struct FreedAllocationInfo { OffsetType Offset; OffsetType Size; Uint64 FrameNumber; }; public: VariableSizeGPUAllocationsManager(OffsetType MaxSize, IMemoryAllocator &Allocator); ~VariableSizeGPUAllocationsManager(); VariableSizeGPUAllocationsManager(VariableSizeGPUAllocationsManager&& rhs); VariableSizeGPUAllocationsManager& operator = (VariableSizeGPUAllocationsManager&& rhs) = default; VariableSizeGPUAllocationsManager(const VariableSizeGPUAllocationsManager&) = delete; VariableSizeGPUAllocationsManager& operator = (const VariableSizeGPUAllocationsManager&) = delete; void Free(OffsetType Offset, OffsetType Size, Uint64 FrameNumber) { // Do not release the block immediately, but add // it to the queue instead m_StaleAllocations.emplace_back(Offset, Size, FrameNumber); } void ReleaseCompletedFrames(Uint64 NumCompletedFrames) { // Free all allocations from the beginning of the queue that belong to completed frames while(!m_StaleAllocations.empty() && m_StaleAllocations.front().FrameNumber < NumCompletedFrames) { auto &OldestAllocation = m_StaleAllocations.front(); VariableSizeAllocationsManager::Free(OldestAllocation.Offset, OldestAllocation.Size); m_StaleAllocations.pop_front(); } } private: std::deque< FreedAllocationInfo > m_StaleAllocations; };
The allocator provides a lot of flexibility as it can handle allocations of any (potentially very large) size. Its run-time complexity is logarithmic with respect to the number of free blocks as opposed to constant-time complexity for fixed-size block allocators. However, most of the allocations throught the class happen at the intialization time, so it is not used in hot paths of the engine, and the performance is acceptable.
Visit Diligent Engine on the Web, follow us on Twitter, and Facebook.
Read more about:
BlogsAbout the Author
You May Also Like