C++ Adventures / Memory Arena

Roughly half a year ago¹, I decided that I would write a library to help me parse and generate XML and other markup languages. Documents in each of these languages can be represented as a forest² of various kinds of nodes. The basic implementation is fairly simple, but can lead to adversarial allocation patterns for the default runtime allocator (e.g. the libc malloc()). These patterns can cause runaway memory consumption over time due to fragmentation. There is a host of other issues associated with this, but for my purposes, fragmentation is the worst one.

In most common cases, the lifetime of the entire document tree is tied to the document root itself. This is true both for the case where the document is being built or modified programmatically, and when it's built by the parser during deserialization. In other cases, such as when querying, the document can be treated as read-only until the operation is done.

Arenas³ fit this use case very well. Generally, an arena is a broad concept, this post will restrict itself to the following meaning. Assuming a large enough buffer, an object can be allocated simply by bumping the buffer pointer by the size of the object, and returning the original pointer to the caller. Once the objects in the arena are no longer needed, it can be reset, and the underlying buffer can be deallocated. Resetting the arena technically means just moving the buffer pointer back to the start of the buffer. However, semantically, it ends the lifetimes of all objects that have been allocated in the arena up to this point. This will be important later.

The above description is really all there is to an arena. However, the actual implementation gets quite complicated. The rest of this post is dedicated to showcasing an arena implementation I wrote for my markup processing library.

How to C

Arenas in the sense of this post are very simple conceptually. To make it clearer, the following is a naive implementation in C.

typedef struct Arena {
    uint8_t *BufferStart;
    uint8_t *BufferCurrent;
    size_t   BufferSize;
} Arena;

Arena ArenaCreate(size_t size)
{
    uint8_t *buffer = (uint8_t *)malloc(size);
    Arena arena = {
        .BufferStart = buffer,
        .BufferCurrent = buffer,
        .BufferSize = buffer == NULL ? 0 : size,
    };
    return arena;
}

void *ArenaAlloc(Arena *arena, size_t size)
{
    assert(arena->BufferCurrent >= arena->BufferStart);

    ptrdiff_t offset = arena->BufferCurrent - arena->BufferStart;
    ptrdiff_t remaining = (ptrdiff_t)arena->BufferSize - offset;
    if (remaining < (ptrdiff_t)size) {
        return NULL;
    }

    void *alloced = arena->BufferCurrent;
    arena->BufferCurrent += size;
    return alloced;
}

void ArenaReset(Arena *arena)
{
    arena->BufferCurrent = arena->BufferStart;
}

void ArenaDestroy(Arena *arena)
{
    free(arena->BufferStart);
    arena->BufferStart = NULL;
    arena->BufferCurrent = NULL;
    arena->BufferSize = 0;
}

Of course, what's outlined above could use some tweaks. For example, it's completely ignoring alignment. The Arena struct could be opaque, with the implementation details visible only to the implementation. The arena could also operate on a user provided buffer, instead of mallocing its own.

Another thing that is immediately obvious is that the lifetimes are handled only up to a point in this arena. In case the memory is used for objects with nontrivial cleanup, the user is fully responsible for cleaning them up before the arena is reset or destroyed.

How to C++

Of course, the above implementation is valid in C++ as well. However, it's not nearly as easy as in C, because the arena does not manage object lifetimes, it cares only about allocation. In practice this is nearly useless, because the user needs to do a lot of busywork manually. To be a bit more specific, this is what's generally necessary for every single object:

Allocate memory using ArenaAlloc (with appropriate padding for alignment).
Align the allocated pointer.
Call the constructor via placement new⁴.
Use the object (this is the only actually useful bit).
Call the destructor⁵.

And god forbid that the arena gets reset too early and more objects are allocated over the memory somehow. In that case, all hell breaks loose on that poor piece of memory, and everything that depends on it. Strict aliasing rules are broken, objects will not get cleaned up properly, and nothing will work right.

The silver lining is that C++ provides all the necessary tools to implement an arena correctly such that all the extra work is offloaded to it, and the user doesn't need to care about anything apart from not using pointers to reset memory.

Firstly, initialization can happen directly in the "allocation" function, similarly to how, e.g. std::make_unique, performs both the allocation, and initialization. By offloading the initialization to such Make function, and gaining information about the final type, it can also solve the alignment problem internally. Secondly, in case the arena contains any non-trivially destructible objects, it can keep track of all the destructors, and call them all in the correct order on Reset, or on destruction.

To continue, we should think though the API we want. Ideally, it should roughly correspond to the C example above, with the difference that the ArenaAlloc function should be replaced with a function such as Make to indicate that it does not perform only allocations. We should also prepare for handling destructors, since we need to take care of cleaning up after the allocated objects, as well as deallocating the memory they occupy. With all these points in mind, the basic Arena definition should look something like the following.

class Arena
{
public:
    Arena(std::uint8_t *buffer, std::size_t size);
    ~Arena();

    Arena(const Arena &) = delete;
    Arena &operator=(const Arena &) = delete;

    Arena(Arena &&other) noexcept;
    Arena &operator=(Arena &&other) noexcept;

    template <typename T, typename... Args>
    Handle<T> Make(Args &&...args);

    void Reset();

private:
    struct Destructor;

    template <typename T>
    void PushDestructor(void *memory, T *object);

    std::uint8_t *mBufferStart;
    std::uint8_t *mBufferCurrent;
    std::size_t   mBufferSize;
    Destructor   *mDestructorList;
};

While it should be fairly obvious from the lack of any synchronization primitives, keep in mind that this arena is not thread safe. Synchronization is necessary before calling it from multiple threads. It's also move only, copying the arena causes many issues, most notably in case there are multiple copies, where each calls Make independently, the copies would go out of sync, and violate strict aliasing.

Note that the Arena definition assumes the existence of a Handle<T> type that wraps around raw pointers⁶. This isn't a strict requirement, but it's very useful for implementing simple null checks, or more complex handle invalidation for debugging. Invalidation is fairly complex, and out of scope for this post.

Now, starting from the simplest bits, the following is the arena constructor. It doesn't need to do anything special apart from initializing all the data members properly. Of course, it can perform some simple sanity checks, but it can't do much apart from trusting the user that the buffer is valid.

Arena::Arena(std::uint8_t *buffer, std::size_t size)
    : mBufferStart(buffer)
    , mBufferCurrent(buffer)
    , mBufferSize(size)
    , mDestructorList(nullptr)
{}

The implementation is very similar to the ArenaCreate function in the C implementation, bar some syntactic differences, and that it's conceptually non-owning. It's similar to the point that it can change into the allocating and owning version very simply, like below. The following implementation must also have an accompanying conditional std::free call at the end of the destructor.

Arena::Arena(std::size_t size)
    : mBufferStart(static_cast<std::uint8_t *>(std::malloc(size)))
    , mBufferCurrent(mBufferStart)
    , mBufferSize(mBufferStart == nullptr ? 0 : size)
    , mDestructorList(nullptr)
{}

WARNING! If you decide to combine both these functions, you also need to keep track of whether the arena owns the buffer it's been initialized with, otherwise you will end up with either a memory leak or double free.

To finish up the basics, the move functions also need implementations. These operations are intuitive in the sense that they just transfer the state, and invalidate the original object to prevent misuse. One thing worth noting is that in case of assignment, the assignee must clean up its original state. This is handled by the mysterious Reset function, whose internals are explained later on.

Arena::Arena(Arena &&other) noexcept
    : mBufferStart(std::exchange(other.mBufferStart, nullptr))
    , mBufferCurrent(std::exchange(other.mBufferCurrent, nullptr))
    , mBufferSize(std::exchange(other.mBufferSize, 0))
    , mDestructorList(std::exchange(other.mDestructorList, nullptr))
{}

Arena &Arena::operator=(Arena &&other) noexcept
{
    Reset();

    mBufferStart = std::exchange(other.mBufferStart, nullptr);
    mBufferCurrent = std::exchange(other.mBufferCurrent, nullptr);
    mBufferSize = std::exchange(other.mBufferSize, 0);
    mDestructorList = std::exchange(other.mDestructorList, nullptr);

    return *this;
}

But What Is the Destructor, and Why Is the Pointer to It Called a List?

Great question! The arena has very severe self imposed restrictions on how it can do anything. Firstly, it cannot allocate on the free store, that would defeat the whole point of using an arena in the first place. Secondly, the destructor type itself must also be trivially destructible, and it should not rely on any dynamic polymorphism. It's also a bad idea to reserve space for destructors ahead of time. How should the arena then behave in case the available buffer is not big enough for all destructors? The buffer is also wasteful in cases where no explicit destructors are actually needed.

In the end, the most reasonable solution for this situation is to use a linked list. It consumes nothing in case there are no destructors, and it can grow as much as necessary while all list nodes live inside the arena itself. With all that in mind, this is what the Destructor should look like.

using DestructorPtrType = void(*)(void *);

struct Arena::Destructor
{
    Destructor       *Previous;
    void             *ObjectPtr;
    DestructorPtrType DestructorPtr;
};

"But what is the DestructorPtrType, and how does it help?", I hear you cry. It actually helps tremendously. The trick here is that it does not need any dynamic polymorphism if it's done right. This is where the PushDestructor function comes into place. Note that it assumes the void pointer to raw memory is already properly aligned, and the object itself is properly initialized.

template <typename T>
void Arena::PushDestructor(void *memory, T *object)
{
    mDestructorList = new (memory) Destructor {
        .Previous      = mDestructorList,
        .ObjectPtr     = object,
        .DestructorPtr = [] (void *object) {
            std::launder(reinterpret_cast<T *>(object))->~T();
        }
    };
}

Every time the above function template is instantiated, it also instantiates an anonymous free function that knows how to properly dispose of the object given its address. The destruction function must use std::launder to signal to the compiler that it's accessing memory of an existing object. However, it's unfortunately not all sunshine and rainbows, because every single explicit destructor costs 24B (on 64bit architectures)⁷, which is a steep cost, especially for small objects. This means destructors should be reserved only for cases where they are actually necessary.

Note the order of destruction. Due to how the destructor list is designed, the arena destroys objects in the reverse order of creation. This mirrors the destruction order in automatic storage duration objects, so the behavior is consistent and should be intuitive.

With the Destructor demystified, it's easy to implement the Reset member function, and the destructor for the whole arena. All it's doing is traversing the list by chasing the Previous pointers, calling the prepared destructor on each object, and resetting the current arena position at the end. Note that the std::free call should appear in the destructor after calling Reset in case the arena owns its memory as discussed above.

Arena::~Arena()
{
    Reset();
}

void Arena::Reset()
{
    Destructor *current = mDestructorList;
    while (current != nullptr) {
        current->DestructorPtr(current->ObjectPtr);
        current = current->Previous;
    }
    mDestructorList = nullptr;

    mBufferCurrent = mBufferStart;
}

Destruction Is Solved. What About Allocation and Initialization?

The process is written backwards. The cleanup procedure is now solved, but the allocation is still a bit of a mystery. Conceptually, the Make function has a simple job, but it needs to know how cleanup will work so it can prepare everything correctly. The most straight forward implementation needs to prepare properly aligned space both for the object, and its destructor. Afterwards, it must check that everything fits in the remaining arena buffer, bump the current buffer position, and finally initialize both of them before returning the handle to the new object.

template <typename T, typename... Args>
Handle<T> Arena::Make(Args &&...args)
{
    if (sizeof(T) > mBufferSize) {
        throw ArenaError("The object is larger than the arena.");
    }

    std::intptr_t begin = reinterpret_cast<std::intptr_t>(mBufferStart);
    std::ptrdiff_t offset = mBufferCurrent - mBufferStart;

    std::ptrdiff_t destructorOffset = offset;
    if (auto alignment = (begin + destructorOffset) % alignof(Destructor)) {
        destructorOffset += alignof(Destructor) - alignment;
    }

    std::ptrdiff_t objectOffset = destructorOffset + sizeof(Destructor);
    if (auto alignment = (begin + objectOffset) % alignof(T)) {
        objectOffset += alignof(T) - alignment;
    }

    if (objectOffset + sizeof(T) > mBufferSize) {
        throw ArenaError("Not enough space left in the arena.");
    }

    std::uint8_t *objectPtr = mBufferStart + objectOffset;
    T *ptr = new (objectPtr) T(std::forward<Args>(args)...);
    PushDestructor(mBufferStart + destructorOffset, ptr);
    mBufferCurrent = mBufferStart + objectOffset + sizeof(T);
    return ptr;
}

This implementation works correctly on the happy path, but it has several issues that need to be addressed.

It doesn't need to allocate a destructor for every single object, only those that are not trivially destructible.
It breaks aliasing rules in case T itself uses the arena during construction. This is technically a fairly weird scenario to begin with, but it can happen, and is actually desirable in some cases⁸.

Handling Trivially Destructible Types

First, let's deal with the primary source of unnecessary memory overhead. The arena spends three pointers' worth of memory for every single object with the above implementation, and it's unnecessarily slowing down cleanup, even though there is a whole class of objects that don't require cleanup on destruction. To fix this, the Make function just needs to check if the type is trivially destructible, in which case the destruction is a noop.

template <typename T, typename... Args>
Handle<T> Arena::Make(Args &&...args)
{
    if (sizeof(T) > mBufferSize) {
        throw ArenaError("The object is larger than the arena.");
    }

    std::intptr_t begin = reinterpret_cast<std::intptr_t>(mBufferStart);
    std::ptrdiff_t offset = mBufferCurrent - mBufferStart;

    std::ptrdiff_t destructorOffset;
    if constexpr (!std::is_trivially_destructible_v<T>) {
        if (auto alignment = (begin + offset) % alignof(Destructor)) {
            offset += alignof(Destructor) - alignment;
        }
        destructorOffset = offset;
        offset += sizeof(Destructor);
    }

    if (auto alignment = (begin + offset) % alignof(T)) {
        offset += alignof(T) - alignment;
    }

    if (offset + sizeof(T) > mBufferSize) {
        throw ArenaError("Not enough space left in the arena.");
    }

    std::uint8_t *objectPtr = mBufferStart + offset;
    T *ptr = new (objectPtr) T(std::forward<Args>(args)...);
    if constexpr (!std::is_trivially_destructible_v<T>) {
        PushDestructor(mBufferStart + destructorOffset, ptr);
    }
    mBufferCurrent = mBufferStart + offset + sizeof(T);
    return ptr;
}

The implementation is slightly more complicated, but we're now avoiding the destructor overhead for types that don't need special cleanup.

Proper Initialization with Error Recovery

Some people and projects like to pretend that exceptions don't exist. However, in reality exceptions do exist, and they can sometimes be thrown from constructors. The arena can, and should, provide strong exception guarantees, so the user can assume that it has recovered to the previous state as it was before the failed call. However, that's not all. The arena doesn't need to call constructors at all for implicit-lifetime types. Using these constraints, the initialization can be divided into three distinct kinds.

No work is necessary for implicit-lifetime types, returning the pointer is enough.
Types whose constructors are noexcept can be initialized simply via placement new.
The most complicated are throwing constructors. These need a proper harness for cleaning up in case the constructor throws.

namespace Implementation
{
    template <typename T>
    struct IsImplicitLifetime
    {
    private:
        template <typename U>
        using NoExtents = std::remove_all_extents_t<U>;

        using Type = std::remove_cv_t<T>;

        static constexpr bool Scalar = std::is_scalar_v<Type>;
        static constexpr bool Array = std::is_array_v<Type> &&
                                      IsImplicitLifetime<NoExtents<Type>>::Value;
        static constexpr bool ClassType =
            std::is_trivially_destructible_v<Type> &&
            (std::is_aggregate_v<Type> || std::is_trivially_constructible_v<Type>);
    public:
        static constexpr bool Value = Scalar || Array || ClassType;
    };
}

template <typename T>
concept IsImplicitLifetime = requires {
    requires Implementation::IsImplicitLifetime<T>::Value;
};

template <typename T, typename... Args>
Handle<T> Arena::Make(Args &&...args)
{
    if (sizeof(T) > mBufferSize) {
        throw ArenaError("The object is larger than the arena.");
    }

    std::intptr_t begin = reinterpret_cast<std::intptr_t>(mBufferStart);
    std::ptrdiff_t offset = mBufferCurrent - mBufferStart;

    std::ptrdiff_t destructorOffset;
    if constexpr (!std::is_trivially_destructible_v<T>) {
        if (auto alignment = (begin + offset) % alignof(Destructor)) {
            offset += alignof(Destructor) - alignment;
        }
        destructorOffset = offset;
        offset += sizeof(Destructor);
    }

    if (auto alignment = (begin + offset) % alignof(T)) {
        offset += alignof(T) - alignment;
    }

    if (offset + sizeof(T) > mBufferSize) {
        throw ArenaError("Not enough space left in the arena.");
    }

    std::uint8_t *currentBackup = mBufferCurrent;
    mBufferCurrent = mBufferStart + offset + sizeof(T);
    std::uint8_t *objectPtr = mBufferStart + offset;

    T *ptr = nullptr;
    if constexpr (IsImplicitLifetime<T> && sizeof...(Args) == 0) {
        ptr = std::launder(reinterpret_cast<T *>(objectPtr));
    }
    else if constexpr (std::is_nothrow_constructible_v<T, Args...>) {
        ptr = new (objectPtr) T (std::forward<Args>(args)...);
    }
    else {
        try {
            ptr = new (objectPtr) T (std::forward<Args>(args)...);
        }
        catch (...) {
            mBufferCurrent = currentBackup;
            throw;
        }
    }

    if constexpr (!std::is_trivially_destructible_v<T>) {
        PushDestructor(mBufferStart + destructorOffset, ptr);
    }

    return ptr;
}

With these changes, the Make function is now ready for most types. Note that the aliasing problem is resolved by bumping the current pointer before calling the object constructor, and it's restored to its previous state in case an exception is thrown from the object constructor. It doesn't consume the exception, proper error handling is still required in user code. This is a more consistent option, since the user code is making use of exceptions in the first place.

Handling Arrays, and Why It's Not so Easy

The arena can handle most kinds of objects correctly. However, it can't handle arrays of objects, apart from wrapper types like std::array. This situation can't be handled particularly well in the above implementation, because unlike other types, the language doesn't provide proper constructors and destructors. Another detail worth noting is that the Handle<T> is very awkward when used for arrays⁹. To handle this case, an alternative MakeArray function is necessary. This is similar to the reasons why C++ itself distinguishes between new and new[]. The following piece of code assumes the existence of an ArrayHandle<T> type, which serves as an equivalent to Handle<T> with a modified interface to better accomodate arrays¹⁰.

When allocating and initializing arrays, the basic rules are generally similar to the rules when dealing with single objects. Another thing worth noting is that thankfully, it's not necessary to align each object separately, this is handled by the sizeof operator including the necessary padding to preserve the alignment between elements. Finally, the placement new operator is useable for this bit as well, which would handle everything correctly, even in case one of the constructors fails, but the code below writes all of this out explicitly to show all the hidden magic.

The tricky bit is dealing with destruction. The language doesn't provide any special tools in this scenario¹¹. To deal with destruction, there are three obvious options:

Adding a special ArrayDestructor, and store it as a separate list directly in the arena. This option costs one more pointer directly in the arena, and a second list traversal during destruction.
Adding an ArrayDestructor, similarly as in the previous point, but calling it indirectly through a Destructor. This option costs nothing when array destructors are not used, but nontrivial array destruction then costs one more indirection, and two destructors' worth of memory overhead.
Modifying the Destructor to keep track of size as well. This option is unified, but it also increases the memory overhead of every single nontrivial destructor by a std::size_t, plus potentially more cycles spent on a for loop to destroy even single elements.

The choice is not obvious, every option has tradeoffs. However, the primary use case for this style of arena should be trivially destructible types. Destruction is always going to be expensive due to pointer chasing during the list traversal and destructor calls. The arena should also use the same trick as above to omit destructor calls for arrays with trivially destructible element types. With these assumptions, it should be very rare that a nontrivially destructible array ever appears in the arena. If it does, its destruction is likely going to be expensive anyway, and the extra memory and indirection overhead should hopefully be negligible compared to the array itself. Note that this is a case of wishful thinking, the actual performance characteristics are untested.

struct Arena::ArrayDestructor
{
    using ArrayDestructorPtrType = void(*)(void *array, std::size_t size);

    void                  *ArrayPtr;
    std::size_t            Size;
    ArrayDestructorPtrType DestructorPtr;

    ~ArrayDestructor()
    {
        DestructorPtr(ArrayPtr, Size);
    }
};

template <typename T>
Arena::ArrayDestructor *
Arena::MakeArrayDestructor(void *memory, T *array, std::size_t size)
{
    return new (memory) ArrayDestructor {
        .ArrayPtr      = array,
        .Size          = size,
        .DestructorPtr = [] (void *raw, std::size_t size) {
            T *array = std::launder(reinterpret_cast<T *>(raw));
            for (std::size_t i = size; i > 0; --i) {
                (array + (i - 1))->~T();
            }
        }
    };
}

template <typename T>
ArrayHandle<T> Arena::MakeArray(std::size_t size)
{
    if (!size) {
        return ArrayHandle<T>(nullptr, 0);
    }

    std::size_t totalSize = size * sizeof(T);

    if (totalSize > mBufferSize) {
        throw ArenaError("The object is larger than the arena.");
    }

    std::intptr_t begin = reinterpret_cast<std::intptr_t>(mBufferStart);
    std::ptrdiff_t offset = mBufferCurrent - mBufferStart;

    std::ptrdiff_t destructorOffset;
    std::ptrdiff_t arrayDestructorOffset;
    if constexpr (!std::is_trivially_destructible_v<T>) {
        if (auto alignment = (begin + offset) % alignof(Destructor)) {
            offset += alignof(Destructor) - alignment;
        }
        destructorOffset = offset;
        offset += sizeof(Destructor);

        if (auto alignment = (begin + offset) % alignof(ArrayDestructor)) {
            offset += alignof(ArrayDestructor) - alignment;
        }
        arrayDestructorOffset = offset;
        offset += sizeof(ArrayDestructor);
    }

    if (auto alignment = (begin + offset) % alignof(T)) {
        offset += alignof(T) - alignment;
    }

    if (offset + totalSize > mBufferSize) {
        throw ArenaError("Not enough space left in the arena.");
    }

    std::uint8_t *currentBackup = mBufferCurrent;
    mBufferCurrent = mBufferStart + offset + totalSize;
    std::uint8_t *arrayPtr = mBufferStart + offset;

    if constexpr (IsImplicitLifetime<T>) {
        // No need to do anything
    }
    else if constexpr (std::is_nothrow_default_constructible_v<T>) {
        for (std::size_t i = 0; i < size; ++i) {
            std::uint8_t *currentObject = arrayPtr + i * sizeof(T);
            new (currentObject) T ();
        }
    }
    else {
        std::size_t currentSize = 0;
        try {
            for (std::size_t i = 0; i < size; ++i, ++currentSize) {
                std::uint8_t *currentObject = arrayPtr + i * sizeof(T);
                new (currentObject) T ();
            }
        }
        catch (...) {
            if constexpr (!std::is_trivially_destructible_v<T>) {
                for (std::size_t i = currentSize; i > 0; --i) {
                    std::uint8_t *currentObject = arrayPtr + (i - 1) * sizeof(T);
                    std::launder(reinterpret_cast<T *>(currentObject))->~T();
                }
            }
            mBufferCurrent = currentBackup;
            throw;
        }
    }

    T *ptr = std::launder(reinterpret_cast<T *>(arrayPtr));

    if constexpr (!std::is_trivially_destructible_v<T>) {
        std::uint8_t *rawMemory = mBufferStart + arrayDestructorOffset;
        ArrayDestructor *destructor = MakeArrayDestructor(rawMemory, ptr, size);
        PushDestructor(mBufferStart + destructorOffset, destructor);
    }

    return ArrayHandle<T>(ptr, size);
}

Admittedly, this code isn't the cleanest out there, and the function is fairly lengthy, but it should hopefully be quite obvious what it's trying to achieve after reading the explanation above. The code also leaves out some extra details, such as amending the Arena definition with the extra declarations. Another detail to note is that it's likely unnecessary to align the memory for the array destructor, since the alignment is unlikely to be stricter, or even differ.

Extra Implementation Notes

The most barebones arena implementation itself is now complete. It can handle nearly all types, including arrays, with the MakeArray function. For production use, it should also restrict the accepted types right at the API boundary for cleaner error messages. The arena could also use some extra constructor overloads, such as constructing from a std::span<std::uint8_t>. A basic draft of a more robust API is follows.

template <typename T>
concept ArenaConstructible = requires {
    requires !std::is_array_v<T>;
    requires std::is_nothrow_destructible_v<T>;
};

template <typename T>
concept ArenaArrayConstructible = requires {
    requires !std::is_array_v<T>;
    requires std::is_default_constructible_v<T>;
    requires std::is_nothrow_destructible_v<T>;
};

class Arena
{
public:
    Arena(std::uint8_t *buffer, std::size_t size);
    Arena(std::span<std::uint8_t> buffer);
    ~Arena();

    Arena(const Arena &) = delete;
    Arena &operator=(const Arena &) = delete;

    Arena(Arena &&other) noexcept;
    Arena &operator=(Arena &&other) noexcept;

    template <ArenaConstructible T, typename... Args>
    Handle<T> Make(Args &&...args);

    template <ArenaArrayConstructible T>
    ArrayHandle<T> MakeArray(std::size_t size);

    void Reset() noexcept;

private:
    struct Destructor;
    struct ArrayDestructor;

    template <typename T>
    void PushDestructor(void *memory, T *object);

    template <typename T>
    ArrayDestructor *MakeArrayDestructor(void *memory, T *array, std::size_t size);

    std::uint8_t *mBufferStart;
    std::uint8_t *mBufferCurrent;
    std::size_t   mBufferSize;
    Destructor   *mDestructorList;
};

It's also possible to change the MakeArray function to accept a value for initializing each element, and potentially more convoluted ideas, for example initializing from a std::span<T>. Unfortunately, that is out of scope for this post.

The arena can be changed in many different ways. Currently, it works on a fixed buffer, but this can be improved by making an overarching type like a ChunkedArena that can dynamically allocate multiple fixed arenas, and potentially allocate large objects separately directly on the free store. This can be done quite easily by initializing a std::unique_ptr in the arena. With some extra effort, the interfaces of these different implementations can be unified with a base class that supports tag based dispatch, so that user code can be completely agnostic to what exact arena implementation it's using. Virtual dispatch is not applicable in this case.

Using the Arena: Designing Data Structures

When building custom data structures that can be tailored for use in the arena, it's optimal to make all types trivially destructible. These types should also ideally rely on fixed buffers, or have generally fixed size. For example, resizing a dynamic array can very quickly completely fill the arena, because its allocation strategy is not smart, and it cannot reuse any memory without completely resetting all objects. As a consequence, linked list based structures are the least wasteful containers with a dynamic size. Finally, when writing more complicated structures, it's usually a good idea to make them move only. Naively copying mutable handles into the arena can lead to inconsistencies in state. For example, a simple arena-aware implementation of a linked list follows.

template <typename T>
class ArenaList
{
public:
    class Node
    {
    public:
        template <typename... Args>
        Node(Args &&...args)
            : mNext(nullptr)
            , mItem(std::forward<Args>(args)...)
        {}

        Handle<Node> Next() const
        {
            return mNext;
        }

        Handle<T> Get()
        {
            return &mItem;
        }

    private:
        friend class ArenaList<T>;

        Handle<Node> mNext;
        T            mItem;

    };

    ArenaList()
        : mHead(nullptr)
        , mTail(nullptr)
    {}

    ArenaList(const ArenaList &) = delete;
    ArenaList &operator=(const ArenaList &) = delete;

    ArenaList(ArenaList &&other)
        : mHead(std::exchange(other.mHead, nullptr))
        , mTail(std::exchange(other.mTail, nullptr))
    {}

    ArenaList &operator=(ArenaList &&other)
    {
        mHead = std::exchange(other.mHead, nullptr);
        mTail = std::exchange(other.mTail, nullptr);
        return *this;
    }

    Handle<Node> Head() const
    {
        return mHead;
    }

    template <typename... Args>
    Handle<T> Prepend(Arena &arena, Args &&...args)
    {
        Handle<Node> node = arena.template Make<Node>(std::forward<Args>(args)...);
        node->mNext = mHead;
        mHead = node;
        if (!mTail) {
            mTail = node;
        }
        return node->Get();
    }

    template <typename... Args>
    Handle<T> Append(Arena &arena, Args &&...args)
    {
        Handle<Node> node = arena.template Make<Node>(std::forward<Args>(args)...);
        if (mTail) {
            mTail->mNext = node;
        }
        mTail = node;
        if (!mHead) {
            mHead = node;
        }
        return node->Get();
    }

private:
    Handle<Node> mHead;
    Handle<Node> mTail;
};

This is a very simple working implementation of a linked list. If the list were copyable, the copies would initially look at the same list of items. However, if one of the copies was modified, either via Prepend or Append, one of the copies would immediately be left in an invalid state, where either the head or tail is off, and further modifications can break both copies.

Since it might not be immediately obvious, this list implementation supports iteration via the Head and Next functions. Since this is a singly-linked list, only forward iteration is easy to implement, but that is sufficient for many uses. A small example of usage is included below.

std::uint8_t buffer[4096];
Arena arena(buffer, 4096);

ArenaList<int> list;

for (int i = 0; i < 8; ++i) {
    list.Prepend(arena, i);
}
for (int i = 0; i < 8; ++i) {
    list.Append(arena, i + 8);
}

auto node = list.Head();
while (node) {
    std::print("{} ", *node->Get());
    node = node.Next();
}
std::println();

The example above should output something like the following line.

7 6 5 4 3 2 1 0 8 9 10 11 12 13 14 15

Conclusion

In this post, I tried my best to showcase a practically useful C++ implementation of an arena. I wrote this to have a versatile memory management primitive that's easy to use, and predictable in its characteristics for my markup library. The only "out of pocket" choice I made are the indirect array destructors. The idea behind it is that I wanted it to handle these kinds of arrays if necessary, but I would avoid them as much as I possibly can in my own code. I was successful in this effort, because I didn't need any explicit array destructors at all. That is why I handwaved the reasoning behind choosing that option. I didn't want to pay the cost of a rarely useful feature on all codepaths.

To summarize, this Arena implementation just receives a buffer, and it does all the associated bookkeeping to properly allocate objects, and destroy them all at once when the arena is reset, or goes out of scope. The only catch with this approach is that handles should never outlive the arena itself.

Possible Future Work

Of course, this implementation is not perfect. It's definitely possible to decrease the destructor cost by a full pointer by making an array that grows backwards from the end of the buffer towards the beginning, a bit like how the call stack works. An interesting consequence of this would be that the allocation would become a bit more expensive, but the destruction could potentially be a bit faster. However, it would also require rethinking a lot of what happens in the Make and MakeArray functions.

1: At the time of writing. It was roughly in the middle of 2025.

2: Every single document is arguably a tree, but, at least in my opinion, it's more correct to think of them as forests. The reason why there's a need for multiple top-level nodes is due to the doctype, (XML) declaration, processing instructions, and comments. Every single one of these entities may (and some even must) appear at the top-level of the document. However, despite this, I will refer to this structure as the document tree, as that's the more established terminology.

3: They are also often called regions, among other things. For a more complete list of names for this concept, take a look at the wikipedia entry.

4: This is not quite true. Since C++17, implicit-lifetime types can start their lifetime from arrays of std::byte or std::uint8_t (or more precisely unsigned char). That means something like the following incantation works just fine:

static_assert(std::is_implicit_lifetime_v<T>);
alignas(alignof(T)) std::uint8_t buffer[sizeof(T)];
T *validPtr = std::launder(reinterpret_cast<T *>(buffer));

Note that the lifetime of the object actually starts only after it's accessed. Plus, just so life isn't too simple, the std::is_implicit_lifetime trait was added only in C++23, and support is not mainstream yet (January 2026). Before it's properly accessible, a polyfill created from the core language named requirements is necessary to satisfy the constraints correctly.

5: This is also not quite true. Calling the destructor isn't necessary if the type is trivially destructible (verifiable via std::is_trivially_destructible_v<T>).

6: Such as something like this:

class HandleError: public std::runtime_error
{
public:
    using std::runtime_error::runtime_error;
};

template <typename T>
class Handle
{
public:
    Handle(T *handle)
        : mHandle(handle)
    {}

    template <typename U>
        requires std::is_base_of_v<T, U>
    Handle(Handle<U> other)
        : mHandle(other.mHandle)
    {}

    explicit operator bool() const
    {
        return mHandle != nullptr;
    }

    T *operator->() const
    {
        if (mHandle == nullptr) {
            throw HandleError("Tried to access a null handle");
        }
        return mHandle;
    }

    T &operator*() const
    {
        if (mHandle == nullptr) {
            throw HandleError("Tried to dereference a null handle");
        }
        return *mHandle;
    }

    T *Get()
    {
        return mHandle;
    }

private:
    template <typename U>
    friend class Handle;
    T *mHandle;
};

7: This is a simplification that works on x64 and aarch64, but it's not a universal truth. For example, there could be a 64bit architecture that needs 16B per function pointer.

On a somewhat related note, we could technically save 8B by not storing the object pointer explicitly and relying on an assumption that the destructor instance is immediately followed by its object in memory, but this is a very quick way to UB land. We cannot arbitrarily assume this, because the object can have alignment requirements that push it further away than the address immediately after the destructor object.

8: Imagine a scenario where we're constructing an object that needs to have a string, or it needs to construct a fixed size buffer. For example, a node of a XML document tree where the size is known upfront. In that case, the object can receive a reference to the arena itself in its constructor, and it can call the Make function recursively.

Keep in mind that in this scenario, the construction is fairly easy, but the destructor must never touch handles to its subobjects in the arena during destruction, because their destructors have already been called due to the reverse destruction ordering.

9: For an example, try allocating an object like std::array<std::string, 4> using the Make function. After the object is created, an incantation like (*array)[0] is required to access use the operator[], which is unnecessarily cluttered.

10: The basic idea behind the array handle is something like the following piece of code. This is only a rough draft, there could be many other things included, such as optional bounds checks based on some configuration option.

template <typename T>
class ArrayHandle
{
public:
    ArrayHandle(T *handle, std::size_t size)
        : mHandle(handle)
        , mSize(size)
    {}

    explicit operator bool() const
    {
        return mHandle != nullptr;
    }

    T &operator[] (std::size_t i)
    {
        if (mHandle == nullptr) {
            throw HandleError("Tried to dereference a null handle");
        }
        return mHandle[i];
    }

    const T &operator[] (std::size_t i) const
    {
        if (mHandle == nullptr) {
            throw HandleError("Tried to dereference a null handle");
        }
        return mHandle[i];
    }

    T *Get()
    {
        return mHandle;
    }

    std::size_t Size() const
    {
        return mSize;
    }

private:
    T          *mHandle;
    std::size_t mSize;
};

11: This is technically not true, the standard library contains utilities like std::destroy, or std::destroy_n. The problem with these utilities is that they do not follow the proper reverse destruction order. It's more convenient to just write out the loop manually, rather than bring in the entire <algorithm> header for a single utility like std::views::reverse.