
Renderer Back-End Architecture

In this blog post I explain how to create a back-end renderer that can be accessed from multiple threads and that also hides some of the inherent latency of API calls on multi-core systems.

This blog post was originally released at

Rendering is a core piece of any game engine, whether it’s 2D or 3D. Regardless of the complexity of your high-level algorithms, however, at its core it’s a very simple process. Luckily it’s made even simpler by all the great APIs we have at our disposal. In the days of Doom and Quake you basically had to do everything yourself, down to writing the code that mapped polygon surfaces to pixels on the screen. Once decent graphics accelerators appeared, there also had to be APIs to control them. Several were developed over time, but the ones that stuck across multiple platforms were OpenGL and Direct3D.

Rendering today consists of figuring out which objects to render, in which order, and with which shader programs to process the individual graphics primitives such as vertices, triangles and pixels. The more complicated part is implementing the effects and visibility algorithms that drive the back-end rendering process, which, in turn, is pretty much just command processing.

Command Processing

Today’s APIs function as relatively simple command processors. Each API function creates commands that are then dispatched to the graphics hardware for processing (it’s a bit more complicated than that, of course, but you get the idea). There are a couple of things to remember about API calls, however.
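A minimal sketch of that idea, written in Python for brevity: API calls are modelled as small immutable command records that a device object executes later. The command types (`SetShader`, `Draw`) and `FakeDevice` are hypothetical stand-ins, not part of any real graphics API; a real device would issue driver calls instead of appending to a log.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class SetShader:
    shader_id: int


@dataclass(frozen=True)
class Draw:
    mesh_id: int
    index_count: int


# The common command type: every API "call" becomes one of these records.
Command = Union[SetShader, Draw]


class FakeDevice:
    """Hypothetical stand-in for a graphics API; it only logs what it executes."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def execute(self, cmd: Command) -> None:
        # Dispatch each command record to the matching (simulated) API call.
        if isinstance(cmd, SetShader):
            self.log.append(f"bind shader {cmd.shader_id}")
        elif isinstance(cmd, Draw):
            self.log.append(f"draw mesh {cmd.mesh_id} ({cmd.index_count} indices)")
        else:
            raise TypeError(f"unknown command: {cmd!r}")


# Commands are plain data, so they can be built now and executed later.
commands: List[Command] = [SetShader(1), Draw(mesh_id=7, index_count=36)]
device = FakeDevice()
for command in commands:
    device.execute(command)
```

Because the commands are plain data, nothing forces them to be executed on the thread that created them.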

The first thing to remember is that every call has an overhead. Depending on the API, this can add up to quite a lot of time. OpenGL tends to have lower per-call costs than Direct3D, but every Direct3D revision since version 10 has brought significant improvements here because fewer run-time checks have to be performed. This is largely attributable to immutable state objects, which are validated only when they are created.
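The immutable-state idea can be sketched as follows, assuming a hypothetical `BlendState` type loosely modelled on the state objects Direct3D 10 introduced. All validation runs once when the object is constructed, and the per-call bind path in the hypothetical `Context` class does no checking at all.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of legal blend factors for this sketch.
VALID_BLEND_FACTORS = {"zero", "one", "src_alpha", "one_minus_src_alpha"}


@dataclass(frozen=True)
class BlendState:
    """Hypothetical immutable state object: validated once, never modified."""

    src: str
    dst: str

    def __post_init__(self) -> None:
        # Every check happens here, at creation time.
        for factor in (self.src, self.dst):
            if factor not in VALID_BLEND_FACTORS:
                raise ValueError(f"invalid blend factor: {factor}")


class Context:
    """Hypothetical device context with a cheap, check-free bind path."""

    def __init__(self) -> None:
        self.bound: Optional[BlendState] = None
        self.bind_count = 0

    def set_blend_state(self, state: BlendState) -> None:
        # No validation: a BlendState cannot exist in an invalid form,
        # and frozen=True prevents it from being changed afterwards.
        self.bound = state
        self.bind_count += 1


alpha = BlendState("src_alpha", "one_minus_src_alpha")  # validated once
ctx = Context()
for _ in range(1000):
    ctx.set_blend_state(alpha)  # bound many times with no re-checking
```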

Another thing is that most APIs work well only when used from a single thread. OpenGL is notoriously restrictive about multi-threaded use, which in practice means that all graphics assets have to be initialized on the same thread that issues the calls to render them.

Simple Solution

A solution I have used in many cases, and which many other engines use as well, is to isolate the back-end renderer on its own thread. This thread does nothing but process the commands submitted by the other threads that use the renderer.
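A minimal Python sketch of a dedicated render thread, assuming a hypothetical `RenderBackend` class and string commands in place of real API calls. Any thread may call `submit`; only the render thread ever executes commands, and a `None` sentinel stops it once everything queued before it has been processed.

```python
import queue
import threading
from typing import List, Optional, Tuple


class RenderBackend:
    """Hypothetical back-end that touches the graphics API only on its own thread."""

    def __init__(self) -> None:
        self._commands: "queue.Queue[Optional[str]]" = queue.Queue()
        # (command, thread name) pairs; written only by the render thread.
        self.log: List[Tuple[str, str]] = []
        self._thread = threading.Thread(target=self._run, name="render")
        self._thread.start()

    def submit(self, command: str) -> None:
        # Safe from any thread: queue.Queue does its own locking.
        self._commands.put(command)

    def shutdown(self) -> None:
        self._commands.put(None)  # sentinel: stop after earlier commands drain
        self._thread.join()

    def _run(self) -> None:
        while (command := self._commands.get()) is not None:
            # Stand-in for the real API call, which would happen here.
            self.log.append((command, threading.current_thread().name))


backend = RenderBackend()
producers = [
    threading.Thread(target=backend.submit, args=(f"draw {i}",)) for i in range(4)
]
for p in producers:
    p.start()
for p in producers:
    p.join()
backend.shutdown()
```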

The benefit of this is that you can use the renderer back-end from multiple threads while the actual API is still accessed from a single thread. This has architectural and performance benefits when implementing asset loading, for instance, because the loading thread can issue the commands that initialize its assets without having to synchronize with the main thread.
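The asset-loading case might look like this sketch, where `TextureHandle`, `render_thread` and `load_textures` are hypothetical names. The loader submits creation requests and keeps handles; the render thread performs the (simulated) API call and signals each handle through a `threading.Event`.

```python
import queue
import threading
from typing import List, Optional


class TextureHandle:
    """Hypothetical handle that the render thread completes once the texture exists."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.gpu_id: Optional[int] = None
        self.created_on: Optional[str] = None
        self._ready = threading.Event()

    def complete(self, gpu_id: int) -> None:
        # Called on the render thread after the (simulated) API call.
        self.gpu_id = gpu_id
        self.created_on = threading.current_thread().name
        self._ready.set()

    def wait(self) -> Optional[int]:
        self._ready.wait()  # block only when the result is actually needed
        return self.gpu_id


commands: "queue.Queue[Optional[TextureHandle]]" = queue.Queue()


def render_thread() -> None:
    next_id = 1
    while (handle := commands.get()) is not None:
        # A real back-end would make the texture-creation API call here.
        handle.complete(next_id)
        next_id += 1


def load_textures(names: List[str], out: List[TextureHandle]) -> None:
    # The loading thread never touches the graphics API; it only submits requests.
    handles = [TextureHandle(n) for n in names]
    for h in handles:
        commands.put(h)
    for h in handles:
        h.wait()
    out.extend(handles)


renderer = threading.Thread(target=render_thread, name="render")
renderer.start()
loaded: List[TextureHandle] = []
loader = threading.Thread(target=load_textures, args=(["grass", "rock"], loaded))
loader.start()
loader.join()
commands.put(None)  # sentinel: stop the render thread
renderer.join()
```

The loader blocks on `wait()` only to keep the example short; it could equally continue with other work and check the handles later.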

The queue that is processed by the back-end can also be persisted. This way you can pre-process your meshes and UI rendering into a stored command buffer that you only need to send to the back-end when you are ready to render. If your shader constants live in shared memory that a command only references, you can also change properties such as transformations without having to regenerate the command sequence.
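A sketch of a persisted command buffer whose draw command references a shared constant block instead of copying it. `ConstantBlock`, `DrawMesh` and `execute` are hypothetical; the recorded buffer is reused unchanged while only the constants are updated between frames.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConstantBlock:
    """Hypothetical shared shader constants; commands hold a reference, not a copy."""

    translation: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])


@dataclass(frozen=True)
class DrawMesh:
    mesh_id: int
    constants: ConstantBlock  # a reference, so later edits are seen at execution


def execute(buffer: List[DrawMesh]) -> List[str]:
    # Stand-in back-end: reads the constants at execution time, not record time.
    return [f"mesh {c.mesh_id} at {tuple(c.constants.translation)}" for c in buffer]


ship = ConstantBlock()
recorded = [DrawMesh(mesh_id=3, constants=ship)]  # recorded once

frame1 = execute(recorded)
ship.translation[0] = 5.0  # only the shared constants change
frame2 = execute(recorded)  # same command buffer, new transformation
```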

This also simplifies multi-threaded rendering a lot. You can pre-compile the command sequence that renders a batch of meshes and then append the whole sequence to the command buffer, so commands from other rendering operations cannot end up in the middle of it.
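To illustrate, the sketch below has several threads record their sequences independently and then append each one to a shared queue under a single lock, so no sequence can be interleaved with another. `CommandQueue`, `record_meshes` and `worker` are hypothetical names.

```python
import threading
from typing import List


class CommandQueue:
    """Hypothetical back-end queue that accepts whole pre-recorded sequences."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.commands: List[str] = []

    def append_sequence(self, sequence: List[str]) -> None:
        # One lock acquisition per sequence: other threads cannot interleave.
        with self._lock:
            self.commands.extend(sequence)


def record_meshes(tag: str, count: int) -> List[str]:
    # Recording touches only thread-local data, so it needs no locking.
    seq = [f"{tag}: begin"]
    seq += [f"{tag}: draw {i}" for i in range(count)]
    seq.append(f"{tag}: end")
    return seq


q = CommandQueue()


def worker(tag: str) -> None:
    q.append_sequence(record_meshes(tag, 50))


threads = [threading.Thread(target=worker, args=(t,)) for t in ("opaque", "ui", "shadows")]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

The order of the three sequences in the queue may vary between runs, but each sequence always stays contiguous.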

It’s also simple to implement multiple back-ends for different APIs, such as OpenGL and Direct3D. You just have to implement a separate command processor for each API that interprets the common command definitions. If you are doing unit testing (which I hope you are), you can also reuse the same tests for the different implementations because the interface does not change.
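A sketch of several back-ends sharing one command definition, using hypothetical `GLStyleBackend` and `D3DStyleBackend` classes that only log the name of the API call they would make. Because both implement the same `Backend` interface, a single test function, `check_backend`, is reused for each.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

# Common command definition shared by every back-end (hypothetical format).
Command = Tuple[str, int]  # (opcode, argument)


class Backend(ABC):
    def process(self, commands: List[Command]) -> None:
        # Shared interpreter loop; each back-end supplies the API-specific calls.
        for opcode, arg in commands:
            if opcode == "clear":
                self.clear(arg)
            elif opcode == "draw":
                self.draw(arg)
            else:
                raise ValueError(f"unknown opcode: {opcode}")

    @abstractmethod
    def clear(self, color: int) -> None: ...

    @abstractmethod
    def draw(self, mesh_id: int) -> None: ...


class GLStyleBackend(Backend):
    """Stand-in for an OpenGL back-end; logs the GL call it would make."""

    def __init__(self) -> None:
        self.api_calls: List[str] = []

    def clear(self, color: int) -> None:
        self.api_calls.append("glClear")

    def draw(self, mesh_id: int) -> None:
        self.api_calls.append(f"glDrawElements(mesh={mesh_id})")


class D3DStyleBackend(Backend):
    """Stand-in for a Direct3D back-end; logs the D3D call it would make."""

    def __init__(self) -> None:
        self.api_calls: List[str] = []

    def clear(self, color: int) -> None:
        self.api_calls.append("ClearRenderTargetView")

    def draw(self, mesh_id: int) -> None:
        self.api_calls.append(f"DrawIndexed(mesh={mesh_id})")


def check_backend(backend_cls: type) -> None:
    """One test reused for every back-end, since the interface is shared."""
    backend = backend_cls()
    backend.process([("clear", 0), ("draw", 7), ("draw", 9)])
    assert len(backend.api_calls) == 3
    assert "mesh=9" in backend.api_calls[-1]
    try:
        backend.process([("explode", 0)])
        raise AssertionError("unknown opcode was accepted")
    except ValueError:
        pass


for cls in (GLStyleBackend, D3DStyleBackend):
    check_backend(cls)
```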


Rendering is a simple process, but one where you can optimize a lot. Every single call has an overhead, and if you are running on a multi-core system, hiding it will give you more time to do the good stuff. A command queue implementation allows you to do this while providing a generic interface over platform-specific APIs and allowing multi-threaded access to the renderer.
