[In this in-depth technical article, Gruen & Story examine anti-aliasing in games, explaining how you can reduce 'jaggies' in your PC title, and save frame-rate, by performing significantly fewer MSAA resolves between post-processing passes.]
Over the last few years it has become commonplace for PC games to make use of Multi-Sample Anti-Aliasing (MSAA) to achieve higher quality rendering.
MSAA is a very effective and efficient method for reducing the unsightly "jaggies" that result from the triangle rasterization process. At the same time, most game engines also employ post-processing techniques such as depth-of-field, motion blur, color correction and refraction.
Post-processing has become increasingly popular because it provides a way to carry out complex computations while paying the cost only for visible pixels. It is not unheard of for an engine to contain up to 20 passes, and these techniques usually require a copy of the main render target as a texture input.
If the engine is making use of MSAA, then the render target must be resolved before it can be used as an input to the next pass. This is done through calls to IDirect3DDevice9::StretchRect in Direct3D 9 or ID3D10Device::ResolveSubresource in Direct3D 10.
As modern game engines tend to apply several post-processing techniques in sequence, it is easy to see how an application could end up in a repeating cycle of render and resolve operations (Figure 1).
It is critically important to understand that a resolve is not a free operation: it reads every subsample of the source surface and writes a full-resolution result, so performing multiple resolves per frame can have a very serious impact on performance. This is true for all graphics hardware.
To take a real-world example, the developers of a recently released PC title managed to reduce their resolve count from a staggering 22 to just 12. This saved around 12 ms per frame at a resolution of 1280x1024 with 4x MSAA.
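To put these figures in perspective, the back-of-envelope sketch below estimates the memory traffic of a single resolve at that resolution and the average saving per removed resolve. It assumes a 32-bit color format (4 bytes per sample) and ignores any MSAA compression the hardware may apply, so treat the traffic figure as a rough upper bound. The 12 ms is the measured saving quoted above; spreading it evenly over the ten removed resolves gives only an average.

```python
# Rough cost model for one MSAA resolve (an illustration, not a profiler).
# Assumption: 32-bit color, i.e. 4 bytes per sample; MSAA compression ignored.
WIDTH, HEIGHT = 1280, 1024
SAMPLES = 4            # 4x MSAA
BYTES_PER_SAMPLE = 4   # assumed RGBA8 color format


def resolve_traffic_bytes(width, height, samples, bytes_per_sample):
    """Return (bytes read from the MSAA surface, bytes written to the resolved surface)."""
    read = width * height * samples * bytes_per_sample
    written = width * height * bytes_per_sample
    return read, written


read, written = resolve_traffic_bytes(WIDTH, HEIGHT, SAMPLES, BYTES_PER_SAMPLE)

# Figures quoted in the article: going from 22 to 12 resolves saved about 12 ms.
resolves_removed = 22 - 12
ms_per_resolve = 12.0 / resolves_removed

print(f"per resolve: {read / 2**20:.0f} MiB read, {written / 2**20:.0f} MiB written")
print(f"average saving per removed resolve: {ms_per_resolve:.1f} ms")
```

At roughly 25 MiB of traffic per resolve under these assumptions, ten extra resolves move around a quarter of a gigabyte per frame, which makes the size of the measured saving less surprising.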
The goal of this article is to describe how to minimize the resolve count in the rendering pipeline without compromising the quality of post-processing effects or deferred shading techniques.
The resolves that should be removed fall into two categories: redundant resolves and harmful resolves. Both are described in detail later in this article, but first let's consider the resolves that are necessary for good image quality.
We know that the use of MSAA render targets is only helpful when draw calls produce visible "jaggies". In an ideal world the main geometry pass would be rendered in MSAA mode, and then resolved to a non-MSAA render target. Any subsequent post-processing passes would all be completed in non-MSAA mode. This would therefore give rise to just a single resolve per frame.
However, there are two reasons why a post-processing technique may need to be performed in MSAA mode:
- If a post-processing technique enables subsample-based depth testing, it may update only some of the subsamples of a pixel.
- Similarly, if alpha blending is enabled, the existing subsample data is preserved through the blend operation.
In these two cases it can indeed make sense to render the pass in MSAA mode and resolve the render target afterwards for further passes. However, these cases are the exception: for full-screen passes that enable neither depth testing nor alpha blending, there is precious little point in using MSAA mode.
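The rule above fits in a small helper that an engine might use to decide, per pass, whether to bind an MSAA or a non-MSAA target. The `PassDesc` record, its fields and the pass names are hypothetical; the only point is that a pass needs MSAA when it rasterizes edges that can alias, depth-tests against the per-subsample depth buffer, or blends into existing subsample data.

```python
from dataclasses import dataclass


@dataclass
class PassDesc:
    """Hypothetical per-pass description; field names are illustrative only."""
    name: str
    draws_geometry: bool  # rasterizes triangle edges (not just a full-screen quad)
    depth_test: bool      # tests against the per-subsample MSAA depth buffer
    alpha_blend: bool     # blends into colors already stored in the subsamples


def needs_msaa(p: PassDesc) -> bool:
    """True if rendering this pass into an MSAA target preserves or adds subsample detail."""
    return p.draws_geometry or p.depth_test or p.alpha_blend


frame = [
    PassDesc("scene", draws_geometry=True, depth_test=True, alpha_blend=False),
    PassDesc("particles", draws_geometry=True, depth_test=True, alpha_blend=True),
    PassDesc("depth_of_field", draws_geometry=False, depth_test=False, alpha_blend=False),
    PassDesc("color_correction", draws_geometry=False, depth_test=False, alpha_blend=False),
]
# Passes that can safely run in non-MSAA mode.
print([p.name for p in frame if not needs_msaa(p)])
```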
A technique that does not actually draw any geometry other than a full-screen quad will usually write the same color to all subsamples in an MSAA render target (Figure 2).
This is because the pixel shader runs only once per pixel, and since the quad covers the whole pixel, that single result is written to every subsample. Effectively the MSAA buffer has been turned into a non-MSAA buffer, and every further resolve operation on this surface is redundant.
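A tiny numerical model makes the redundancy concrete. Here an MSAA surface is simply a list of pixels, each holding a list of subsample values, and a fixed-function resolve is modeled as a per-pixel average (the behavior described later in this article). The helper names are hypothetical, and real hardware of course works on whole surfaces rather than Python lists.

```python
def resolve(msaa):
    """Model of a fixed-function resolve: average the subsamples of every pixel."""
    return [sum(px) / len(px) for px in msaa]


def full_screen_pass(src_pixels, samples, shade):
    """Model of a full-screen quad: the shader runs once per pixel and, since the
    quad covers every subsample, the single result is replicated into all of them."""
    return [[shade(c)] * samples for c in src_pixels]


scene = [[0.0, 0.0, 1.0, 1.0],   # edge pixel: half the subsamples covered
         [1.0, 1.0, 1.0, 1.0]]   # interior pixel
a = resolve(scene)                        # [0.5, 1.0]
m = full_screen_pass(a, 4, lambda c: c)   # copy A back onto the MSAA target
print(m[0])             # [0.5, 0.5, 0.5, 0.5]: the edge detail is gone
print(resolve(m) == a)  # True: resolving again is just an expensive copy
```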
Aside from the obvious redundancy, note that once the same color has been written to all subsamples of a pixel, the MSAA depth buffer no longer matches the color buffer: depth still holds the per-subsample object silhouettes, while color does not.
Clearly the solution is to render these passes in non-MSAA mode, thus avoiding the need for any further resolves. The recommended way to avoid these unnecessary resolves is as follows:
- Create the main frame buffer (swap chain) in non-MSAA mode.
- Create an intermediate MSAA render target into which the main scene geometry, and anything else that would produce "jaggies", is rendered.
- Perform a resolve of the intermediate MSAA render target to a non-MSAA surface.
- Ping-pong between non-MSAA render targets for the remaining passes (Figure 3).
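The recommended layout can be sketched as a function that emits the command list for one frame. The surface names (`msaa_rt`, `tex0`, `tex1`, `backbuffer`) and the tuple format are hypothetical; a real engine would issue draw calls and a single StretchRect or ResolveSubresource instead. The sketch also assumes the API allows resolving straight into the non-MSAA back buffer when there is no post-processing at all.

```python
def build_frame(post_passes):
    """Return the (operation, source, destination) list for one frame laid out
    as recommended: one MSAA scene target, one resolve, then non-MSAA ping-pong."""
    cmds = [("draw_scene", None, "msaa_rt")]       # everything that can alias
    if not post_passes:
        # No post-processing: resolve straight into the non-MSAA back buffer.
        cmds.append(("resolve", "msaa_rt", "backbuffer"))
        return cmds
    cmds.append(("resolve", "msaa_rt", "tex0"))    # the only resolve in the frame
    src, other = "tex0", "tex1"
    for i, name in enumerate(post_passes):
        last = i == len(post_passes) - 1
        dst = "backbuffer" if last else other
        cmds.append((name, src, dst))
        src, other = dst, src                      # swap roles for the next pass
    return cmds


for cmd in build_frame(["depth_of_field", "motion_blur", "color_correction"]):
    print(cmd)
```

Because each post-processing pass reads one non-MSAA texture and writes the other, no pass ever reads the surface it is writing, and the resolve count stays at one no matter how many passes are added.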
To add a real-world example to this discussion, the following sequence of passes was uncovered during the analysis of a recently released PC title:
1. Render the geometry pass into the main MSAA render target M
2. Resolve M into a non-MSAA render target A
3. Render A onto M using a full-screen quad
4. Resolve M into A
5. Render water into M
6. Resolve M into A for further post-processing
A quick glance at this sequence shows that steps two through four are completely redundant. In fact, step three is actually harmful from a quality standpoint, as it overwrites the subsample color information with a single color per pixel.
It is therefore possible to jump directly from step one to step five, removing two of the three resolve operations while preserving the subsample color information.
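Tracking this by hand is error-prone, so a simple frame audit can help. The sketch below is hypothetical (the `Pass` record and `audit` function are not part of PIX or any other tool): it follows whether each MSAA surface still holds distinct subsample data, assuming every full-screen pass is opaque with no depth testing, and it flags steps 3 and 4 of the sequence above. Step 2 is not flagged on its own; it exists only to feed step 3 and disappears once step 3 is removed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pass:
    op: str             # "geometry", "fullscreen" or "resolve"
    src: Optional[str]  # input surface (None for geometry passes)
    dst: str            # output surface


def audit(passes, msaa_surfaces):
    """Flag resolves whose MSAA source holds identical subsamples (redundant) and
    full-screen passes that overwrite real subsample data (harmful).
    Assumes full-screen passes are opaque and do not depth-test."""
    has_subsample_data = {s: False for s in msaa_surfaces}
    findings = []
    for step, p in enumerate(passes, start=1):
        if p.op == "geometry" and p.dst in has_subsample_data:
            has_subsample_data[p.dst] = True
        elif p.op == "fullscreen" and p.dst in has_subsample_data:
            if has_subsample_data[p.dst]:
                findings.append((step, "harmful: overwrites subsample data"))
            has_subsample_data[p.dst] = False
        elif p.op == "resolve" and p.src in has_subsample_data:
            if not has_subsample_data[p.src]:
                findings.append((step, "redundant: subsamples are identical"))
    return findings


original = [
    Pass("geometry", None, "M"),     # 1. scene into MSAA target M
    Pass("resolve", "M", "A"),       # 2.
    Pass("fullscreen", "A", "M"),    # 3. copy A back onto M
    Pass("resolve", "M", "A"),       # 4.
    Pass("geometry", None, "M"),     # 5. water
    Pass("resolve", "M", "A"),       # 6.
]
print(audit(original, {"M"}))
print(audit([original[0], original[4], original[5]], {"M"}))  # optimized: []
```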
So why would the developer fail to spot this? The answer lies in the fact that modern engines are highly object oriented, and that several developers are making changes to the rendering code over time.
Apparently step three was originally a valid post-processing effect; when it was later changed, it effectively became a plain copy operation, which made steps two and three redundant. The resolve in step four was triggered because M was accidentally added as an input to step five.
As you can see, it is very easy to introduce redundant resolves into the rendering pipeline. It always pays to keep track of every pass carried out during a frame, and it is good practice to regularly inspect PIX captures for unexpected behavior.
It is common for deferred rendering techniques to store information such as depth, position, normal, velocity and material ID to an intermediate render target. If this is carried out in MSAA mode, then the data would need to be resolved before being put to use later in the frame.
The problem here is that the fixed-function resolve operation simply averages the subsamples. For data such as positions, normals or material IDs this is very unlikely to yield the developer's intended result, and will most probably produce graphical artifacts.
Let us consider the case where material IDs are to be resolved. I think we would all agree that averaging material IDs is never going to make sense, and that in the worst case such an operation will produce invalid IDs.
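A short experiment shows how a fixed-function resolve treats such data. The material table and all values are made up for illustration; in a real engine ID 2 might just as easily exist, in which case the edge pixel would silently pick up the wrong material instead of an invalid one. The same averaging also shortens unit normals stored in a G-buffer.

```python
import math


def box_resolve(samples):
    """Fixed-function resolve for one pixel: a plain average of its subsamples."""
    return sum(samples) / len(samples)


# Hypothetical material table: IDs 1 and 3 exist, ID 2 does not.
MATERIALS = {1: "brick", 3: "glass"}

edge_ids = [1, 1, 3, 3]            # pixel straddling a brick/glass boundary
resolved_id = box_resolve(edge_ids)
print(resolved_id, round(resolved_id) in MATERIALS)   # 2.0 False

# Averaging unit normals per component no longer yields a unit normal.
edge_normals = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
avg_normal = tuple(box_resolve(axis) for axis in zip(*edge_normals))
print(avg_normal, math.hypot(*avg_normal))   # length is about 0.707, not 1
```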
So how should we deal with this kind of data, when a standard fixed-function resolve is clearly not the way to go?
In DX10 it is possible to write a pixel shader that can read the subsamples of an input texture. In the case of a deferred lighting technique, it would then be possible to perform the lighting calculation on each subsample, and then average the results. In this way the shader has effectively performed a custom resolve.
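In HLSL the subsamples are typically read through a `Texture2DMS` resource and its `Load` method, which takes a sample index. The Python below only models the arithmetic for one pixel, under stated assumptions: a hypothetical lighting pass with a single directional light and a Lambert diffuse term only. It compares lighting once after a fixed-function resolve of the normals with a custom resolve that lights every subsample and then averages.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    # Assumes a non-zero vector, which holds for the sample data below.
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def lambert(normal, light_dir):
    """Diffuse term of a hypothetical deferred lighting pass."""
    return max(0.0, dot(normalize(normal), light_dir))


def average(values):
    return sum(values) / len(values)


def light_after_fixed_resolve(normals, light_dir):
    # Fixed-function path: average the G-buffer normals, then light once.
    avg_normal = tuple(average(axis) for axis in zip(*normals))
    return lambert(avg_normal, light_dir)


def custom_resolve_lighting(normals, light_dir):
    # Custom resolve: light every subsample, then average the lit results.
    return average([lambert(n, light_dir) for n in normals])


light = (0.0, 0.0, 1.0)
# Edge pixel: two subsamples face the light, two belong to a surface facing away.
normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.8, 0.0, -0.6), (0.8, 0.0, -0.6)]
print(f"{light_after_fixed_resolve(normals, light):.3f}")  # 0.447
print(f"{custom_resolve_lighting(normals, light):.3f}")    # 0.500
```

The custom resolve gives the expected half-lit edge, while lighting the averaged normal produces a value that matches neither surface.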
DX10.1-capable hardware removes a further limitation by allowing access to the subsamples of the depth buffer, which can eliminate the need for a separate depth pass.
Another prominent example of a technique that suffers from using fixed function resolved data is non-linear tone mapping. The only correct way to perform tone mapping in a multi-sampling context is to tone map every subsample using a shader-based custom resolve.
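The effect is easy to reproduce numerically. The sketch below uses the simple Reinhard curve x / (1 + x) purely as an example of a non-linear operator (the article does not name a specific one) and compares tone mapping after a fixed-function resolve with tone mapping each subsample before averaging.

```python
def reinhard(x):
    """Simple Reinhard curve, used here only as an example non-linear operator."""
    return x / (1.0 + x)


def average(values):
    return sum(values) / len(values)


# Edge pixel: half the subsamples see a bright light (HDR 10.0), half see black.
edge = [10.0, 10.0, 0.0, 0.0]

after_fixed_resolve = reinhard(average(edge))          # tone map the averaged HDR value
custom_resolve = average([reinhard(s) for s in edge])  # tone map each subsample first

print(f"{after_fixed_resolve:.3f} vs {custom_resolve:.3f}")  # 0.833 vs 0.455
```

The bright subsamples dominate the averaged HDR value, so with the fixed-function path the edge pixel comes out almost as bright as the light itself (about 0.91 after tone mapping) and the anti-aliased gradient is largely lost; tone mapping per subsample keeps the edge roughly halfway between the two sides.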
In DX9 this is not possible, so the resulting artifacts may simply have to be tolerated, although explicit super-sampling could achieve similar results. For performance reasons it may sometimes be necessary to carry out post-processing with data produced by a harmful resolve, but this should be kept to a minimum.
It is very important to appreciate that a resolve is not a free operation; in fact it is a decidedly expensive procedure and should therefore be kept to a minimum. Keep in mind that most resolves are either redundant or harmful.
To avoid redundancy, resolve the main MSAA render target as early as possible, and then work in non-MSAA mode for post-processing effects. Write shader-based custom resolves to properly handle high-quality post-processing and deferred rendering techniques.
Remember that it's easy to overlook what is really happening amongst the various rendering passes, so regular analysis is essential to resolving your resolves!