[In this Intel-sponsored feature, part of Gamasutra's Visual Computing section, Kalra examines ways to render realistic grass in your game, utilizing DirectX 10 and vertex shaders.]
Because of its geometric complexity, realistic grass is difficult to render in real time, especially on consumer graphics hardware. This article introduces the concept of geometry instancing with the Direct3D 10 API and shows how it can be used to implement realistic grass on consumer graphics hardware.
A typical patch of grass can easily have a few hundred thousand blades. Each blade is similar to the others, with slight variations in color, position, and orientation. Rendering a large number of small objects, each made from a few polygons, is not optimal.
Current-generation graphics APIs, such as DirectX and OpenGL, are not designed to efficiently render models with a small number of polygons thousands of times per frame.
To efficiently render thousands of blades of grass, the number of draw calls needs to be drastically reduced. If the geometry of the grass blades doesn't change, the best approach is to combine the grass elements into a single vertex buffer and render them in one draw call.
However, if the geometry changes often (for example, when a level-of-detail scheme is used for geometry simplification), this approach won't work, because a large amount of data would need to be sent to the graphics card every time the geometry changes.
Geometry instancing allows the reuse of geometry when drawing multiple similar objects in a scene. The common data is stored in a vertex buffer, and the differentiating parameters, such as position and color, are stored in a separate vertex (instance) buffer. The hardware uses the vertex and instance buffers to render unique instances of the models. (See Carucci, "Inside Geometry Instancing," listed at the end of this article.)
Using geometry instancing APIs helps factor common data from the unique data (flyweight design pattern) and thus reduces memory utilization and bandwidth. The vertex buffer can stay resident in graphics memory, and the instance buffer can be updated more frequently if needed, providing performance and flexibility.
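The memory argument can be made concrete with a little arithmetic. The sketch below is only an illustration: the `PatchVertex` and `PatchInstance` layouts and the helper names are assumptions made for this example and are not taken from the sample source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layouts, for illustration only.
struct PatchVertex   { float pos[3]; float uv[2]; };        // shared by every patch
struct PatchInstance { float worldPos[3]; float tint[4]; }; // one small record per patch

// Bytes needed when every patch carries its own copy of the geometry.
inline std::size_t duplicatedBytes(std::size_t vertsPerPatch, std::size_t patches) {
    return vertsPerPatch * patches * sizeof(PatchVertex);
}

// Bytes needed with instancing: one shared vertex buffer plus one record per patch.
inline std::size_t instancedBytes(std::size_t vertsPerPatch, std::size_t patches) {
    return vertsPerPatch * sizeof(PatchVertex) + patches * sizeof(PatchInstance);
}

// How many times smaller the instanced representation is.
inline double savingsFactor(std::size_t vertsPerPatch, std::size_t patches) {
    assert(vertsPerPatch > 0 && patches > 0);
    return static_cast<double>(duplicatedBytes(vertsPerPatch, patches)) /
           static_cast<double>(instancedBytes(vertsPerPatch, patches));
}
```

With these assumed layouts, 1,000 patches of 600 vertices each need 12,000,000 bytes when duplicated and 40,000 bytes when instanced, a factor of 300. Only the small per-patch records need to be rewritten when the set of patches changes.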
In the example described in this article, numerous small patches of grass are drawn across the terrain. A patch of grass consists of a vertex buffer that contains a number of randomly placed intersecting quads. Each quad is mapped with a texture containing a few blades of grass.
A natural waving motion of the grass blades is achieved by animating the vertices of each quad using a combination of sine waves of different frequencies. Color changes that occur with the waving motion and from the effects of the wind are simulated using the same sine wave that animates the grass.¹
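The waving motion can be sketched on the CPU as follows. In the sample this work belongs in the vertex shader; the version here is plain C++ so it can be read and run without the DirectX SDK. The function names, frequencies, and amplitudes are assumptions chosen for illustration, not values from the sample.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Sum of three sine waves with different frequencies. The phase depends on the
// world position so neighbouring quads do not sway in lockstep.
// The result lies in [-1.75, 1.75]. All constants are illustrative.
inline float windWave(float x, float z, float timeSec) {
    float phase = 0.7f * x + 1.3f * z;
    return std::sin(1.0f * timeSec + phase)
         + 0.5f  * std::sin(2.3f * timeSec + 1.7f * phase)
         + 0.25f * std::sin(4.1f * timeSec + 2.9f * phase);
}

// Displaces one quad vertex. height01 is 0 at the bottom edge of the quad and
// 1 at the top edge, so the roots stay fixed while the tips move.
inline Vec3 swayVertex(Vec3 p, float height01, float timeSec, float windStrength) {
    assert(height01 >= 0.0f && height01 <= 1.0f);
    float offset = windStrength * height01 * windWave(p.x, p.z, timeSec);
    return { p.x + offset, p.y, p.z + 0.5f * offset };
}

// Reuses the same wave to brighten or darken the blade color slightly.
inline float swayShade(Vec3 p, float timeSec, float amount) {
    return 1.0f + amount * windWave(p.x, p.z, timeSec) / 1.75f;
}
```

Scaling the offset by the vertex height is one simple way to keep the base of each quad anchored to the terrain; other weightings would work as well.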
Geometry instancing places numerous small patches along a grid on the terrain. This method allows visible patches to be selectively drawn. Patches with various levels of detail (depending on the camera position) can also be introduced with relative ease.
Figure 1 highlights the dynamic culling of grass geometry. Only patches shown in blue are rendered. Refer to the code sample provided later in this article for more details.
Figure 1. Selective drawing of grass patches using geometry instancing.
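The selective drawing shown in Figure 1 can be sketched as a filter that decides which patch positions go into the instance buffer for the current frame. This is a simplified illustration, not the culling code from the sample: it tests only patch centers against a distance limit and a horizontal view cone, and the names `Float3` and `visiblePatches` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Float3 { float x, y, z; };

// Collects the positions of the patches worth drawing this frame. A patch is
// kept when its center lies within maxDist of the camera and inside a
// horizontal view cone whose half-angle has cosine cosHalfFov.
// viewDirXZ must be normalized in the XZ plane.
inline std::vector<Float3> visiblePatches(const std::vector<Float3>& patchCenters,
                                          Float3 cam, Float3 viewDirXZ,
                                          float maxDist, float cosHalfFov) {
    assert(maxDist >= 0.0f);
    std::vector<Float3> out;
    for (const Float3& c : patchCenters) {
        float dx = c.x - cam.x, dz = c.z - cam.z;
        float dist = std::sqrt(dx * dx + dz * dz);
        if (dist > maxDist) continue;                 // too far away
        if (dist > 1e-4f) {
            float cosAngle = (dx * viewDirXZ.x + dz * viewDirXZ.z) / dist;
            if (cosAngle < cosHalfFov) continue;      // outside the view cone
        }
        out.push_back(c);                             // becomes one instance record
    }
    return out;
}
```

The returned list is what would be copied into the instance buffer, so the instance count of the draw call equals its size. A production version would also pad the tests by the patch radius so that patches straddling the edge of the view are not dropped.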
1 Isidoro, J. and D. Card, "Animated Grass with Pixel and Vertex Shaders." http://ati.amd.com/developer/shaderx/shaderx_animatedgrass.pdf
Instancing with Direct3D 10
Several steps are necessary to implement geometry instancing using the Direct3D 10 API.
1. Define the vertex and instance buffers.
Direct3D 10 does not distinguish between the various buffer types; they are all represented by the ID3D10Buffer interface. To render instanced grass, two buffers are created: one contains the static geometry information for the patch, and the other contains the various positions at which the patches are to be drawn.
2. Associate the input buffers with a vertex shader.
The vertex and instance buffers are associated with a vertex shader in Direct3D 10 using an input layout object, which describes how the vertex buffer data is streamed into the input assembler (IA) pipeline stage (Figure 2).
Figure 2. The process of streaming the vertex buffer into the input assembler stage.
An input-layout object is created from an array of input-element descriptions and a pointer to a compiled shader. Each element describes one piece of per-vertex or per-instance data and where it sits in its buffer.
The input-element array described below is used for the sample source provided for rendering instanced grass. The first two elements of the array define the data structure of the vertex data coming from the vertex buffer, while the third element describes the data structure of the instance data coming from the instance buffer (second vertex buffer).
Notice that the input slot (fourth data entry) for the elements is different for the vertex and instance buffers. For more details, refer to "Getting Started with the Input-Assembler Stage."²
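A layout of this shape might look like the sketch below. To keep it compilable without the DirectX SDK, it uses stand-in types whose field order mirrors `D3D10_INPUT_ELEMENT_DESC`; in a real build the array would use the SDK types and be passed to `ID3D10Device::CreateInputLayout` together with the compiled shader bytecode. The vertex and instance structures, the semantic names, and the helper `elementsInSlot` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for DXGI_FORMAT and D3D10_INPUT_CLASSIFICATION (illustration only).
enum ElemFormat { FMT_R32G32B32_FLOAT, FMT_R32G32_FLOAT };
enum ElemClass  { PER_VERTEX_DATA, PER_INSTANCE_DATA };

// Stand-in with the same field order as D3D10_INPUT_ELEMENT_DESC.
struct InputElementDesc {
    const char* SemanticName;
    unsigned    SemanticIndex;
    ElemFormat  Format;
    unsigned    InputSlot;             // fourth entry: which bound buffer feeds this element
    unsigned    AlignedByteOffset;
    ElemClass   InputSlotClass;
    unsigned    InstanceDataStepRate;  // 0 for per-vertex data, 1 = advance once per instance
};

struct GrassVertex   { float pos[3]; float uv[2]; };  // hypothetical per-vertex data, slot 0
struct GrassInstance { float patchPos[3]; };          // hypothetical per-instance data, slot 1

const InputElementDesc kGrassLayout[] = {
    { "POSITION", 0, FMT_R32G32B32_FLOAT, 0, unsigned(offsetof(GrassVertex, pos)),        PER_VERTEX_DATA,   0 },
    { "TEXCOORD", 0, FMT_R32G32_FLOAT,    0, unsigned(offsetof(GrassVertex, uv)),         PER_VERTEX_DATA,   0 },
    { "PATCHPOS", 0, FMT_R32G32B32_FLOAT, 1, unsigned(offsetof(GrassInstance, patchPos)), PER_INSTANCE_DATA, 1 },
};
const unsigned kGrassLayoutCount = sizeof(kGrassLayout) / sizeof(kGrassLayout[0]);

// Counts how many elements are fed from a given input slot.
inline unsigned elementsInSlot(const InputElementDesc* layout, unsigned count, unsigned slot) {
    assert(layout != nullptr);
    unsigned n = 0;
    for (unsigned i = 0; i < count; ++i)
        if (layout[i].InputSlot == slot) ++n;
    return n;
}
```

The two per-vertex elements read from slot 0 and the per-instance element reads from slot 1, which is the difference in the fourth entry that the text points out.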
3. Bind the objects.
Once the vertex buffers are ready, they are bound to the IA stage as the source listing below shows. An array of buffer pointers containing the vertex and instance buffers, together with their strides and offsets, is created and bound to the IA along with the previously created input layout.
4. Draw the primitives.
Once all the input resources have been bound to the pipeline, draw calls are issued to render the primitives. Direct3D 10 supports various instanced draw calls for drawing geometry, based on the primitive topology used. The example below shows the draw calls for triangle lists used to render the instanced grass sample.
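In Direct3D 10 the instanced draw calls are `ID3D10Device::DrawInstanced` and `ID3D10Device::DrawIndexedInstanced`. What such a call does with the two bound buffers can be illustrated with a small CPU emulation. The sketch below is not Direct3D code: `V3` and `emulateDrawInstanced` are hypothetical names, and the "vertex shader" here simply adds the instance position to each vertex.

```cpp
#include <cassert>
#include <vector>

struct V3 { float x, y, z; };

// CPU emulation of one instanced draw. Every vertex of the shared patch is
// paired with every instance record, which is the pairing the input assembler
// feeds to the vertex shader.
inline std::vector<V3> emulateDrawInstanced(const std::vector<V3>& patchVerts,
                                            const std::vector<V3>& instancePos) {
    assert(!patchVerts.empty());
    std::vector<V3> out;
    out.reserve(patchVerts.size() * instancePos.size());
    for (const V3& inst : instancePos)      // advances once per instance
        for (const V3& v : patchVerts)      // advances once per vertex
            out.push_back({ v.x + inst.x, v.y + inst.y, v.z + inst.z });
    return out;
}
```

The output holds one transformed copy of the patch per instance, although only a single copy of the patch geometry was supplied. On the GPU this expansion happens inside one draw call rather than one call per patch.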
The instancing portion of the sample source is implemented in two classes: (1) InstancedBillboard and (2) BBGrassPatch. InstancedBillboard is a generic class that hides the implementation details of instancing. It accepts the vertex and instance data structures as template parameters. The BBGrassPatch class handles the grass-specific details and initializes the grass blades and patches.
2 "Getting Started with the Input-Assembler Stage (Direct3D 10)." http://msdn.microsoft.com/en-us/library/bb205117(VS.85).aspx
Rendering grass blades with alpha-to-coverage
The grass patch is rendered as a number of randomly placed intersecting quads. Each quad is mapped with an alpha texture containing a few blades of grass. Rendering it as is with alpha blending requires the quads to be sorted back to front in order to render transparency correctly.
This approach can be computationally expensive, especially if hundreds of thousands of quads must be sorted every time the camera moves. Sending the data to the graphics card or using depth peeling on the graphics card is equally prohibitive.
Using alpha-to-coverage solves this problem. Since the grass billboards use the alpha channel as cut-outs (alpha is either 0 or 1), this method works well. However, the cut-outs are not always binary. In some cases the edges of the cut-outs are blurred to make the vegetation look more realistic. In this case alpha-to-coverage and multi-sample anti-aliasing (MSAA) can solve the problem.
Alpha-to-coverage converts the alpha-component output from the pixel shader to a coverage mask that is applied at the sub-pixel resolution of an MSAA render target. When the MSAA resolve is applied, the output pixel gets a transparency from 0 to 1 depending on alpha coverage and the MSAA sample count. The images produced using this technique (see Figure 3) look realistic and have no artifacts.³,⁴
This method is a pseudo order-independent transparency solution and works well if the alpha channel is being used for cut-outs (alpha is either 0 or 1), as when rendering vegetation. However, this method doesn't work if correct transparency is desired.
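The conversion from alpha to a coverage mask, and the effect of the resolve, can be sketched in a few lines. This is a simplified model for illustration only: real hardware chooses which samples to cover in its own way and may dither, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of MSAA samples covered for a given alpha, rounded to the nearest sample.
inline int coveredSamples(float alpha, int sampleCount) {
    assert(sampleCount > 0 && sampleCount <= 32);
    if (alpha <= 0.0f) return 0;
    if (alpha >= 1.0f) return sampleCount;
    return static_cast<int>(alpha * static_cast<float>(sampleCount) + 0.5f);
}

// Coverage mask with the lowest coveredSamples() bits set.
inline std::uint32_t coverageMask(float alpha, int sampleCount) {
    int n = coveredSamples(alpha, sampleCount);
    return n >= 32 ? 0xFFFFFFFFu : ((1u << n) - 1u);
}

// MSAA resolve for one pixel and one color channel: covered samples hold the
// grass color, the others keep the background, and the result is their average.
inline float resolvePixel(std::uint32_t mask, int sampleCount, float grass, float background) {
    assert(sampleCount > 0 && sampleCount <= 32);
    float sum = 0.0f;
    for (int s = 0; s < sampleCount; ++s)
        sum += ((mask >> s) & 1u) ? grass : background;
    return sum / static_cast<float>(sampleCount);
}
```

In this model, 4x MSAA can express only five opacity levels (0, 1/4, 1/2, 3/4, and 1). That is enough to soften the edges of a cut-out, and it also illustrates why the technique is not a substitute for correct transparency.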
Future Work and Conclusions
The Direct3D 10 API simplifies the implementation of instancing and also offers improved performance. Because vertex buffers are mapped to shader inputs once, when the input layouts are created during initialization, the mapping doesn't need to be repeated at draw time for every frame, as in earlier versions of the API, which improves performance.
The sample source provided can be modified with relative ease to create a level-of-detail simplification scheme. Multiple static patches at different levels of detail can be generated, each with fewer polygons than the last.
The patches can be placed on the grid with instancing depending on the camera distance. Further simplification can be achieved by using two animated textures for grass patches at even greater distances.
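One way such a scheme could be organized is sketched below: patch positions are sorted by camera distance into one instance list per level of detail. This is a sketch of the proposed extension, not code from the sample; the names `PatchPos`, `selectLod`, and `bucketByLod` and the distance thresholds are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct PatchPos { float x, y, z; };

// Returns the level of detail for a patch at the given distance from the camera.
// lodMaxDist lists the farthest distance served by each level, in ascending order.
// A result equal to lodMaxDist.size() means "beyond the last geometric level".
inline std::size_t selectLod(float distance, const std::vector<float>& lodMaxDist) {
    std::size_t lod = 0;
    while (lod < lodMaxDist.size() && distance > lodMaxDist[lod]) ++lod;
    return lod;
}

// Sorts patch positions into one instance list per level, so that each level
// can be drawn with a single instanced draw call using its own static patch.
inline std::vector<std::vector<PatchPos>> bucketByLod(
        const std::vector<PatchPos>& centers, PatchPos cam,
        const std::vector<float>& lodMaxDist) {
    std::vector<std::vector<PatchPos>> buckets(lodMaxDist.size() + 1);
    for (const PatchPos& c : centers) {
        float dx = c.x - cam.x, dy = c.y - cam.y, dz = c.z - cam.z;
        float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
        std::size_t lod = selectLod(dist, lodMaxDist);
        assert(lod < buckets.size());
        buckets[lod].push_back(c);
    }
    return buckets;
}
```

Each bucket maps to one instance buffer update and one instanced draw with the static patch for that level. The final bucket collects the patches beyond the last threshold, which is where a textured stand-in could replace geometry.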
The sample currently runs at reasonably good frame rates on consumer graphics hardware (30-60 fps) and can be further optimized by tuning the size of the static grass patch (and hence the number of blades) and the number of instances, along with the level-of-detail optimizations mentioned above.
Figure 3 shows an image of a grass field rendered on integrated graphics hardware with alpha-to-coverage, running at about 60 fps.
Figure 3. Field of grass rendered using alpha-to-coverage.
The following resource provides additional information:
Carucci, Francesco, "Inside Geometry Instancing," in GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, Matt Pharr, ed., Addison-Wesley Professional, 2005.
3 Instancing10 Sample, Microsoft DirectX SDK. http://msdn.microsoft.com/en-us/library/bb205317(VS.85).aspx
4 "Anti-Aliasing with Transparency," NVIDIA technical report. http://developer.download.nvidia.com/SDK/9.5/Samples/DEMOS/Direct3D9/src/AntiAliasingWithTransparency/docs/AntiAliasingWithTransparency.pdf