The idea of using the stencil buffer to generate shadows has been around for over a decade, but only recently has 3D graphics hardware advanced to the point where using the stencil algorithm on a large scale has become practical. Not long ago, there existed some unsolved problems pertaining to stencil shadows that prevented the algorithm from working correctly under various conditions. Advances have now been made, however, so that stencil shadows can be robustly implemented to handle arbitrarily positioned point lights and infinite directional lights having any desired spatial relationship with the camera. This article presents the intricacies of the entire stencil shadow algorithm and covers every mathematical detail of its efficient implementation.
The basic concept of the stencil shadow algorithm is to use the stencil buffer as a masking mechanism to prevent pixels in shadow from being drawn during the rendering pass for a particular light source. This is accomplished by rendering an invisible shadow volume for each shadow-casting object in a scene using stencil operations that leave nonzero values in the stencil buffer wherever light is blocked. Once the stencil buffer has been filled with the appropriate mask, a lighting pass only illuminates pixels where the value in the stencil buffer is zero.
As shown in Figure 1, an object’s shadow volume encloses the region of space for which light is blocked by the object. This volume is constructed by finding the edges in the object’s triangle mesh representing the boundary between lit triangles and unlit triangles and extruding those edges away from the light source. Such a collection of edges is called the object’s silhouette with respect to the light source. The shadow volume is rendered into the stencil buffer using operations that modify the stencil value at each pixel depending on whether the depth test passes or fails. Of course, this requires that the depth buffer has already been initialized to the correct values by a previous rendering pass. Thus, the scene is first rendered using a shader that applies surface attributes that do not depend on any light source, such as ambient illumination, emission, and environment mapping.
Figure 1. An object’s shadow volume encloses the region of space for which light is blocked by the object.
The original stencil algorithm renders the shadow volume in two stages. In the first stage, the front faces of the shadow volume (with respect to the camera) are rendered using a stencil operation that increments the value in the stencil buffer whenever the depth test passes. In the second stage, the back faces of the shadow volume are rendered using a stencil operation that decrements the value in the stencil buffer whenever the depth test passes. As illustrated in Figure 2, this technique leaves nonzero values in the stencil buffer wherever the shadow volume intersects any surface in the scene, including the surface of the object casting the shadow.
Figure 2. Numbers at the ends of rays emanating from the camera position C represent the values left in the stencil buffer for a variety of cases. The stencil value is incremented when front faces of the shadow volume pass the depth test, and the stencil value is decremented when back faces of the shadow volume pass the depth test. The stencil value does not change when the depth test fails.
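In OpenGL, the two stages map directly onto `glStencilOp` settings. The fragment below is a minimal sketch, assuming the ambient pass has already filled the depth buffer and that color and depth writes have been disabled; the surrounding state management is our assumption, not a listing from this article.

```c
/* Depth-pass stencil setup (sketch). Assumes depth buffer is already
 * initialized and color/depth writes are disabled. */
glEnable(GL_STENCIL_TEST);
glStencilFunc(GL_ALWAYS, 0, ~0u);       /* stencil test always passes      */
glEnable(GL_CULL_FACE);

/* First stage: front faces increment the stencil value on depth pass. */
glCullFace(GL_BACK);
glStencilOp(GL_KEEP, GL_KEEP, GL_INCR); /* args: fail, zfail, zpass        */
/* ... render shadow volume ... */

/* Second stage: back faces decrement the stencil value on depth pass. */
glCullFace(GL_FRONT);
glStencilOp(GL_KEEP, GL_KEEP, GL_DECR);
/* ... render shadow volume again ... */
```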
There are two major problems with the method just described. The first is that no matter what finite distance we extrude an object’s silhouette away from a light source, it is still possible that it is not far enough to cast a shadow on every object in the scene that should intersect the shadow volume. The example shown in Figure 3 demonstrates how this problem arises when a light source is very close to a shadow-casting object. Fortunately, this problem can be elegantly solved by using a special projection matrix and extruding shadow volumes all the way to infinity.
Figure 3. No matter what finite distance an object’s silhouette is extruded away from a light source, moving the light close enough to the object can result in a shadow volume that cannot reach other objects in the scene.
The second problem shows up when the camera lies inside the shadow volume or the shadow volume is clipped by the near plane. Either of these occurrences can leave incorrect values in the stencil buffer, causing the wrong surfaces to be illuminated. The solution to this problem is to add caps to the shadow volume geometry, making it a closed surface, and to use different stencil operations. The two caps added to the shadow volume are derived from the object’s triangle mesh as follows. A front cap is constructed using the unmodified vertices of triangles facing toward the light source. A back cap is constructed by projecting the vertices of triangles facing away from the light source to infinity. For the resulting closed shadow volume, we render back faces (with respect to the camera) using a stencil operation that increments the stencil value whenever the depth test fails, and we render front faces using a stencil operation that decrements the stencil value whenever the depth test fails. As shown in Figure 4, this technique leaves nonzero values in the stencil buffer for any surface intersecting the shadow volume for arbitrary camera positions. Rendering shadow volumes in this manner is more expensive than using the original technique, but we can determine when it’s safe to use the less-costly depth-pass method without having to worry about capping our shadow volumes.
Figure 4. Using a capped shadow volume and depth-fail stencil operations allows the camera to be inside the shadow volume. The stencil value is incremented when back faces of the shadow volume fail the depth test, and the stencil value is decremented when front faces of the shadow volume fail the depth test. The stencil value does not change when the depth test passes.
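The depth-fail variant moves the increment and decrement from the depth-pass slot of `glStencilOp` to the depth-fail slot and swaps the face culling. Again, this is a hedged sketch of the state involved, not a listing from the article:

```c
/* Depth-fail (z-fail) stencil setup (sketch). Requires a capped, closed
 * shadow volume; surrounding state as in the depth-pass sketch. */
glCullFace(GL_FRONT);                   /* render back faces               */
glStencilOp(GL_KEEP, GL_INCR, GL_KEEP); /* increment when depth test fails */
/* ... render capped shadow volume ... */

glCullFace(GL_BACK);                    /* render front faces              */
glStencilOp(GL_KEEP, GL_DECR, GL_KEEP); /* decrement when depth test fails */
/* ... render capped shadow volume ... */
```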
The details of everything just described are discussed throughout the remainder of this article. In summary, the rendering algorithm for a single frame runs through the following steps.
A Clear the frame buffer and perform an ambient rendering pass. Render the visible scene using any surface shading attribute that does not depend on any particular light source.
B Choose a light source and determine what objects may cast shadows into the visible region of the world. If this is not the first light to be rendered, clear the stencil buffer.
C For each object, calculate the silhouette representing the boundary between triangles facing toward the light source and triangles facing away from the light source. Construct a shadow volume by extruding the silhouette away from the light source.
D Render the shadow volume using specific stencil operations that leave nonzero values in the stencil buffer where surfaces are in shadow.
E Perform a lighting pass using the stencil test to mask areas that are not illuminated by the light source.
F Repeat steps B through E for every light source that may illuminate the visible region of the world.
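The steps above can be sketched as the following per-frame skeleton. Every function named here is a hypothetical placeholder for the operations described in steps A through F, not part of any real API:

```c
/* Hypothetical per-frame driver for the stencil shadow algorithm (sketch).
 * All called functions are placeholders for the steps described above. */
void render_frame(const Scene *scene, const Camera *camera)
{
    clear_frame_buffer();
    render_ambient_pass(scene, camera);            /* step A */

    for (int i = 0; i < scene->lightCount; ++i) {  /* step F */
        const Light *light = &scene->lights[i];
        if (i > 0)
            clear_stencil_buffer();                /* step B */

        for (int j = 0; j < scene->objectCount; ++j) {
            Object *object = &scene->objects[j];
            if (!may_cast_visible_shadow(object, light, camera))
                continue;
            calculate_silhouette(object, light);   /* step C */
            render_shadow_volume(object, light);   /* step D */
        }

        render_lighting_pass(scene, light);        /* step E: stencil == 0 */
    }
}
```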
For a scene illuminated by n lights, this algorithm requires at least n+1 rendering passes. More than n+1 passes may be necessary if surface shading calculations for a single light source cannot be accomplished in a single pass. To efficiently render a large scene containing many lights, one must be careful during each pass to render only objects that could potentially be illuminated by a particular light source. An additional optimization using the scissor rectangle can also save a significant amount of rasterization work -- this optimization is discussed in the last section of this article.
Infinite View Frustums
To ensure that shadow volumes surround every last bit of space for which light is blocked by an object, we must extrude the object’s silhouette to infinity. Using a standard perspective projection matrix would cause such a shadow volume to be clipped by the far plane. To avoid this unwanted effect, we can actually place the far plane at an infinite distance from the camera.
Recall that the projection matrix transforms points from eye space to clip space. In OpenGL, eye space is the coordinate system in which the camera lies at the origin, the x-axis points to the right, the y-axis points upward, and the camera points down the negative z-axis. In clip space, a 4D homogeneous point <x,y,z,w> is inside the view frustum if -w<x<w, -w<y<w, and -w<z<w. Once primitives have been clipped, a vertex is transformed into a 3D point in normalized device coordinates by performing a perspective divide by its w-coordinate. This results in a point whose x, y, and z coordinates all lie in the range [-1,1]. In the final transformation before rasterization, these coordinates are remapped to the dimensions of the viewport and the physical range of the depth buffer.
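As a concrete illustration of the clip-space containment rule, the small helper below (our own illustration, not part of OpenGL) tests a 4D homogeneous point against the view frustum:

```c
#include <stdbool.h>

/* Returns true if the clip-space point <x, y, z, w> lies strictly inside
 * the view frustum, i.e. -w < x < w, -w < y < w, and -w < z < w. */
bool inside_frustum(float x, float y, float z, float w)
{
    return -w < x && x < w &&
           -w < y && y < w &&
           -w < z && z < w;
}
```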
The standard OpenGL perspective projection matrix P has the form

$$P = \begin{bmatrix} \dfrac{2n}{r-l} & 0 & \dfrac{r+l}{r-l} & 0 \\ 0 & \dfrac{2n}{t-b} & \dfrac{t+b}{t-b} & 0 \\ 0 & 0 & -\dfrac{f+n}{f-n} & -\dfrac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{bmatrix}, \qquad (1)$$

where n is the distance to the near plane, f is the distance to the far plane, and l, r, b, and t represent the left, right, bottom, and top edges of the rectangle carved out of the near plane by the view frustum. By evaluating the limit as f tends to infinity, we obtain the matrix

$$P_\infty = \lim_{f\to\infty} P = \begin{bmatrix} \dfrac{2n}{r-l} & 0 & \dfrac{r+l}{r-l} & 0 \\ 0 & \dfrac{2n}{t-b} & \dfrac{t+b}{t-b} & 0 \\ 0 & 0 & -1 & -2n \\ 0 & 0 & -1 & 0 \end{bmatrix}. \qquad (2)$$
The matrix P∞ transforms a 4D homogeneous eye-space point Veye = ⟨x, y, z, w⟩ to the clip-space point Vclip as follows.

$$V_{\mathrm{clip}} = P_\infty V_{\mathrm{eye}} = \begin{bmatrix} \dfrac{2n}{r-l}x + \dfrac{r+l}{r-l}z \\ \dfrac{2n}{t-b}y + \dfrac{t+b}{t-b}z \\ -z - 2nw \\ -z \end{bmatrix} \qquad (3)$$

Assuming w > 0 (it is normally the case that w = 1), the resulting z-coordinate of Vclip is always less than the resulting w-coordinate of Vclip, ensuring that projected points are never clipped by the far plane. A point at infinity is represented by a 4D homogeneous vector having a w-coordinate of zero in eye space. For such a point, (Vclip)z = (Vclip)w, and the perspective divide produces a 3D point in normalized device coordinates having the maximal z-value of one.
In practice, the limitations of hardware precision can produce points having a normalized z-coordinate slightly greater than one. This causes severe problems when the z-coordinate is converted to an integer value to be used in the depth buffer, because the stencil operations that depend on the depth test to render shadow volumes may no longer function correctly. To circumvent this undesirable effect, we can map the z-coordinate of a point at infinity to a value slightly less than one in normalized device coordinates. The z-coordinate of a 3D point D in normalized device coordinates is mapped from a value Dz in the range [-1,1] to a value D′z in the range [-1,1-ε], where ε is a small positive constant, using the relation

$$D'_z = D_z\,\frac{2-\varepsilon}{2} - \frac{\varepsilon}{2}. \qquad (4)$$
We need to find a way to modify the z-coordinate of Vclip in order to perform this mapping as points are transformed from eye space into clip space. We can rewrite Equation (4) as an adjustment to (Vclip)z by replacing Dz with (Vclip)z / (Vclip)w and D′z with (V′clip)z / (Vclip)w as follows.

$$\frac{(V'_{\mathrm{clip}})_z}{(V_{\mathrm{clip}})_w} = \frac{(V_{\mathrm{clip}})_z}{(V_{\mathrm{clip}})_w}\cdot\frac{2-\varepsilon}{2} - \frac{\varepsilon}{2} \qquad (5)$$
Plugging in the values of (Vclip)z and (Vclip)w given by Equation (3), we have

$$\frac{(V'_{\mathrm{clip}})_z}{-z} = \frac{-z - 2nw}{-z}\cdot\frac{2-\varepsilon}{2} - \frac{\varepsilon}{2}. \qquad (6)$$
Solving for (V′clip)z and simplifying yields

$$(V'_{\mathrm{clip}})_z = (\varepsilon - 1)z + (\varepsilon - 2)nw. \qquad (7)$$
We can incorporate this mapping into the projection matrix P∞ given by Equation (2) as follows to arrive at the slightly tweaked matrix P′∞ that we actually use to render a scene.

$$P'_\infty = \begin{bmatrix} \dfrac{2n}{r-l} & 0 & \dfrac{r+l}{r-l} & 0 \\ 0 & \dfrac{2n}{t-b} & \dfrac{t+b}{t-b} & 0 \\ 0 & 0 & \varepsilon - 1 & (\varepsilon - 2)n \\ 0 & 0 & -1 & 0 \end{bmatrix} \qquad (8)$$
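A direct transcription of the tweaked infinite projection matrix can be written as follows. This is a sketch: the row-major storage convention and the function name are choices made here, not part of the article.

```c
/* Builds the tweaked infinite perspective projection matrix of Equation (8)
 * in row-major order. l, r, b, t bound the near-plane rectangle, n is the
 * near-plane distance, and epsilon is the small positive constant that
 * pulls depth values at infinity just inside the depth range. */
void infinite_projection(float m[4][4], float l, float r, float b, float t,
                         float n, float epsilon)
{
    m[0][0] = 2.0f * n / (r - l);  m[0][1] = 0.0f;
    m[0][2] = (r + l) / (r - l);   m[0][3] = 0.0f;

    m[1][0] = 0.0f;                m[1][1] = 2.0f * n / (t - b);
    m[1][2] = (t + b) / (t - b);   m[1][3] = 0.0f;

    m[2][0] = 0.0f;                m[2][1] = 0.0f;
    m[2][2] = epsilon - 1.0f;      m[2][3] = (epsilon - 2.0f) * n;

    m[3][0] = 0.0f;                m[3][1] = 0.0f;
    m[3][2] = -1.0f;               m[3][3] = 0.0f;
}
```

Transforming the eye-space point at infinity ⟨0, 0, -1, 0⟩ with this matrix gives clip coordinates z = 1 - ε and w = 1, so the depth after the perspective divide is 1 - ε, just inside the depth range as intended.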
If the graphics hardware supports depth clamping, then use of the matrix P′∞ given by Equation (8) is not necessary. The GL_NV_depth_clamp extension to OpenGL allows a renderer to force depth values in normalized device coordinates to saturate to the range [-1,1], thus curing the precision problem at the infinite far plane. When depth clamping is enabled using the function call

glEnable(GL_DEPTH_CLAMP_NV);

the projection matrix P∞ given by Equation (2) can safely be used.
The question of depth buffer precision arises when using an infinite projection matrix. It is true that placing the far plane at infinity reduces the number of discrete depth values that can occur within any finite interval along the z-axis, but in most situations this effect is small. Consider the function d(V) that uses the matrix P given in Equation (1) to map an eye-space point V = ⟨Vx, Vy, Vz, 1⟩ to its corresponding depth in normalized device coordinates:

$$d(V) = \frac{(PV)_z}{(PV)_w} = \frac{f+n}{f-n} + \frac{2fn}{(f-n)V_z}. \qquad (9)$$
We obtain a different function d∞(V) by using the matrix P∞ given by Equation (2) to map an eye-space point V to its normalized depth:

$$d_\infty(V) = \frac{(P_\infty V)_z}{(P_\infty V)_w} = 1 + \frac{2n}{V_z}. \qquad (10)$$
Given two eye-space points V1 and V2, we can compare the differences in depth values produced by the functions d and d∞ as follows.

$$d(V_1) - d(V_2) = \frac{2fn}{f-n}\left(\frac{1}{(V_1)_z} - \frac{1}{(V_2)_z}\right) = \frac{f}{f-n}\bigl[d_\infty(V_1) - d_\infty(V_2)\bigr] \qquad (11)$$

This demonstrates that the standard projection matrix P maps the points V1 and V2 to a range that is a factor f/(f-n) larger than the range to which the points are mapped by the infinite projection matrix P∞, thus corresponding to greater precision. For practical values of f and n, where f is much larger than n, the factor f/(f-n) is close to unity, so the loss of precision is not a significant disadvantage.
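The effect can be checked numerically. The helpers below are an illustrative sketch (the function names are ours) that evaluates the two depth functions for a typical near and far distance:

```c
/* Depth in normalized device coordinates produced by the finite projection
 * (Equation 9) and by the infinite projection (Equation 10). vz is the
 * eye-space z-coordinate, negative for points in front of the camera. */
float depth_finite(float vz, float n, float f)
{
    return (f + n) / (f - n) + 2.0f * f * n / ((f - n) * vz);
}

float depth_infinite(float vz, float n)
{
    return 1.0f + 2.0f * n / vz;
}
```

For n = 0.1 and f = 1000, the ratio f/(f-n) is roughly 1.0001, so the two projections distribute depth values almost identically over any interval.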
Silhouette Determination

The stencil shadow algorithm requires that the models in our world be closed triangle meshes. In mathematical terms, the surface of any object that casts a shadow must be a two-dimensional closed manifold. What this boils down to is that every edge in a mesh must be shared by exactly two triangles, disallowing any holes that would let us see the interior of the mesh.
Edge connectivity information must be precomputed so that we can determine a mesh’s silhouette for shadow volume rendering. Suppose that we have an indexed triangle mesh consisting of an array of N vertices V1, V2, …, VN and an array of M triangles T1, T2, …, TM. Each triangle simply indicates which three vertices it uses by storing three integer indexes i1, i2, and i3. We say that an index ip precedes an index iq if the number p immediately precedes the number q in the cyclic chain 1 → 2 → 3 → 1. For instance, i2 precedes i3 and i3 precedes i1, but i2 does not precede i1.
The indexes i1, i2, and i3 are ordered such that the positions of the vertices Vi1, Vi2, and Vi3 to which they refer are wound counterclockwise about the triangle’s normal vector. Suppose that two triangles share an edge whose endpoints are the vertices Va and Vb as shown in Figure 5. The consistent winding rule enforces the property that for one of the triangles, the index referring to Va precedes the index referring to Vb, and that for the other triangle, the index referring to Vb precedes the index referring to Va.
As demonstrated in Listing 1, the edges of a triangle mesh can be identified by making a single pass through the triangle list. For any triangle having vertex indexes i1, i2, and i3, we create an edge record for every instance in which i1 < i2, i2 < i3, or i3 < i1, and store the index of the current triangle in the edge record. This procedure creates exactly one edge for every pair of triangles that share two vertices Va and Vb, duplicating any edges that are shared by multiple pairs of triangles.
Once we have identified all the edges, we make a second pass through the triangle list to find the second triangle that shares each edge. This is done by locating triangles for which i1 > i2, i2 > i3, or i3 > i1, and matching each to an edge having the same vertex indexes that has not yet been supplied with a second triangle index.
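Listing 1 itself is not reproduced in this excerpt, but the two-pass procedure can be sketched as follows. The type and function names and the caller-supplied edge array are our own simplifications, and the linear search in the second pass would normally be replaced by a hash or sort for large meshes:

```c
typedef struct { unsigned int a, b; int tri1, tri2; } Edge;

/* Builds the edge list for an indexed triangle mesh in two passes over the
 * triangle array. tris holds three vertex indexes per triangle; edges must
 * have capacity for 3 * triCount / 2 entries for a closed mesh. Returns
 * the number of edges written. */
int build_edges(const unsigned int (*tris)[3], int triCount, Edge *edges)
{
    int edgeCount = 0;

    /* First pass: create an edge record whenever an index precedes a
     * larger index (i1 < i2, i2 < i3, or i3 < i1). */
    for (int t = 0; t < triCount; ++t)
        for (int k = 0; k < 3; ++k) {
            unsigned int p = tris[t][k], q = tris[t][(k + 1) % 3];
            if (p < q) {
                edges[edgeCount].a = p;
                edges[edgeCount].b = q;
                edges[edgeCount].tri1 = t;
                edges[edgeCount].tri2 = -1;
                ++edgeCount;
            }
        }

    /* Second pass: match each descending index pair (p > q) to the edge
     * created for the same vertices that still lacks a second triangle. */
    for (int t = 0; t < triCount; ++t)
        for (int k = 0; k < 3; ++k) {
            unsigned int p = tris[t][k], q = tris[t][(k + 1) % 3];
            if (p > q)
                for (int e = 0; e < edgeCount; ++e)
                    if (edges[e].a == q && edges[e].b == p &&
                        edges[e].tri2 < 0) {
                        edges[e].tri2 = t;
                        break;
                    }
        }

    return edgeCount;
}
```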
Armed with the edge list for a triangle mesh, we determine the silhouette by first calculating the dot product between the light position and the plane of each triangle. For a triangle whose vertex indexes are i1, i2, and i3, the (unnormalized) outward-pointing normal direction N is given by

$$N = (V_{i_2} - V_{i_1}) \times (V_{i_3} - V_{i_1}),$$

since the vertices are assumed to be wound counterclockwise. The 4D plane vector K corresponding to the triangle is then given by

$$K = \langle N_x, N_y, N_z, -N \cdot V_{i_1} \rangle.$$
Let L represent the 4D homogeneous position of the light source. For point light sources, LW≠0, and for infinite directional light sources, LW=0. A triangle faces the light source if K·L>0; otherwise, the triangle faces away from the light source. The silhouette is equal to the set of edges shared by one triangle facing the light and one triangle facing away from the light.
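For a single triangle, the facing computation looks like this. The code is a sketch with the vector operations written inline; the function name is ours:

```c
/* Returns nonzero if the triangle with counterclockwise-wound vertices
 * v1, v2, v3 faces the 4D homogeneous light position l, i.e. K . L > 0. */
int faces_light(const float v1[3], const float v2[3], const float v3[3],
                const float l[4])
{
    float e1[3], e2[3], n[3];
    for (int i = 0; i < 3; ++i) {
        e1[i] = v2[i] - v1[i];
        e2[i] = v3[i] - v1[i];
    }

    /* N = (v2 - v1) x (v3 - v1) */
    n[0] = e1[1] * e2[2] - e1[2] * e2[1];
    n[1] = e1[2] * e2[0] - e1[0] * e2[2];
    n[2] = e1[0] * e2[1] - e1[1] * e2[0];

    /* K = <N, -N . v1>, so K . L = N . L_xyz - (N . v1) L_w */
    float d = n[0] * v1[0] + n[1] * v1[1] + n[2] * v1[2];
    float kdotl = n[0] * l[0] + n[1] * l[1] + n[2] * l[2] - d * l[3];
    return kdotl > 0.0f;
}
```

Because l is homogeneous, the same test handles point lights (Lw = 1) and infinite directional lights (Lw = 0) uniformly.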
Shadow Volume Construction
Once the set of an object’s silhouette edges has been determined with respect to a light source, we must extrude each edge away from the light’s position to form the object’s shadow volume. In this section, we present methods that perform the extrusion by making use of widely available vertex programming hardware exposed by the GL_NV_vertex_program and GL_EXT_vertex_shader extensions to OpenGL.
For a point light source, the extrusion of the silhouette edges consists of a set of quads, each of which has the two unmodified vertices belonging to an edge and two additional vertices corresponding to the extrusion of the same edge to infinity. For an infinite directional light source, all points project to the same point at infinity, so the extrusion of the silhouette edges can be represented by a set of triangles that all share a common vertex. We distinguish between points that should be treated normally and those that should be extruded to infinity by using 4D homogeneous coordinates. A w-coordinate of one is assigned to the unmodified vertices and a w-coordinate of zero is assigned to the extruded vertices. The extrusion methods that we present utilize the information stored in the w-coordinate to perform the appropriate vertex modifications.
Before we examine the extrusion methods, we must prepare the appropriate quad list or triangle list (depending on whether we are using a point light or infinite directional light). We need to make sure that the vertices of each extrusion primitive are wound so that the face’s normal direction points out of the shadow volume. Suppose that a silhouette edge E has endpoints A and B. The edge-finding code presented in Listing 1 associates the triangle for which the vertices A and B occur in counterclockwise order as the first triangle sharing the edge E. Thus, if the first triangle faces toward the light source, then we want the vertices A and B to occur in the opposite order for the extruded primitive so that its vertices are wound counterclockwise. If the first triangle faces away from the light source, then we use the vertices A and B in the same order for the extruded primitive. Table 1 lists the vertices of the extrusion of the edge E for point light sources and infinite directional light sources for the cases that the first triangle associated with the edge E faces toward or away from the light source.
Table 1. Given a silhouette edge E having endpoints A and B, this table lists the object-space vertices of the extruded shadow volume face corresponding to E. The first triangle associated with the edge E is the triangle for which the vertices A and B occur in counterclockwise order.
Using the GL_NV_vertex_program extension, we can employ a couple of simple vertex programs to perform edge extrusion and transformation to clip space. In each program, we assume that the product of the projection matrix and model-view matrix has been tracked into constant registers c[0]–c[3] and that the object-space light position has been stored in constant register c[4]. Vertex programs are enabled and these constants are loaded using the following function calls, where lx, ly, lz, and lw represent the light position.

glEnable(GL_VERTEX_PROGRAM_NV);
glTrackMatrixNV(GL_VERTEX_PROGRAM_NV, 0, GL_MODELVIEW_PROJECTION_NV, GL_IDENTITY_NV);
glProgramParameter4fNV(GL_VERTEX_PROGRAM_NV, 4, lx, ly, lz, lw);
For a point light source residing at the point L in object space, a vertex V from Table 1 is unmodified if its w-coordinate is one and is extruded if its w-coordinate is zero by using the formula

$$V' = V_w L + \langle V_x - L_x,\; V_y - L_y,\; V_z - L_z,\; 0 \rangle.$$
The following vertex program applies this formula and then transforms the resulting vertex position into clip space.
ADD R0.xyz, v[OPOS], -c[4];
MAD R0, v[OPOS].w, c[4], R0;
DP4 o[HPOS].x, c[0], R0;
DP4 o[HPOS].y, c[1], R0;
DP4 o[HPOS].z, c[2], R0;
DP4 o[HPOS].w, c[3], R0;
In the case that shadow volume caps must be rendered (see the next section), a vertex program nearly identical to the one above should be used to transform vertices belonging to triangles that face away from the light source. Such vertices can be treated as if their w-coordinates are zero, so the MAD instruction has no effect and can be removed when projecting a back cap.
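The vertex program's arithmetic can be mirrored on the CPU to verify both cases of the extrusion V′ = Vw L + ⟨Vx - Lx, Vy - Ly, Vz - Lz, 0⟩. This is a sketch and the helper name is ours:

```c
/* Applies V' = V_w * L + <V_x - L_x, V_y - L_y, V_z - L_z, 0>, mirroring
 * the object-space extrusion computed by the vertex program. out, v, and
 * l are 4D homogeneous vectors; l is the point light position. */
void extrude_vertex(float out[4], const float v[4], const float l[4])
{
    /* ADD R0.xyz, v[OPOS], -c[4]
     * (R0.w assumed zero-initialized, as NV_vertex_program temporaries are) */
    out[0] = v[0] - l[0];
    out[1] = v[1] - l[1];
    out[2] = v[2] - l[2];
    out[3] = 0.0f;

    /* MAD R0, v[OPOS].w, c[4], R0 */
    for (int i = 0; i < 4; ++i)
        out[i] = v[3] * l[i] + out[i];
}
```

With w = 1 the result is V itself (for a point light with Lw = 1); with w = 0 it is the direction from the light through V, carrying a w-coordinate of zero so that the vertex projects to infinity.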
For an infinite light source residing at the point