This month, we take a look at optimizing content based upon parametric surfaces. To do this we concentrate on a specific case, 4x4 Bezier patches. By creating an optimized engine that incorporates tessellated surfaces, we can exploit features such as continuous LOD and nonrigid body deformation. This article features the work of Ronen Zohar, a member of Intel Israel's Media Team, who has experience with a wide range of 3D engine optimizations under his belt, including Intel's Processor Specific Graphics Pipeline for Direct3D since release 6.
With increased processor power, previous offline 3D tasks, such as dealing with highorder surfaces, can now be performed at runtime. A big advantage of highorder surfaces is that their relatively small data set makes them perfect for bandwidthlimited Internet applications. Since current 3D HW can render triangles only, we must generate triangles from the highorder surface using a process called “tessellation.” In this article, we show how to perform uniform tessellation of 4x4 Bezier surfaces (or patches), using the SIMD power of the Pentium III processor’s Streaming SIMD Extensions. Our implementation is written in C++ using the Streaming SIMD Extensions support classes and the Intel C/C++ compiler.
Introduction to Bezier Curves and Surfaces
Bezier
curves are parametric curves defined by a set of n+1 control points
(P_{0}P_{n}), which_{ }define the “basic shape”
of the curve, and a parameter (t), in the range of [0..1]. The mathematical
representation of a Bezier curve is:
where
the polynomials, B_{i,n}, (called Bezier basis functions) are
defined as:
Bezier curves are polynomials of nth degree that interpolate between the control points, where C(0) = P_{0} and C(1) = P_{n}.
Bezier surfaces are twoparameter expansions to the Bezier curve concept. Bezier surfaces are defined by an (n+1) x (m+1) matrix of control points (P), which define the “basic shape” of the surface, and two parameters (s and t), both in the range [0..1]. The mathematical representation of a Bezier surface is:
Lighting and reflection mapping calculations require the normal vector for each surface point (s, t). To calculate this vector, we first differentiate the surface equation with respect to s and t. The resultant partial derivatives define two vectors in the plane tangent to the surface at (s, t). The normal vector for the surface at this point is the cross product of these two tangent vectors:
In this paper, we deal with 4x4 Bezier curves, which are bicubic polynomials. The basis functions and their derivatives for the n=3 case are:
The Tessellation Process
There are three basic steps in the tessellation process: taking samples of the parametric surface, connecting these samples into triangles, and generating the tessellated surface vertices.
Many techniques and algorithms deal with the problem of sampling the parametric surface. Some are based on the surface curvature (the smaller the surface curvature, the fewer triangles you need to approximate the surface shape). We use the simplest algorithm, which uniformly divides the [0…1]^{2} (s, t) plane into some tessellation level , and then samples the surface at these points. Note that we designed our data structures so that any other tessellationsampling algorithm can be used instead of our simple algorithm.
After determining the sample points, we precalculate the basis function values at these sample points and generate an indices list that accompanies the tessellation data. This indices list describes the triangle connectivity of the surface sample points that will be generated by the tessellation process. The indices list will be submitted to the rasterizer, along with the surface sample points, in order to render the tessellated surface.
We can use two methods to generate the tessellated surface vertices. The simplest method is to compute the surface position and normal in object space (the space where the control points are defined), and then feed these vertices to the triangle transform and lighting routines. A more complex, but faster, method is to transform the control points to screen space and tessellate using the transformed control points. In our case, we will need 16 transformations, regardless of the number of vertices generated by the tessellation process. The two drawbacks of the second method are:
 For small vertex counts, it is cheaper to transform only an already tessellated surface.
 Lighting calculations sometimes require object or world space position. In this case, we will have to tessellate once for screen space and once for object/world space.
We’ve implemented only the second method, but the implementation of the first method can be done easily by commenting the transformation and lighting calculations from the routines.
Data Structures and Classes
The CBezierTessellation Class:
The CBezierTessellation class serves two purposes. The first is to hold the triangle indices data for the current tessellation level. This data is sent to the rasterization HW and indicates how to connect the vertices generated by the tessellation process into triangles. The second purpose is to hold precalculated values for the basis functions and their derivatives for the chosen (s, t) parameter pair at each sample point. Since our tessellation is not dependent on the position of the control points of the Bezier patch, we can use one CBezierTessellation object for all the Bezier patches in our application.
The CBezierSurface and CBezierPatch Classes:A single, 4x4 patch can generate only a limited set of objects. This is why we often see sets of 4x4 patches connected to each other to produce the object, where each patch defines only a small portion of the object’s surface. Neighboring patches are connected by sharing the control points along the common edge of the connected surface. To use the data efficiently, we define two classes: CBezierSurface, which holds all the control points used by the object, and CBezierPatch, which holds a 4x4 matrix of indices to control points stored in the patch parent CBezierSurface object.
Streaming SIMD Extensions Implementation
SetupWe use Streaming SIMD Extensions to evaluate the surface position and surface normal for four sample points in parallel. Within the data structure, the basis function values for the sample points (s, t) are organized into groups of four values each. To improve cache locality, we store these values in continuos memory blocks. The memory footprint of the basis function values looks like:
B_{0,3}(s0),B_{0,3}(s1),B_{0,3}(s2),B_{0,3}(s3),B_{1,3}(s0),B_{1,3}(s1),,,,B_{3,3}(t0), B_{3,3}(t1), B_{3,3}(t2), B_{3,3}(t3)
For every iteration of four sample points, we need 32 floatingpoint values (4 s values x 4 basis functions + 4 t values x 4 basis functions), or 4 cache lines. We use the prefetch instruction to tell the processor that the next 4 cache lines will be used in the next iteration. This way, no cache misses occur during the execution of the tessellation algorithm.
We also setup the tessellation process by expanding the control points four times. We need to expand the current set of control points because the algorithm uses the control points values in parallel to generate four vertices. (When tessellating directly to screen space, we need to expand the control points after the transformation to screen).
Position calculation
There
is a big difference between the tessellation of screen space surfaces
and the tessellation of object space surfaces. For object space surfaces,
we only need to calculate the surface position “by the book”. For screen
space surfaces, we usually need to transform using a perspective projection.
The perspective projection changes the actual position of the x, y coordinates
based on their zvalue. To make the surface persistent in projective
transformations, we must convert our patch to a “Rational Bezier surface”.
A Rational Bezier surface is based on a set of homogenous control points
coming from the controlpoint transformation algorithm. The fourth coordinate
(W) is used as a weight that divides all the other control points coordinates:
The last stage of the surface position generation is the reformatting of the output data to the rasterizer. The data comes out as groups of 4 X’s, 4 Y’s, 4 Z’s and 4 W’s, while the rasterizer expects x, y, z, w vectors. We use the _MM_TRANSPOSE4_PS macro to transpose the values to the correct format.
Finally,
the code for the tessellation of surface position is:
// Zero the vertex coordinates
vertex.x = vertex.y = vertex.z = vertex.w = _ZERO_;
for (k=0;k<4;k++) {
for (t=0;t<4;t++) {
// premultiply basis function values
for s & t
F32vec4 coeff
= coeffs[k] * coeffs[t + 8];
// get the control point based on its index
DWORD idx = _indices[t][k];
PIIIVector4
*pt = &projBase[idx];
// sum the vertex components
vertex.x += coeff*pt>x;
vertex.y += coeff*pt>y;
vertex.z += coeff*pt>z;
// Weight calculation required only
when tessellating directly
//
to screen space
vertex.w += coeff*pt>w;
}
}
// Perspective division  required only when tessellating
//directly to screen space
F32vec4 rhw = rcp(vertex.w);
// 1 over w using rcp
vertex.x *= rhw;
vertex.y *= rhw;
vertex.z *= rhw;
//
Transpose position values from [xxxx], [yyyy], [zzzz], [wwww]
// to [x,y,z,w]X4 format.
_MM_TRANSPOSE4_PS(vertex.x, vertex.y, vertex.z,rhw);
// Store the four vertices position, using unaligned move
storeu(&vtx[0].sx, vertex.x);
storeu(&vtx[1].sx, vertex.y);
storeu(&vtx[2].sx, vertex.z);
storeu(&vtx[3].sx,rhw);
Surface Normal Calculation
To
calculate the surface normals, we must first calculate the two tangent
vectors to the surface. We calculate the s tangent vector by replacing
the sbased basis function value in the position calculation with the
value of its derivative at the sample point s. Similarly, we calculate
the t tangent vector by replacing the tbased basis function value with
the value of its derivative at the sample point t. The cross product
of these vectors generates the surface normal at the sampling point.
To do this calculation efficiently, we precalculate all the basis function
derivatives data needed during the process, and store the derivative
values together with the basis function values that are already stored
for the position calculation. The code for this calculation is:
normk.x = normk.y = normk.z
= normt.x = normt.y = normt.z = _ZERO_;
for (k=0;k<4;k++) {
for (t=0;t<4;t++) {
PIIIVector3 *wpt =
&worldBase[idx];
coeff = coeffs[k+4] * coeffs[t +
8];
normt.x += coeff *
wpt>x;
normt.y += coeff * wpt>y;
normt.z += coeff * wpt>z;
coeff = coeffs[k] * coeffs[t + 12];
normk.x += coeff * wpt>x;
normk.y
+= coeff * wpt>y;
normk.z
+= coeff * wpt>z;
}
}
PIIIVector3
norm;
// Cross product of partial derivatives
norm.x = normk.y*normt.z  normk.z*normt.y;
norm.y = normk.z*normt.x  normk.x*normt.z;
norm.z = normk.x*normt.y  normk.y*normt.x;
// One over size of normal
F32vec4 size = rsqrt(norm.x*norm.x + norm.y*norm.y + orm.z*norm.z);
// Normalize the surface normal
norm.x *= size;
norm.y *= size;
norm.z *= size;
Performance
These performance numbers are measured on the “Utah teapot” Bezier surface, which is made from 306 control points and 32, 4x4 Bezier patches. The numbers are measured in Kcycles for the tessellation of the complete teapot on Pentium III processor @ 500 Mhz.
Tessellate
position only


Tessellation level 
#
of Triangles

Legacy

Pentium
III code 
Ratio

2 
64 
1280 
1150 
111% 
3 
256 
2400 
1550 
155% 
4 
576 
3880 
1770 
219% 
6 
1600 
6500 
2800 
232% 
8 
3136 
8570 
4050 
212% 
12 
7744 
11500 
6800 
169% 
18 
18496 
13500 
9650 
140% 
24 
33856 
14500 
11700 
124% 
Tessellate
position and calculate
one directional light in world space 

Tessellation level 
#of
triangles

Legacy
code 
Pentium
III code 
Ratio 
2 
64 
2800 
1610 
174% 
3 
256 
5100 
2770 
184% 
4 
576 
7100 
3300 
215% 
6 
1600 
10100 
5380 
188% 
8 
3136 
12000 
7370 
163% 
12 
7744 
13800 
10450 
132% 
18 
18496 
14850 
12850 
116% 
24 
33856 
15530 
13880 
112% 
Ideas for Future Improvements
The normal calculation and, in point light cases, the extra tessellation to object/world coordinates, is very compute intensive. There are several ways to approximate the surface normals using different levels of interpolations of the surface edges. In our code, we implemented a simple linear interpolation. Other interpolation techniques use Hermite basis functions, which are based on the secondderivative values on the surface edges. All these techniques produce a fairly decent image, as long as the surface curvature is relatively small.
Bezier patches can save communications bandwidth, as well as add to the photorealism of the application. The optimization techniques shown enables realtime Bezier patch tessellation within a 3D engine; thus encouraging this type of content to be more common in real time applications.
A demo utilizing the techniques discussed in this article can be found here.
Haim Barad has a Ph.D. in Electrical Engineering (1987) from the University of Southern California. His areas of concentration are in 3D graphics, video and image processing. Haim was on the Electrical Engineering faculty at Tulane University before joining Intel in 1995. Haim is a staff engineer and currently leads the Media Team at Intel's Israel Design Center (IDC) in Haifa, Israel.