44 min read

Animation With Cg

Cg is a programming language for creating special effects and real-time graphics on Nvidia hardware without having to program directly to the graphics hardware assembly language. This article introduces Cg and explains how to employ animation techniques using Cg. Excerpted Addison Wesley's new book, "The Cg Tutorial."

What is Cg? The Cg language makes it possible for you to control the shape, appearance, and motion of objects drawn using programmable graphics hardware. It marries programmatic control of these attributes with the incredible speed and capabilities of today's graphics processors. Never before have computer graphics practitioners, whether artists or programmers, had so much control over the real-time images they generate.

Cg provides developers with a complete programming platform that is easy to use and enables the fast creation of special effects and real-time cinematic-quality experiences on multiple platforms. By providing a new level of abstraction, Cg removes the need for developers to program directly to the graphics hardware assembly language, and 2 thereby more easily target OpenGL, DirectX, Windows, Linux, Macintosh OS X, and console platforms such as the Xbox. Cg was developed in close collaboration with Microsoft Corporation and is compatible with both the OpenGL API and Microsoft's High-Level Shading Language (HLSL) for DirectX 9.0.

Cg stands for "C for graphics." The C programming language is a popular, general purpose language invented in the 1970s. Because of its popularity and clean design, C provided the basis for several subsequent programming languages. For example, C++ and Java base their syntax and structure largely on C. The Cg language bases itself on C as well. If you are familiar with C or one of the many languages derived from C, then Cg will be easy to learn.

On the other hand, if you are not familiar with C or even programming languages in general but you enjoy computer graphics and want to learn something new, read on anyway. Cg programs tend to be short and understandable.

A Language for Programming Graphics Hardware

Cg is different from C, C++, and Java because it is very specialized. No one will ever write a spreadsheet or word processor in Cg. Instead, Cg targets the ability to programmatically control the shape, appearance, and motion of objects rendered using graphics hardware. Broadly, this type of language is called a shading language. However, Cg can do more than just shading. For example, Cg programs can perform physical simulation, compositing, and other nonshading tasks.

Think of a Cg program as a detailed recipe for how to render an object by using programmable graphics hardware. For example, you can write a Cg program to make a surface appear bumpy or to animate a virtual character. Later you will learn more about the history of shading languages and where Cg fits into this history.

Cg's Data-Flow Model

In addition to being specialized for graphics, Cg and other shading languages are different from conventional programming languages because they are based on a data- 3 flow computational model. In such a model, computation occurs in response to data that flows through a sequence of processing steps.

Cg programs operate on vertices and fragments (think "pixels" for now if you do not know what a fragment is) that are processed when rendering an image. Think of a Cg program as a black box into which vertices or fragments flow on one side, are somehow transformed, and then flow out on the other side. However, the box is not really a black box because you get to determine, by means of the Cg programs you write, exactly what happens inside.

Every time a vertex is processed or the rasterizer generates a fragment while rendering a 3D scene, your corresponding vertex or fragment Cg program executes.

Most recent personal computers-and all recent game consoles-contain a graphics processing unit (GPU) that is dedicated to graphics tasks such as transforming and rasterizing 3D models. Your Cg programs actually execute within the GPU of your computer.

GPU Specialization and CPU Generalization

Whether or not a personal computer or game console has a GPU, there must be a CPU that runs the operating system and application programs. CPUs are, by design, general purpose. CPUs execute applications (for example, word processors and accounting packages) written in general-purpose languages, such as C++ or Java.

Because of the GPU's specialized design, it is much faster at graphics tasks, such as rendering 3D scenes, than a general-purpose CPU would be. New GPUs process tens of millions of vertices per second and rasterize hundreds of millions or even billions of fragments per second. Future GPUs will be even speedier. This is overwhelmingly faster than the rate at which a CPU could process a similar number of vertices and fragments. However, the GPU cannot execute the same arbitrary, general-purpose programs that a CPU can.

The specialized, high-performance nature of the GPU is why Cg exists. Generalpurpose programming languages are too open-ended for the specialized task of processing vertices and fragments. In contrast, the Cg language is fully dedicated to this task. Cg also provides an abstract execution model that matches the GPU's execution model. You will learn about the unique execution model of GPUs in coming sections.

The Performance Rationale for Cg

To sustain the illusion of interactivity, a 3D application needs to maintain an animation rate of 15 or more images per second. Generally, we consider 60 or more frames per second to be "real time," the rate at which interaction with applications appears to occur instantaneously. The computer's display may have a million or more pixels that require redrawing. For 3D scenes, the GPU typically processes every pixel on the screen many times to account for how objects occlude each other, or to improve the appearance of each pixel. This means that real-time 3D applications can require hundreds of millions of pixel updates per second. Along with the required pixel processing, 3D models are composed of vertices that must be transformed properly before they are assembled into polygons, lines, and points that will be rasterized into pixels. This can require transforming tens of millions of vertices per second.

Moreover, this graphical processing happens in addition to the considerable amount of effort required of the CPU to update the animation for each new image. The reality is that we need both the CPU and the GPU's specialized graphics-oriented capabilities. Both are required to render scenes at the interactive rates and quality standards that users of 3D applications and games demand. This means a developer can write a 3D application or game in C++ and then use Cg to make the most of the GPU's additional graphics horsepower.

Coexistence with Conventional Languages

In no way does Cg replace any existing general-purpose languages. Cg is an auxiliary language, designed specifically for GPUs. Programs written for the CPU in conventional languages such as C or C++ can use the Cg runtime to load Cg programs for GPUs to execute. The Cg runtime is a standard set of subroutines used to load, compile, manipulate, and configure Cg programs for execution by the GPU. Applications supply Cg programs to instruct GPUs on how to accomplish the programmable rendering effects that would not otherwise be possible on a CPU at the rendering rates a GPU is capable of achieving.

Cg enables a specialized style of parallel processing. While your CPU executes a conventional application, that application also orchestrates the parallel processing of vertices and fragments on the GPU, by programs written in Cg.

If a real-time shading language is such a good idea, why didn't someone invent Cg sooner? The answer has to do with the evolution of computer graphics hardware. Prior 5 to 2001, most computer graphics hardware-certainly the kind of inexpensive graphics hardware in PCs and game consoles-was hard-wired to the specific tasks of vertex and fragment processing. By "hard-wired," we mean that the algorithms were fixed within the hardware, as opposed to being programmable in a way that is accessible to graphics applications. Even though these hard-wired graphics algorithms could be configured by graphics applications in a variety of ways, the applications could not reprogram the hardware to do tasks unanticipated by the designers of the hardware. Fortunately, this situation has changed.

Graphics hardware design has advanced, and vertex and fragment processing units in recent GPUs are truly programmable. Before the advent of programmable graphics hardware, there was no point in providing a programming language for it. Now that such hardware is available, there is a clear need to make it easier to program this hardware. Cg makes it much easier to program GPUs in the same manner that C made it much easier to program CPUs.

Before Cg existed, addressing the programmable capabilities of the GPU was possible only through low-level assembly language. The cryptic instruction syntax and manual hardware register manipulation required by assembly languages-such as DirectX 8 vertex and pixel shaders and some OpenGL extensions-made it a painful task for most developers. As GPU technology made longer and more complex assembly language programs possible, the need for a high-level language became clear. The extensive low-level programming that had been required to achieve optimal performance could now be delegated to a compiler, which optimizes the code output and handles tedious instruction scheduling. Figure 1-1 is a small portion of a complex assembly language fragment program used to represent skin. Clearly, it is hard to comprehend, particularly with the specific references to hardware registers.

In contrast, well-commented Cg code is more portable, more legible, easier to debug, and easier to reuse. Cg gives you the advantages of a high-level language such as C while delivering the performance of low-level assembly code.

Other Aspects of Cg

Cg is a language for programming "in the small." That makes it much simpler than a modern general-purpose language such as C++. Because Cg specializes in transforming vertices and fragments, it does not currently include many of the complex features required for massive software engineering tasks. Unlike C++ and Java, Cg does not support classes and other features used in object-oriented programming. Current Cg 6 implementations do not provide pointers or even memory allocation (though future implementations may, and keywords are appropriately reserved). Cg has absolutely no support for file input/output operations. By and large, these restrictions are not permanent limitations in the language, but rather are indicative of the capabilities of today's highest performance GPUs. As technology advances to permit more general programmability on the GPU, you can expect Cg to grow appropriately. Because Cg is closely based on C, future updates to Cg are likely to adopt language features from C and C++.

. . .
DEFINE LUMINANCE = {0.299, 0.587, 0.114, 0.0};
TEX H0, f[TEX0], TEX4, 2D;
TEX H1, f[TEX2], TEX5, CUBE;
MULX H1.w, H1.x, H1.x;
MOVH H2, f[TEX3].wxyz;
MULX H1.w, H1.x, H1.w;
DP3X, H2.xzyw, H0;
MULX, H0, H1.w;
TEX H1, f[TEX0], TEX1, 2D;
TEX H3, f[TEX0], TEX3, 2D;
MULX, H0, H3;
MADX H1.w, H1.w, 0.5, 0.5;
MULX, H1, {0.15, 0.15, 1.0, 0.0};
MOVX H0.w, H1.w;
TEX H3, f[TEX3], TEX2, 1D;
MULX H3.w, H0.w, H2.w;
MULX, H3, H3.w;
. . .

Figure 1-1. A Snippet of Assembly Language Code

Cg provides arrays and structures. It has all the flow-control constructs of a modern language: loops, conditionals, and function calls.

Cg natively supports vectors and matrices because these data types and related math operations are fundamental to graphics and most graphics hardware directly supports vector data types. Cg has a library of functions, called the Standard Library, that is well suited for the kind of operations required for graphics. For example, the Cg Standard Library includes a reflect function for computing reflection vectors. Cg programs execute in relative isolation. This means that the processing of a particular vertex or fragment has no effect on other vertices or fragments processed at the same time. There are no side effects to the execution of a Cg program. This lack of interdependency among vertices and fragments makes Cg programs extremely well suited for hardware execution by highly pipelined and parallel hardware.

The Limited Execution Environment of Cg Programs

When you write a program in a language designed for modern CPUs using a modern operating system, you expect that a more-or-less arbitrary program, as long as it is correct, will compile and execute properly. This is because CPUs, by design, execute general-purpose programs for which the overall system has more than sufficient resources.

However, GPUs are specialized rather than general-purpose, and the feature set of GPUs is still evolving. Not everything you can write in Cg can be compiled to execute on a given GPU. Cg includes the concept of hardware "profiles," one of which you specify when you compile a Cg program. Each profile corresponds to a particular combination of GPU architecture and graphics API. Your program not only must be correct, but it also must limit itself to the restrictions imposed by the particular profile used to compile your Cg program. For example, a given fragment profile may limit you to no more than four texture accesses per fragment.

As GPUs evolve, additional profiles will be supported by Cg that correspond to more capable GPU architectures. In the future, profiles will be less important as GPUs become more full-featured. But for now Cg programmers will need to limit programs to ensure that they can compile and execute on existing GPUs. In general, future profiles will be supersets of current profiles, so that programs written for today's profiles will compile without change using future profiles.

This situation may sound limiting, but in practice the Cg programs shown in this book work on tens of millions of GPUs and produce compelling rendering effects. Another reason for limiting program size and scope is that the smaller and more efficient your Cg programs are, the faster they will run. Real-time graphics is often about balancing increased scene complexity, animation rates, and improved shading. So it's always good to maximize rendering efficiency through judicious Cg programming.

Keep in mind that the restrictions imposed by profiles are really limitations of current GPUs, not Cg. The Cg language is powerful enough to express shading techniques that are not yet possible with all GPUs. With time, GPU functionality will evolve far enough that Cg profiles will be able to run amazingly complex Cg programs. Cg is a language for both current and future GPUs.

Vertices, Fragments, and the Graphics Pipeline

To put Cg into its proper context, you need to understand how GPUs render images. This section explains how graphics hardware is evolving and then explores the modern graphics hardware-rendering pipeline.

The Evolution of Computer Graphics Hardware

Computer graphics hardware is advancing at incredible rates. Three forces are driving this rapid pace of innovation, as shown in Figure 1-2. First, the semiconductor industry has committed itself to doubling the number of transistors (the basic unit of computer hardware) that fit on a microchip every 18 months. This constant redoubling of computer power, historically known as Moore's Law, means cheaper and faster computer hardware, and is the norm for our age.


Figure 1-2. Forces Driving Graphics Hardware Innovation

The second force is the vast amount of computation required to simulate the world around us. Our eyes consume and our brains comprehend images of our 3D world at an astounding rate and with startling acuity. We are unlikely ever to reach a point where computer graphics becomes a substitute for reality. Reality is just too real. Undaunted, computer graphics practitioners continue to rise to the challenge. Fortunately, generating images is an embarrassingly parallel problem. What we mean by "embarrassingly parallel" is that graphics hardware designers can repeatedly split up the problem of creating realistic images into more chunks of work that are smaller and easier to tackle. Then hardware engineers can arrange, in parallel, the ever-greater number of transistors available to execute all these various chunks of work.

Our third force is the sustained desire we all have to be stimulated and entertained visually. This is the force that "connects" the source of our continued redoubling of computer hardware resources to the task of approximating visual reality ever more realistically than before.

As Figure 1-2 illustrates, these insights let us confidently predict that computer graphics hardware is going to get much faster. These innovations whet our collective appetite for more interactive and compelling 3D experiences. Satisfying this demand is what motivated the development of the Cg language.



Movement in Time

Animation is the result of an action that happens over time-for example, an object that pulsates, a light that fades, or a character that runs. Your application can create these types of animation using vertex programs written in Cg. The source of the animation is one or more program parameters that vary with the passing of time in your application.

To create animated rendering, your application must keep track of time at a level above Cg and even above OpenGL or Direct3D. Applications typically represent time with a global variable that is regularly incremented as your application's sense of time advances. Applications then update other variables as a function of time.

You could compute animation updates on the CPU and pass the animated data to the GPU. However, a more efficient approach is to perform as much of the animation computation as possible on the GPU with a vertex program, rather than require the CPU to do all the number-crunching. Offloading animation work from the CPU can help balance the CPU and GPU resources and free up the CPU for more involved computations, such as collision detection, artificial intelligence, and game play.

A Pulsating Object

In this first example, you will learn how to make an object deform periodically so that it appears to bulge. The goal is to take a time parameter as input and then modify the vertex positions of the object geometry based on the time. More specifically, you need to displace the surface position in the direction of the surface normal, as shown in Figure 6-1.

By varying the magnitude of the displacement over time, you create a bulging or pulsing effect. Figure 6-2 shows renderings of this effect as it is applied to a character. The pulsating animation takes place within a vertex program.

The Vertex Program

Example 6-1 shows the complete source code for the C6E1v_bulge vertex program, which is intended to be used with the C2E2f_passthrough fragment program from Chapter 2. Only the vertex position and normal are really needed for the bulging effect. However, lighting makes the effect look more interesting, so we have included material and light information as well. A helper function called computeLighting calculates just the diffuse and specular lighting (the specular material is assumed to be white for simplicity).


Figure 6-1. Making an Object Bulge



Figure 6-2. A Pulsating Alien


float3 computeLighting ( float3 lightPosition,
float3 lightColor,
float3 Kd,
float shininess,
float3 P,
float3 N,
float3 eyePosition)

// Compute the diffuse lighting
float3 L = normalize(lightPosition - P);
float diffuseLight = max(dot(N, L), 0);
float3 diffuseResult = Kd * lightColor * diffuseLight;

// Compute the specular lighting
float3 V = normalize(eyePosition - P);
float3 H = normalize(L + V);
float3 specularLight = lightColor * pow(max(dot(N, H), 0),
if (diffuseLight <= 0) specularLight = 0;
float3 specularResult = lightColor * specularLight;
return diffuseResult + specularResult;

void C6E1v_bulge(float4 position : POSITION,
float3 normal : NORMAL,

out float4 oPosition : POSITION,
out float4 color : COLOR,

uniform float4x4 modelViewProj,
uniform float time,
uniform float frequency,
uniform float scaleFactor,
uniform float3 Kd,
uniform float shininess,
uniform float3 eyePosition,
uniform float3 lightPosition,
uniform float3 lightColor)

float displacement = scaleFactor * 0.5 *
sin(position.y * frequency * time) + 1;
float4 displacementDirection = float4(normal.x, normal.y,
normal.z, 0);
float4 newPosition = position +
displacement * displacementDirection;
oPosition = mul(modelViewProj, newPosition); = computeLighting(lightPosition, lightColor,
Kd, shininess,, normal, eyePosition);
color.w = 1;

Example 6-1. The C6E1v_bulge Vertex Program

Displacement Calculation - Creating a Time-Based Function

The idea here is to calculate a quantity called displacement that moves the vertex position up or down in the direction of the surface normal. To animate the program's effect, displacement has to change over time. You can choose any function you like for this. For example, you could pick something like this:

float displacement = time;

Of course, this behavior doesn't make a lot of sense, because displacement would always increase, causing the object to get larger and larger endlessly over time. Instead, we want a pulsating effect in which the object oscillates between bulging larger and returning to its normal shape. The sine function provides such a smoothly oscillating behavior.

A useful property of the sine function is that its result is always between -1 and 1. In some cases, such as in this example, you don't want any negative numbers, so you can scale and bias the results into a more convenient range, such as from 0 to 1:

float displacement = 0.5 * (sin(time) + 1);

Did you know that the sin function is just as efficient as addition or multiplication in the CineFX architecture? In fact, the cos function, which calculates the cosine function,is equally fast. Take advantage of these features to add visual complexity to your programs without slowing down their execution.

Adding Controls to the Program

To allow finer control of your program, you can add a uniform parameter that controls the frequency of the sine wave. Folding this uniform parameter, frequency, into the displacement equation gives:

float displacement = 0.5 * (sin(frequency * time) + 1);

You may also want to control the amplitude of the bulging, so it's useful to have a uniform parameter for that as well. Throwing that factor into the mix, here's what we get:

float displacement = scaleFactor * 0.5 *
(sin(frequency * time) + 1);


As it is now, this equation produces the same amount of protrusion all over the model. You might use it to show a character catching his breath after a long chase. To do this, you would apply the program to the character's chest. Alternatively, you could provide additional uniform parameters to indicate how rapidly the character is breathing, so that over time, the breathing could return to normal. These animation effects are inexpensive to implement in a game, and they help to immerse players in the game's universe.

Varying the Magnitude of Bulging

But what if you want the magnitude of bulging to vary at different locations on the model? To do this, you have to add a dependency on a per-vertex varying parameter. One idea might be to pass in scaleFactor as a varying parameter, rather than as a uniform parameter. Here we show you an even easier way to add some variation to the pulsing, based on the vertex position:

float displacement = scaleFactor * 0.5 *
sin(position.y * frequency * time) + 1;

This code uses the y coordinate of the position to vary the bulging, but you could use a combination of coordinates, if you prefer. It all depends on the type of effect you are after.

Updating the Vertex Position

In our example, the displacement scales the object-space surface normal. Then, by adding the result to the object-space vertex position, you get a displaced object-space vertex position:

float4 displacementDirection = float4(normal.x, normal.y,
normal.z, 0);

float4 newPosition = position +
displacement * displacementDirection


Precompute Uniform Parameters When Possible

The preceding example demonstrates an important point. Take another look at this line of code from Example 6-1:

float displacement = scaleFactor * 0.5 * sin(position.y * frequency * time) + 1;

If you were to use this equation for the displacement, all the terms would be the same for each vertex, because they all depend only on uniform parameters. This means that you would be computing this displacement on the GPU for each vertex, when in fact you could simply calculate the displacement on the CPU just once for the entire mesh and pass the displacement as a uniform parameter. However, when the vertex position is part of the displacement equation, the sine function must be evaluated for each vertex. And as you might expect, if the value of the displacement varies for every vertex like this, such a per-vertex computation can be performed far more efficiently on the GPU than on the CPU.

If a computed value is a constant value for an entire object, optimize your program by precomputing that value on a per-object basis with the CPU. Then pass the precomputed value to your Cg program as a uniform parameter. This approach is more efficient than recomputing the value for every fragment or vertex processed.

Particle Systems

Sometimes, instead of animating vertices in a mesh, you want to treat each vertex as a small object, or particle. A collection of particles that behave according to specific rules is known as a particle system. This example implements a simple particle system in a vertex program. For now, focus on how the system works; don't worry about its simplistic appearance. At the end of this section, we will mention one easy method to enhance your particle system's appearance. Figure 6-3 shows the particle system example progressing in time.

The example particle system behaves according to a simple vector kinematic equation from physics. The equation gives the x, y, and z positions of each particle for any time. The basic equation from which you will start is shown in Equation 6-1:

P final = P initial + vt + 1/2 at²
Equation 6-1. Particle Trajectory


  • P final is the particle's final position,
  • P initial is the particle's initial position,
  • v is the particle's initial velocity,
  • a is the particle's acceleration, and
  • t is the time taken.

The equation models the trajectory of a particle set in initial motion and under the influence of gravity, but not otherwise interacting with other particles. This equation gives the position of a particle for any value of time, assuming that you provide its initial position, initial velocity, and constant acceleration, such as gravity.


Figure 6-3. A Particle System

Initial Conditions

The application must supply the initial position and initial velocity of each particle as varying parameters. These two parameter values are known as initial conditions because they describe the particle at the beginning of the simulation.

In this particular simulation, the acceleration due to gravity is the same for every particle. Therefore, gravity is a uniform parameter.

To make the simulation more accurate, you could factor in effects such as drag and even spin-we leave that as an exercise for you.

Vectorized Computations

Modern GPUs have powerful vector-processing capabilities, particularly for addition and multiplication; they are well suited for processing vectors with up to four components. Therefore, it is often just as efficient to work with such vector quantities as it is to work with scalar (single-component) quantities.

Equation 6-1 is a vector equation because the initial position, initial velocity, constant acceleration, and computed position are all three-component vectors. By implementing the particle system equation as a vector expression when writing the Cg vertex program, you help the compiler translate your program to a form that executes efficiently on your GPU.

Vectorize your calculations whenever possible, to take full advantage of the GPU's powerful vector-processing capabilities.

The Particle System Parameters

Table 6-1 lists the variables used by the vertex program presented in the next section. Each variable is a parameter to the vertex program, except for the relative time (t) and final position (pFinal), which are calculated inside the vertex program. Note that the y component of the acceleration is negative-because gravity acts downward, in the negative y direction. The constant 9.8 meters per second squared is the acceleration of gravity on Earth. The initial position, initial velocity, and uniform acceleration are object-space vectors.

Variable Type Description Source (Type)
pInitial float3 Initial position Application (Varying)
vInitial float3 Initial velocity Application (Varying)
tInitial float3 Time at which particle was created Application (Varying)
acceleration float3 Acceleration (0.0, -9.8, 0.0) Application (Uniform)
globalTime float Global time Application (Uniform)
pFinal float3 Current position Internal
t float Relative time Internal

Table 6-1. Variables in the Particle Equation


void C6E2v_particle( float4 pInitial : POSITION,
float4 vInitial : TEXCOORD0,
float tInitial : TEXCOORD1,
  out float4 oPosition : POSITION,
out float4 color : TEXCOORD0,
out float pointSize : PSIZE,
  uniform float globalTime,
uniform float4 acceleration,
uniform float4x4 modelViewProj)
float t = globalTime - tInitial;
float4 pFinal = pInitial +
vInitial * t +
0.5 * acceleration * t * t;

oPosition = mul(modelViewProj, pFinal);

color = float4(t, t, t, 1);

pointSize = -8.0 * t * t +
8.0 * t +
0.1 * pFinal.y + 1;


Example 6-2. The C6E2v_particle Vertex Program

The Vertex Program

Example 6-2 shows the source code for the C6E2v_particle vertex program. This program is meant to work in conjunction with the C2E2f_passthrough fragment program.

Computing the Particle Positions

In this program, the application keeps track of a "global time" and passes it to the vertex program as the uniform parameter globalTime. The global time starts at zero when the application initializes and is continuously incremented. As each particle is created, the particle's time of creation is passed to the vertex program as the varying parameter tInitial. To find out how long a particle has been active, you simply have to subtract tInitial from globalTime:

float t = globalTime - tInitial;

Now you can plug t into Equation 6-1 to find the particle's current position:

float4 pFinal = pInitial +
vInitial * t +
0.5 * acceleration * t * t;

This position is in object space, so it needs to be transformed into clip space, as usual:

oPosition = mul(modelViewProj, pFinal);

Computing the Particle Color

In this example, time controls the particle color:

color = float4(t, t, t, 1);

This is a simple idea, but it produces an interesting visual variation. The color increases with time linearly. Note that colors saturate to pure white (1, 1, 1, 1). You can try your own alternatives, such as varying the color based on the particle's position, or varying the color based on a combination of position and time.

Computing the Particle Size

C6E2v_particle uses a new vertex program output semantic called PSIZE. When you render a point to the screen, an output parameter with this semantic specifies the width (and height) of the point in pixels. This gives your vertex program programmatic control of the point size used by the rasterizer.

The point size of each particle varies as time passes. The particles start out small, increase in size, and then gradually shrink. This variation adds to the fireworks-like effect. As an extra touch, we added a slight dependence on the particles' height, so that they get a little larger on their way up. To accomplish all this, we use the following function for the point size:

pointSize = -8.0 * t * t +
8.0 * t +
0.1 * pFinal.y + 1;



Figure 6-4. A Point Size Function

Figure 6-4 shows what the function looks like.

This function is nothing special-we merely created the formula to achieve the effect that we wanted. In other words, the formula does not have any real physical meaning, aside from attempting to mimic the effect we had in mind.

Dressing Up Your Particle System

Although the C6E2v_particle program produces interesting particle motion, the particles themselves do not look very appealing-they are just solid-colored squares of different sizes.

However, you can improve the particle appearance by using point sprites. With point sprites, the hardware takes each rendered point and, instead of drawing it as a single vertex, draws it as a square made up of four vertices, as shown in Figure 6-5. Point sprites are automatically assigned texture coordinates for each corner vertex. This allows you to alter the appearance of the particles from a square to any texture image you want.


Figure 6-5. Converting Points to Point Sprites

By rendering the points as point sprites, you can use the assigned texture coordinates to sample a texture that supplies the shape and appearance of each point vertex, instead of simply rendering each point vertex as a square point. Point sprites can create the impression of added geometric complexity without actually drawing extra triangles. Figure 6-6 shows a more visually interesting example of a particle system, using point sprites. Both OpenGL and Direct3D have standard interfaces for rendering point sprites.


Figure 6-6. A Particle System with Point Sprites

Key-Frame Interpolation

3D games often use a sequence of key frames to represent an animated human or creature in various poses. For example, a creature may have animation sequences for standing, running, kneeling, ducking, attacking, and dying. Artists call each particular pose that they create for a given 3D model a key frame.

Key-Framing Background

The term key frame comes from cartoon animation. To produce a cartoon, an artist first quickly sketches a rough sequence of frames for animating a character. Rather than draw every frame required for the final animation, the artist draws only the important, or "key," frames. Later, the artist goes back and fills in the missing frames. These in-between frames are then easier to draw, because the prior and subsequent key frames serve as before-and-after references.

Computer animators use a similar technique. A 3D artist makes a key frame for each pose of an animated character. Even a standing character may require a sequence of key frames that show the character shifting weight from one foot to the other. Every key frame for a model must use the exact same number of vertices, and every key frame must share the same vertex connectivity. A vertex used in a given key frame corresponds to the same point on the model in every other key frame of the model. The entire animation sequence maintains this correspondence. However, the position of a particular vertex may change from frame to frame, due to the model's animation.

Given such a key-framed model, a game animates the model by picking two key frames and then blending together each corresponding pair of vertex positions. The blend is a weighted average in which the sum of the weights equals 100 percent. Figure 6-7 shows an alien character with several key frames. The figure includes two key frames, marked A and B, to be blended into an intermediate pose by a Cg program.

An application can use a Cg vertex program to blend the two vertices together. This blending may include further operations to illuminate the blended character appropriately for more realism. Usually, an application specifies a single position for each vertex, but for key-frame blending, each vertex has two positions, which are combined with a uniform weighting factor.

Key-frame interpolation assumes that the number and order of vertices are the same in all the key frames for a given model. This assumption ensures that the vertex program is always blending the correct pairs of vertices. The following Cg code fragment blends key-frame positions:

blendedPosition = (1 - weight) * keyFrameA + weight * keyFrameB; 157

The keyFrameA and keyFrameB variables contain the (x, y, z) positions of the vertex being processed at key frames A and B, respectively. Note that weight and (1 - weight) sum to 1. If weight is 0.53, the Cg program adds 47 percent (1.0 - 0.53) of the position of key frame A to 53 percent of the position of key frame B. Figure 6-8 shows an example of this type of animation.


Figure 6-7. Key Frames for an Alien



Figure 6-8. An Example of Key-Frame Blending

To maintain the appearance of continuous animation, the key-frame weight increases with each rendered frame until it reaches 1.0 (100 percent). At this point, the existing key frame B becomes the new key frame A, the weight resets to 0, and the game selects a new key frame B to continue the animation. Animation sequences, such as walking, may loop repeatedly over the set of key frames that define the character's walking motion. The game can switch from a walking animation to a running animation just by changing over to the sequence of key frames that define the running animation.

It is up to the game engine to use the key frames to generate convincing animated motion. Many existing 3D games use this style of key-frame animation. When an application uses a Cg vertex program to perform the key-frame blending operations, the CPU can spend time improving the gameplay rather than continuously blending key frames. By using a Cg vertex program, the GPU takes over the task of key-frame blending.

Interpolation Approaches

There are many types of interpolation. Two common forms for key-frame interpolation are linear interpolation and quadratic interpolation.

Linear Interpolation

With linear interpolation, the transition between positions happens at a constant rate. Equation 6-2 shows the definition of linear interpolation:

blendedPosition = positionA x (1-f) + positionB x f

Equation 6-2. Linear Interpolation

As f varies from 0 to 1 in this equation, the intermediate position varies between positionA and positionB. When f is equal to 0, the intermediate position is exactly positionA, the starting position. When f is equal to 1, the intermediate position is positionB, the ending position. Once again, you can use Cg's lerp function to accomplish the interpolation.

Using lerp, the interpolation between two positions can be written concisely as:

intermediatePosition = lerp(positionA, positionB, f);

Quadratic Interpolation

Linear interpolation is good for many situations, but sometimes you want the rate of transition to change over time. For example, you might want the transition from positionA to positionB to start out slowly and get faster as time passes. For this, you might use quadratic interpolation, as in the following code fragment:

intermediatePosition = position1 * (1 - f * f) + position2 * f * f

Other functions that you might use are step functions, spline functions, and exponential functions. Figure 6-9 shows several common types of interpolation functions.

Basic Key-Frame Interpolation

Example 6-3 shows the C6E3v_keyFrame vertex program. This program performs the object-space blending of two positions, each from a different key frame. The lerp Standard Library function linearly interpolates the two positions, and then the program transforms the blended position into clip space. The program passes through a texture coordinate set and a color.

As indicated by the input semantics for positionA and positionB, the application is responsible for configuring key frame A's position as the conventional position (POSITION) and key frame B's position as texture coordinate set 1 (TEXCOORD1).

void C6E3v_keyFrame( float3 positionA : POSITION,
float3 positionB : TEXCOORD1,
float4 color : COLOR,
float2 texCoord : TEXCOORD0,

out float4 oPosition : POSITION,
out float2 oTexCoord : TEXCOORD0,
out float4 oColor : COLOR,

uniform float
uniform float4x4 modelViewProj)
float3 position = lerp(positionA, positionB,
oPosition = mul(modelViewProj, float4(position, 1));
oTexCoord = texCoord;
oColor = color;

Example 6-3. The C6E3v_keyFrame Vertex Program



Figure 6-9. Various Interpolation Functions

The application is also responsible for determining the key-frame blending factor via the uniform parameter keyFrameBlend. The value of keyFrameBlend should transition from 0 to 1. Once 1 is reached, the application chooses another key frame in the animation sequence, the old key frame B position input is then configured as the key frame A position input, and the new key-frame position data feeds the key frame B position input.

Key-Frame Interpolation with Lighting

You often want to light a key-framed model. This involves not merely blending two positions (the vertex in two different key frames), but also blending the two corresponding surface normals. Then you can calculate lighting computations with the blended normal. Blending two normals may change the length of the resulting normal, so you must normalize the blended normal prior to lighting.

Example 6-4 shows the C6E4v_litKeyFrame vertex program that adds per-vertex

struct Light {
float3 eyePosition; // In object space
float3 lightPosition; // In object space
float4 lightColor;
float specularExponent;
float ambient;

float4 computeLighting( float4 computeLighting(Light light,
float3 position, // In object space
float3 normal) // In object space

float3 lightDirection = light.lightPosition - position;
float3 lightDirNorm = normalize(lightDirection);
float3 eyeDirection = light.eyePosition - position;
float3 eyeDirNorm = normalize(eyeDirection);
float3 halfAngle = normalize(lightDirNorm + eyeDirNorm);
float diffuse = max(0, dot(lightDirNorm, normal));
float specular = pow(max(0, dot(halfAngle, normal)),
return light.lightColor * (light.ambient +
diffuse + specular);
void C6E4v_litKeyFrame(float3 positionA : POSITION,
float3 normalA : NORMAL,
float3 positionB : TEXCOORD1,
float3 normalB : TEXCOORD2,
float2 texCoord: TEXCOORD0,

out float4 oPosition : POSITION,
out float2 oTexCoord : TEXCOORD0,
out float4 color : COLOR,

uniform float keyFrameBlend,
uniform Light light,
uniform float4x4 modelViewProj)
float3 position = lerp(positionA, positionB,
float3 blendNormal = lerp(normalA, normalB,
float3 normal = normalize(blendNormal);
oPosition = mul(modelViewProj, float4(position, 1));
oTexCoord = texCoord;
color = computeLighting(light, position, normal);

Example 6-4. The C6E4v_litKeyFrame Vertex Program

lighting to the C6E3v_keyFrame example. In the updated example, each key frame also supplies its own corresponding per-vertex surface normal.

The computeLighting internal function computes a conventional lighting model using object-space lighting.

Vertex Skinning

Another approach to animating characters is vertex skinning. Many 3D modeling packages author 3D content suitable for vertex skinning. The technique is also known as matrix palette blending.

The Theory of Vertex Skinning

Rather than have key frames for each pose of a character, vertex skinning maintains a single default pose and a large set of matrices that appropriately rotate and translate various subregions of the default pose's polygonal mesh. For reasons that will become apparent, these various matrix transforms are often called "bones."

One or more of these matrices control each vertex in the default pose's polygonal mesh. Each matrix is assigned a weighting factor (from 0 to 100 percent), which indicates how much that matrix affects each vertex. Only a small number of matrices usually control each vertex, meaning that only these few matrices have positive and significant weighting factors for a given vertex. We call this small set of matrices the bone set for each vertex. We assume that the weighting factors for all the matrices in a vertex's bone set always sum to 100 percent.

When rendering this type of model, you first transform every vertex by each matrix in the vertex's bone set, then weight the results of each matrix transform according to the matrix's corresponding weighting factor, and finally sum the results. This new position is the skinned vertex position.

When all the matrices are identity matrices (no rotation, no translation), the mesh is in the default pose. 3D artists often pick a default pose in which the character is standing and facing forward, with legs apart and arms outstretched.

Constructing Poses from Matrices

By controlling the matrices, you can create novel poses. For example, a vertex on a character's forearm close to the elbow might use 67 percent of the forearm matrix, 21 percent of the elbow matrix, and 12 percent of the upper arm matrix. The animator who creates a model for vertex skinning must appropriately localize each matrix so that, for example, the matrix that controls the left shoulder has no effect on vertices near the ankle. Often, the number of matrices affecting any given vertex is limited to no more than four. For the 3D artist, once all the weights and matrices are assigned to the model's default pose, constructing a new pose is a matter of manipulating the matrices appropriately, rather than attempting to position each individual vertex. Posing and animating the model is much simpler when it is authored for vertex skinning.

For a character model, the most significant matrices represent the way rigid bones in the character's body move and rotate; hence, the vertex-skinning matrices are called bones. The vertices represent points on the skin. Vertex skinning simulates how bones, represented as matrices, tug and reposition various points, represented as vertices, on the character's skin.


For correct lighting, you can compute the same sort of transformed and weighted average used for positions, except that you transform normals by the inverse transpose of each matrix rather than by the matrix itself. Weighted normals may no longer be unit length, so normalization is required.

Assuming that the bone matrices are merely rotations and translations simplifies the transformation of the normals for lighting, because the inverse transpose of a matrix without scaling or projection is the matrix itself.

Storage Requirements Compared with Key Frames

With the key frame approach, every pose requires a distinct set of vertex positions and normals. This becomes unwieldy if huge numbers of poses are required.

However, with vertex skinning, each pose requires just the default pose-shared by all poses-and the matrix values for the given pose. There are generally substantially fewer matrices per character than vertices, so representing a pose as a set of bone matrices is more compact than representing the pose with a key frame. With vertex skinning, you can also create novel poses dynamically, either by blending existing bone matrices from different poses or by controlling matrices directly. For example, if you know what matrices control an arm, you can wave the arm by controlling those matrices.

In addition to requiring the matrices for each pose, the model's default pose needs each vertex to have a default position, a default normal, some number of matrix indices to identify which subset of matrices control the vertex, and the same number of weighting factors, corresponding to each respective matrix.

This data for the default pose is constant for all other poses. Generating a new pose requires only new matrices, not any changes to the default pose data. If the GPU can perform all the vertex-skinning computations, this means that the CPU needs to update only the bone matrices for each new pose, but not otherwise manipulate or access the default pose data.

Vertex skinning is quite amenable to storing and replaying motion-capture sequences. You can represent each motion-capture frame as a set of bone matrices that you can then apply to different models that share the same default pose and matrix associations. Inverse kinematics solvers can also generate bone matrices procedurally. An inverse kinematics solver attempts to find an incremental sequence of bone matrices that transition from one given pose to another given pose in a realistic, natural manner.

Vertex Skinning in a Vertex Program

The C6E5v_skin4m vertex program in Example 6-5 implements vertex skinning, assuming that no more than four bone matrices affect each vertex (a common assumption).

An array of 24 bone matrices, each a 3x4 matrix, represents each pose. The entire array is a uniform parameter to the program. The program assumes that each bone matrix consists of a translation and a rotation (no scaling or projection).

The per-vertex matrixIndex input vector provides a set of four bone-matrix indices for accessing the boneMatrix array. The per-vertex weight input vector provides the four weighting factors for each respective bone matrix. The program assumes that the weighting factors for each vertex sum to 100 percent.

For performance reasons, the program treats boneMatrix as an array of float4 vectors rather than an array of float3x4 matrices. The matrixIndex array contains floating-point values instead of integers, and so the addressing of a single array of vectors is more efficient than accessing an array of matrices. The implication of this is that the indices in the matrixIndex vector should be three times the actual matrix index. So, the program assumes 0 is the first matrix in the array, 3 is the second matrix, and so on. The indices are fixed for each vertex, so you improve performance by moving this "multiply by 3" outside the vertex program.

A for loop, looping four times, transforms the default pose position and normal by each bone matrix. Each result is weighted and summed.

The program computes both the weighted position and normal for the pose. The same computeLighting internal function from Example 6-4 computes per-vertex object-space lighting with the weighted position and normal.

Although this example is rather limited, you could generalize it to handle more bone matrices, general bone matrices (for example, allowing scaling), and matrices influencing each vertex-and to compute a better lighting model.

void C6E5v_skin4m(float3 position : POSITION,
float3 normal : NORMAL,
float2 texCoord : TEXCOORD0,
float4 weight : TEXCOORD1,
float4 matrixIndex : TEXCOORD2,

out float4 oPosition : POSITION,
out float2 oTexCoord : TEXCOORD0,
out float4 color : COLOR,

uniform Light light,
uniform float4 boneMatrix[72], // 24 matrices
uniform float4x4 modelViewProj)
float3 netPosition = 0, netNormal = 0;

for (int i = 0; i < 4; i++) {
float index = matrixIndex[i];
float3x4 model = float3x4(boneMatrix[index + 0],
boneMatrix[index + 1],
boneMatrix[index + 2]);
float3 bonePosition = mul(model, float4(position, 1));
// Assume no scaling in matrix, just rotate & translate
float3x3 rotate = float3x3(model[0].xyz,
float3 boneNormal = mul(rotate, normal);
netPosition += weight[i] * bonePosition;
netNormal += weight[i] * boneNormal;
netNormal = normalize(netNormal);

oPosition = mul(modelViewProj, float4(netPosition, 1));
oTexCoord = texCoord;
color = computeLighting(light, netPosition, netNormal);

Example 6-5. The C6E5v_skin4m Vertex Program

Further Reading

Cg builds on a host of concepts in computer language design, computer hardware design, and computer graphics. Doing justice to all these contributions in the context of this tutorial is not always practical. What we attempt in the "Further Reading" section is to offer you pointers to learn more about the contributions that underlie the topics in each chapter.

There are plenty of books on C. The C Programming Language, Third Edition (Prentice Hall, 2000), by Brian Kernighan and Dennis Ritchie, is a classic; the authors invented the C language. Cg includes concepts from both C and C++. There now may actually be more books about C++ than about C. The classic C++ book is The C++ Programming Language, Third Edition (Addison-Wesley, 2000), by Bjarne Stroustrup, who invented the language.

To learn more about the RenderMan Shading Language, read The RenderMan Companion: A Programmer's Guide to Realistic Computer Graphics (Addison-Wesley, 1989), by Steve Upstill. Pat Hanrahan and Jim Lawson published a SIGGRAPH paper about RenderMan called "A Language for Shading and Lighting Calculations" (ACM Press) in 1990.

Robert Cook's 1984 SIGGRAPH paper titled "Shade Trees" (ACM Press) motivated the development of RenderMan.

The development of programmable graphics hardware and its associated languages has been an active and fruitful research area for almost a decade. Anselmo Lastra, Steven Molnar, Marc Olano, and Yulan Wang at UNC published an early research paper in 1995 titled "Real-Time Programmable Shading" (ACM Press). Researchers at UNC also published several papers about their programmable PixelFlow graphics architecture. Marc Olano and Anselmo Lastra published a SIGGRAPH paper titled "A Shading Language on Graphics Hardware: The PixelFlow Shading System" (ACM Press) in 1998.

Kekoa Proudfoot, Bill Mark, Svetoslav Tzvetkov, and Pat Hanrahan published a SIGGRAPH paper in 2001 titled "A Real-Time Procedural Shading System for Programmable Graphics Hardware" (ACM Press) that describes a GPU-oriented shading language developed at Stanford.

Real-Time Rendering, Second Edition (A. K. Peters, 2002), written by Eric Haines and Tomas Akenine-Möller, is an excellent resource for further information about graphics hardware and interactive techniques.

The OpenGL Graphics System: A Specification documents the OpenGL 3D programming interface. The best tutorial for learning OpenGL programming is the OpenGL Programming Guide: The Official Guide to Learning OpenGL, Third Edition (Addison- Wesley, 1999), by Mason Woo, Jackie Neider, Tom Davis, and Dave Shreiner. The website serves up much more information about OpenGL.

Documentation for the Direct3D programming interface is available from Microsoft's site. NVIDIA provides further information about the Cg runtime, CgFX, and Cg itself on its Developer website at

If you are interested in the physics behind the particle system you created, you can learn more by reviewing kinematics in any high school or college physics textbook.

Jeff Lander wrote a series of articles in 1998 and 1999 for Game Developer Magazine about various animation techniques. You can find these articles on the website. For particle systems, read "The Ocean Spray in Your Face." For vertex skinning, check out "Skin Them Bones: Game Programming for the Web Generation."

The original volume of Game Programming Gems (Charles River Media, 2000), edited by Mark DeLoura, contains several gems related to key-frame animation and vertex skinning. Check out these articles: "Interpolated 3D Keyframe Animation," by Herbert Marselas; "A Fast and Simple Skinning Technique," by Torgeir Hagland; and "Filling the Gaps-Advanced Animation Using Stitching and Skinning," by Ryan Woodland.

John Vince's book 3-D Computer Animation (Addison-Wesley, 1992) covers many of the techniques described in this chapter, as well as others, such as free-form deformation (FFD).

DirectX 8 added point sprites to Direct3D. OpenGL implementations from multiple hardware vendors support the NV_point_sprite extension. The specification for this OpenGL extension is available at the website.

Latest Jobs


Playa Vista, California
Audio Engineer

Digital Extremes

London, Ontario, Canada
Communications Director

High Moon Studios

Carlsbad, California
Senior Producer

Build a Rocket Boy Games

Edinburgh, Scotland
Lead UI Programmer
More Jobs   


Register for a
Subscribe to
Follow us

Game Developer Account

Game Developer Newsletter


Register for a

Game Developer Account

Gain full access to resources (events, white paper, webinars, reports, etc)
Single sign-on to all Informa products

Subscribe to

Game Developer Newsletter

Get daily Game Developer top stories every morning straight into your inbox

Follow us


Follow us @gamedevdotcom to stay up-to-date with the latest news & insider information about events & more