
Exploring Sh: A New GPU Metaprogramming Toolkit

Sh is a free, open-source system developed at the University of Waterloo for programming graphics processors (GPUs) in C++. This article explores the nuts and bolts of Sh and its applications.

Michael McCool, Blogger

July 16, 2004


Sh is a free, open-source system developed at the University of Waterloo that lets you program graphics processors (GPUs) in C++. Basically, it consists of a C++ library that supports run-time specification and compilation of GPU shader programs. This library uses operator overloading to build a clean, high-level API, so that defining Sh programs is as straightforward as defining functions in C++, and as expressive as writing shaders in a specialized shading language. In addition, Sh integrates with the scope rules of C++ in such a way that all the capabilities of C++ can be used to manipulate and modularize GPU code, including classes, templates, functions, and user-defined types. No additional glue code is required to bind Sh programs to the host application: they act like an extension of it. For instance, shader parameters and textures can just be declared as variables, then used inside shader definitions, and Sh will do the rest. Sh can be used as a shading language, for complex multipass rendering algorithms, or to implement general-purpose stream computations (such as simulation).

Why should you, as a game developer, be interested in Sh? First of all, Sh is a much more powerful, modular, and complete programming system than other available real-time shading languages. It's more than a shading language: it also tracks textures and shader parameters, and the associations between these and shaders. Using the object-oriented features of C++, you can share code between shaders, and encapsulate complex algorithms and data representations so that they can be reused more easily. You can also easily use Sh to build custom compilers to convert your own data into shaders (metaprogramming). In general, you can write more sophisticated shaders far more efficiently in Sh than in other systems. Second, Sh can improve your productivity by eliminating a lot of the annoyances and glue code requirements of other shading languages. Third, you can use Sh to accelerate other game engine computations, such as simulations and AI. Since Sh compiles to both the GPU and the CPU, writing these components in Sh does not commit you to running them on the GPU. You can even defer that decision to runtime. For instance, you could profile the GPU and CPU at install time and decide to run simulation components on whichever processor leads to a load-balanced system. In general, Sh makes all the computational capabilities of a system available to you with a common interface.

In the future, we hope that Sh's ability to encapsulate data representations and algorithms will lead to a large set of implementations of advanced algorithms being made available as Sh classes and functions by researchers. Sh is also a useful platform for shader compiler research since it is completely open source. Finally, we plan to extend Sh to a number of other compilation targets, including parallel machines and game platforms. Sh's conceptual model is platform, vendor, and API independent. Ideally, this will ease porting between different platforms and allow greater reuse of code.

The Sh Architecture

Let's begin by looking at how Sh is structured. The library is built around a set of classes, such as ShPoint3f, ShVector3f, or ShMatrix4x4f, that can be used directly as a graphics utility library. A number of useful operators and functions are defined that act on objects of these classes. You can add or subtract vectors, take dot or cross products, do matrix/vector and matrix/point transformations, normalize vectors to unit length, and so forth. Sh also supports swizzling (extraction and rearrangement of elements of a tuple or matrix) and writemasking (assignment to only some elements of a tuple or matrix).
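To make swizzling and writemasking concrete, here is a minimal sketch of the idea in plain C++. This is a toy illustration, not the real Sh classes: the `Tuple4` type and its `swiz`/`mask` members are invented for this example (Sh's actual API spells these operations differently, e.g. with index lists on its tuple types).

```cpp
// Minimal 4-tuple sketch (NOT the real Sh classes) illustrating the
// idea behind swizzling and writemasking.
struct Tuple4 {
    float v[4];

    // "Swizzle": build a new tuple from an arbitrary selection and
    // rearrangement of elements, e.g. t.swiz(2,1,0,3) reverses the
    // first three components.
    Tuple4 swiz(int a, int b, int c, int d) const {
        Tuple4 r = { { v[a], v[b], v[c], v[d] } };
        return r;
    }

    // "Writemask": assign the source only into the selected elements,
    // leaving the other elements untouched.
    void mask(int a, int b, const Tuple4& src) {
        v[a] = src.v[0];
        v[b] = src.v[1];
    }
};
```

In Sh itself these operations are compiled down to the GPU's native swizzle and writemask addressing modes, so they cost nothing at runtime.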

You can specify operations on Sh objects in two modes. In immediate mode, which is the default, operations take place as soon as they are specified. In retained mode, rather than executing a sequence of operations, Sh records them in a program object. Retained mode is indicated by wrapping a section of code in the keywords SH_BEGIN_PROGRAM and SH_END. Recorded operation sequences can then be compiled for a specified target (usually the GPU, although Sh can also dynamically generate code for the host CPU). Program objects can be loaded into the vertex and fragment shader units of GPUs, in which case they affect rendering with standard graphics APIs. Alternatively, they can be used directly as stream functions for general-purpose computation, without any need to invoke a graphics API.
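The mechanism behind retained mode can be sketched in a few lines of C++. The following toy `Recorder` and `Val` types are invented for illustration (they are nothing like Sh's real internals): the point is only that an overloaded operator can either execute immediately or append an instruction to a recorded program, which is loosely what happens between SH_BEGIN_PROGRAM and SH_END.

```cpp
#include <string>
#include <vector>

// Toy illustration of immediate vs. retained mode. All names here are
// hypothetical, not part of the Sh API.
struct Recorder {
    bool recording = false;
    std::vector<std::string> ops;  // the recorded "program object"
};

static Recorder rec;  // a single global recorder, for the sketch's sake

struct Val {
    float x;
    std::string name;
};

Val operator+(const Val& a, const Val& b) {
    if (rec.recording) {
        // Retained mode: record the operation instead of running it.
        rec.ops.push_back("ADD " + a.name + " " + b.name);
        return Val{0.0f, "tmp"};
    }
    // Immediate mode: compute right away.
    return Val{a.x + b.x, "tmp"};
}
```

In Sh, the recorded operation sequence is then handed to an optimizing compiler backend targeting the GPU or the host CPU.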

In addition to supporting the dynamic generation of code, Sh also manages textures and streams. Textures act like arrays, and like other parameters are bound to Sh programs using the scope rules of C++. This means that data abstractions can be built around textures. For instance, suppose you want to build a special compressed texture type that is decompressed by a particular sequence of shader code. You can declare a class that encapsulates a built-in texture class to store the compressed data, but redefines the access operators to insert the necessary code into the calling shader. If your new class supports the same interface as one of the built-in textures, it can be used anywhere they can be used.

Streams are used to support a general-purpose computational model on GPUs. A stream program is like a shader: it is a function that maps a certain number of inputs to a certain number of outputs. Stream objects in Sh are similar to textures. They refer to a sequence of data in memory that can be acted upon or generated by stream programs. Stream programs can be applied to streams with a simple operator or function call syntax. Streams can also be decomposed into or constructed from individual channels of data. A sophisticated stream syntax is provided that supports many advanced features, such as shared substreams, conversion of parameters to inputs and the reverse, program composition, and currying.

Example: Blinn-Phong Shader

The simplest way to introduce Sh is with some examples. The following code defines a Blinn-Phong shader for a single point source (the shader equivalent of "Hello World") by defining a vertex shader and a fragment shader. This shader will also transform vertices into view space for lighting and into device space for rendering. A rendering produced with this shader is given in Figure 1.



Figure 1: Blinn-Phong lighting model, simple and with texture maps.

First we will define a number of global variables giving the transformation matrices and the parameters of the lighting model:


ShMatrix4x4f modelview; // MCS to VCS transformation
ShMatrix4x4f perspective; // VCS to DCS transformation

ShColor3f phong_kd; // diffuse color
ShColor3f phong_ks; // specular color
ShAttrib1f phong_spec_exp; // specular exponent
ShPoint3f phong_light_position; // VCS light position
ShColor3f phong_light_color; // light source color

ShProgram phong_vert, phong_frag;

 


We will build the vertex and fragment shaders themselves in an initialization function as follows:


void phong_init () {
    // Create vertex shader
    phong_vert = SH_BEGIN_PROGRAM("gpu:vertex") {
        // Declare shader inputs
        ShInputNormal3f nm;      // normal vector (MCS)
        ShInputPosition3f pm;    // position (MCS)

        // Declare shader outputs
        ShOutputNormal3f nv;     // normal (VCS)
        ShOutputVector3f lv;     // light-vector (VCS)
        ShOutputVector3f vv;     // view vector (VCS)
        ShOutputColor3f ec;      // irradiance
        ShOutputPosition4f pd;   // position (HDCS)

        // Specify shader computations
        ShPoint3f pv = (modelview | pm)(0,1,2);
        vv = normalize(-pv);
        lv = normalize(phong_light_position - pv);
        nv = normalize(modelview | nm);
        ec = phong_light_color * pos(nv|lv);
        pd = perspective | pv;
    } SH_END; // End of vertex shader

    // Create fragment shader
    phong_frag = SH_BEGIN_PROGRAM("gpu:fragment") {
        // Declare shader inputs
        ShInputNormal3f nv;      // normal (VCS)
        ShInputVector3f lv;      // light-vector (VCS)
        ShInputVector3f vv;      // view vector (VCS)
        ShInputColor3f ec;       // irradiance

        // Declare shader outputs
        ShOutputColor3f fc;      // fragment color

        // Specify shader computations
        vv = normalize(vv);
        lv = normalize(lv);
        nv = normalize(nv);
        ShVector3f hv = normalize(lv + vv);
        fc = phong_kd * ec;
        fc += phong_ks * pow(pos(hv|nv), phong_spec_exp);
    } SH_END; // End of fragment shader
} // End of phong_init

 


We have wrapped Sh shader program definitions in the SH_BEGIN_PROGRAM and SH_END keywords. The SH_BEGIN_PROGRAM returns a program object that will represent the recorded sequence of operations. Inputs and outputs to the program objects are indicated by appropriate Input and Output prefixes on instances of Sh types. The "|" operator is used for dot product and matrix multiplication, although you can also use a dot function for the former and "*" for the latter.
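Overloading "|" as a dot product is ordinary C++ operator overloading, and can be sketched independently of Sh. The `Vec3` type below is hypothetical, invented for this example; the real Sh types additionally overload "|" for matrix multiplication and matrix/vector transforms.

```cpp
// Sketch of overloading "|" as a dot product on a hypothetical Vec3
// type, mirroring Sh's convention (not Sh's actual classes).
struct Vec3 {
    float x, y, z;
};

// Dot product spelled as  a | b.
float operator|(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

Note that "|" binds more loosely than arithmetic operators in C++, so expressions like `pos(nv|lv)` work naturally, but a bare dot product compared against a value needs parentheses: `(a | b) == 0.0f`.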

Once defined, the program objects phong_vert and phong_frag can be loaded into the vertex and fragment shading units of the GPU using the shBind API call. You can now use a normal graphics API to specify geometry, and the shaders will be applied to that geometry. Right now, Sh only supports OpenGL, although we are working on a DirectX binding and it should be available soon. In your graphics API, you need to set up the correct vertex attributes for the shaders you have loaded, and fragment and vertex shader pairs need to be consistent in their inputs and outputs. A set of rules based on type and order of declaration defines how shader inputs map onto vertex attributes. You can also ask program objects for a human-readable string describing the interface binding.

The "uniform" parameters of these shaders, that is, the values that are the same for all shaded vertices or fragments such as phong_kd, are simply referenced directly by the shader definitions. No additional glue code is required to set up these parameters, and a simple assignment (outside of a shader definition) is all that is needed to modify one. Which parameters get bound to each shader program is controlled by the scope rules of C++. For instance, we could have made the parameters data members of a class and defined the shader program objects in a member function. Then the member function creating the shader programs would have picked up the data members and an encapsulated shader would have been created. In general, Sh is designed to integrate with C++ cleanly, and most C++ modularity constructs can be used with Sh programs. Many other programming techniques are enabled by this integration, and by the fact that C++ can manipulate Sh programs in arbitrary ways at runtime.

If we wanted to texture map this shader, instead of ShColor3f for phong_kd we could have used ShTexture2D<ShColor3f>. Then we would have to modify the shader definitions to pass in a texture coordinate, and then index the texture object. The bindings of textures work in exactly the same way as uniform parameters, so as with parameters, we can create data abstractions using the object-oriented features of C++.

The following code example demonstrates several of these techniques at once. It encapsulates parameters as data members in a class, uses template arguments and constructor arguments to parameterize the shader, and uses a template class to coordinate the vertex shader outputs with the fragment shader inputs (incidentally demonstrating the more generic, template-based mechanism for declaring Sh types, which can also be used to declare tuples of arbitrary length). Finally, it uses C++ control constructs to manipulate shader code: in this case, by unrolling a loop to support multiple light sources, with ordinary C++ arrays holding the light source properties. We can also use ordinary C++ functions to implement functions in shader code; this is roughly how the standard library functions such as normalize (and, in fact, the operators) are implemented. A rendering produced with this shader is also given in Figure 1.


template <int NLIGHTS>
class BlinnPhong {
public:
    // Declare parameters and textures as data members
    ShTexture2D<ShColor3f> kd;
    ShTexture2D<ShColor3f> ks;
    ShAttrib1f spec_exp;
    ShPoint3f light_position[NLIGHTS];
    ShColor3f light_color[NLIGHTS];

    // Declare I/O type to coordinate vertex and fragment shaders
    template <ShBindingType IO> struct VertFrag {
        ShPoint<4,IO,float> pv;    // position (VCS)
        ShTexCoord<2,IO,float> u;  // texture coordinate
        ShNormal<3,IO,float> nv;   // normal (VCS)
        ShColor<3,IO,float> ec;    // total irradiance
    };

    // Declare program objects for shaders
    ShProgram vert, frag;

    // Constructor: parameterized by texture resolution
    BlinnPhong (int res) : kd(res,res), ks(res,res) {

        // Create vertex shader
        vert = SH_BEGIN_PROGRAM("gpu:vertex") {
            // Declare shader inputs
            ShInputNormal3f nm;    // normal vector (MCS)
            ShInputTexCoord2f u;   // texture coordinate
            ShInputPosition3f pm;  // position (MCS)

            // Declare shader outputs
            VertFrag<SH_OUTPUT> vf;
            ShOutputPosition4f pd; // position (HDCS)

            // Specify shader computations
            vf.pv = modelview | pm;
            vf.u = u;
            vf.nv = normalize(modelview | nm);
            pd = perspective | vf.pv;
            for (int i = 0; i < NLIGHTS; i++) {
                ShVector3f lv =
                    normalize(light_position[i] - vf.pv(0,1,2));
                vf.ec += light_color[i] * pos(vf.nv|lv);
            }
        } SH_END; // End of vertex shader

        // Create fragment shader
        frag = SH_BEGIN_PROGRAM("gpu:fragment") {
            // Declare shader inputs
            VertFrag<SH_INPUT> vf;

            // Declare shader outputs
            ShOutputColor3f fc;    // fragment color

            // Specify shader computations
            ShVector3f vv = normalize(-vf.pv(0,1,2));
            ShNormal3f nv = normalize(vf.nv);
            fc = kd(vf.u) * vf.ec;
            ShColor3f kst = ks(vf.u);
            for (int i = 0; i < NLIGHTS; i++) {
                ShVector3f lv =
                    normalize(light_position[i] - vf.pv(0,1,2));
                ShVector3f hv = normalize(lv + vv);
                fc += kst * pow(pos(hv|nv), spec_exp)
                    * light_color[i];
            }
        } SH_END; // End of fragment shader
    } // End of constructor
}; // End of BlinnPhong class

 


In the fragment shader, notice that a texture read is indicated with the "()" operator on a texture object, as in kd(vf.u) and ks(vf.u). This operator treats the texture as a tabulated function with a normalized texture coordinate range of 0 to 1 in each coordinate. Sh also supports the "[]" operator for texture lookups, which works the same but places a texel at each integer. The "[]" lookup operator is useful when textures are being used as arrays to hold data structures (for instance, a ray-tracer accelerator). Sh also supports several additional texture types for rectangular textures, 1D and 3D textures, and cube textures.
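The distinction between the two addressing conventions can be sketched with a plain C++ array standing in for a texture. The `Tex1D` type below is invented for this example and only does nearest-neighbour lookup; real Sh textures also handle filtering, wrapping, and multiple dimensions.

```cpp
#include <vector>

// Sketch of the two lookup conventions on a hypothetical 1D "texture":
// operator() takes a normalized coordinate in [0,1), operator[] takes
// an integer texel index. (Not the real Sh texture classes.)
struct Tex1D {
    std::vector<float> texels;

    // Nearest-neighbour lookup with a normalized coordinate.
    float operator()(float u) const {
        int n = static_cast<int>(texels.size());
        int i = static_cast<int>(u * n);
        if (i < 0) i = 0;        // clamp to the valid texel range
        if (i >= n) i = n - 1;
        return texels[i];
    }

    // Direct texel lookup at an integer index, as when a texture is
    // used as an array holding a data structure.
    float operator[](int i) const { return texels[i]; }
};
```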

Sh programs can write to inputs and read from outputs. Writing to an input does not change the original data; Sh inputs are pass by value. When mapping to a backend that does not support these operations, Sh will introduce an additional temporary automatically. Temporaries (including automatically introduced temporaries) are also always initialized to zero if their value is used before they are assigned to. These transformations are included as conveniences to simplify the input code. For instance, the ability to use += on a zero-initialized output is very useful if you want to accumulate several sources of light in an output color but don't want to keep track of which source is "first".
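The convenience of a zero-initialized accumulator can be shown with an ordinary C++ function. This hypothetical helper is not Sh code; it just mirrors the pattern of summing per-light contributions with += and no special case for the first source.

```cpp
#include <vector>

// Sketch of accumulating several light contributions into a
// zero-initialized output, the pattern Sh's zero-initialization of
// temporaries and outputs makes convenient. (Hypothetical helper,
// not part of Sh.)
float accumulate_lights(const std::vector<float>& contributions) {
    float total = 0.0f;   // Sh zero-initializes outputs automatically
    for (float c : contributions)
        total += c;       // no "first source" special case needed
    return total;
}
```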

Several additional shader examples are given in Figure 2. Many of these examples use the Perlin and Worley noise functions built into Sh. Wood, for example, adds some Perlin noise to a quadratic function and then feeds the result through a periodic sawtooth function stored in a texture map. The Worley noise functions are based on the kth nearest distance to a set of procedurally generated feature points (we use a jittered grid). The library function computing the Worley noise functions can be parameterized with different distance functions. The giraffe shader shown here uses a Manhattan distance metric with the Worley function and passes the distance to the closest feature point through a threshold function.



Figure 2: Some more example shaders. Many of these examples use the Perlin and Worley noise functions built into Sh.

______________________________________________________

Stream Functions

Stream functions are generalizations of shaders compiled with gpu:stream or cpu:stream as a target. Streams and channels are containers for sequences of data that can be acted upon by stream functions. Channels hold a sequence of data from a basic tuple data type while streams represent particular combinations of channels. Special operators are used for stream function application and construction of streams from channels: "<<" for application and "&" for stream construction. For instance, suppose we have a stream function f that takes three input channels (a point, a normal, and a color) and two output channels (a color and a scalar). We could declare some channels as follows:

ShChannel<ShPoint3f> p;
ShChannel<ShNormal3f> n;
ShChannel<ShColor3f> c1, c2;
ShChannel<ShAttrib1f> d;

We can then create some streams as follows:

ShStream input_s = (p & n & c1);
ShStream output_s = (c2 & d);

After initializing the channels with data, we can apply the stream function f to generate data for the output streams:

output_s = f << input_s;

We don't have to create the intermediate streams if we don't want to; we can just use the "&" operator inline:

(c2 & d) = f << (p & n & c1);

The "<<" operator can also be used to apply programs to single tuples, which then act like a channel all of whose elements have that value. Partial evaluation (currying) is also supported: you don't have to provide the inputs to a program all at once; you can give them one at a time. The following is equivalent to the above expression:

ShProgram g = f << p;
(c2 & d) = g << (n & c1);

For partial evaluation, to avoid data copies, the channel p is read only in the second line of the above example, when the stream function actually executes. This "deferred read" is also used when an input is a tuple rather than a stream channel. This is actually an interesting feature: "<<" can be used to convert a "varying" input attribute to a "uniform" parameter! Since that's a useful operation, we also define its inverse, specified with the ">>" operator, that converts a parameter to an input. You give the program on the left and an existing parameter that program uses on the right. The result is a program in which that parameter dependence has been removed and with an additional input. This actually leads to some interesting programming techniques. For instance, if you have a value (like a texture coordinate) that is used in a lot of places, rather than passing it around everywhere you can just declare a global parameter, then convert it into an input later.
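The shape of partial evaluation can be sketched with std::function and a lambda. The `bind_first` helper below is invented for this example and says nothing about how Sh implements "<<"; it only shows the same idea on plain functions: binding one input of a two-input function yields a one-input function.

```cpp
#include <functional>

// Sketch of partial evaluation ("currying") on plain C++ functions,
// analogous in shape to Sh's  f << p  producing a new program object.
// bind_first is a hypothetical helper, not part of Sh.
std::function<float(float)>
bind_first(std::function<float(float, float)> f, float p) {
    // Capture f and p; the result waits for the remaining argument.
    return [f, p](float x) { return f(p, x); };
}
```

Unlike this sketch, Sh defers reading the bound channel until the resulting program actually executes, which avoids copying the data.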

The "<<" and "&" operators can also be applied directly to program objects to create other program objects. On program objects, "<<" feeds the outputs of one program object into the inputs of another, creating a new program object. In other words, it performs functional composition. The "&" operator concatenates all the inputs, outputs, and operations in two program objects, creating a new program object. Because of the way Sh syntax works, this is equivalent to concatenating the source code of the two program objects (in two separate name scopes). Eventually, after applying any number of such operations, the resulting composite program object is fed through the complete compilation chain, including optimization, which reduces it to a single optimized implementation.
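Functional composition itself is easy to demonstrate on ordinary functions. The `compose` helper below is a hypothetical stand-in, not Sh's implementation: it only illustrates what "<<" means when applied to two program objects, namely feeding the output of one into the input of the other.

```cpp
#include <functional>

// Sketch of functional composition, the meaning of Sh's  f << g  on
// program objects: the result applies g, then feeds its output to f.
// compose is a hypothetical helper, not part of Sh.
std::function<float(float)>
compose(std::function<float(float)> f, std::function<float(float)> g) {
    return [f, g](float x) { return f(g(x)); };
}
```

In Sh the composite is not interpreted call-by-call like this sketch; the combined operation sequence is fed through the full compilation chain, including optimization, yielding a single fused program.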

These operators turn out to be incredibly useful, especially when you combine preexisting program objects with small glue programs. For instance, using "<<" you can reorganize the inputs and outputs of program objects, delete outputs (an operation called program specialization; the optimizer will delete any unnecessary computations), or replace inputs with texture accesses. The Sh library includes a number of generators for common glue program patterns like these.

An application of stream processing is shown in Figure 3. This figure shows the result of a particle system simulation running on the GPU using a stream program to update the state of each particle and a shader to convert the updated particle state to a visual rendering. Code for the state update part of this application is given in our recent SIGGRAPH paper [3]. That paper also includes an example that converts a Phong lighting model to a wood shader using the program manipulation (shader algebra) operators.



Figure 3: Particle system simulation implemented with stream processing.

Future Work

Sh is a work in progress. It is well past the research prototype phase and can now certainly be used for the robust and flexible specification of shaders. However, we are in the midst of adding some additional features to better support stream computing and large commercial game development projects (including integration with asset management systems, file externalization, and additional backends). Formally, Sh is still in alpha but is in final testing before a beta release planned for August 2004. This beta status is not meant to indicate that Sh is unstable, only that it has not yet reached its final feature set. We are particularly interested in hearing feedback from the game community to identify any important missing features that would block the adoption of Sh in a commercial game project. We are committed to addressing any such issues by the end of the year.

Sh is distinguished from other shading languages both by its close integration with a C++ host application and by its direct support for stream computing. Both of these attributes are aimed at applications in which a combination of CPU control and GPU computation is necessary to implement a complex algorithm. However, additional stream operations not yet directly supported by Sh, such as reduction and indexing, are potentially useful to implement some stream processing algorithms. Unfortunately, efficient implementation of some of these multipass operations on GPUs requires driver support for flexible buffer and memory management on the graphics accelerator, an area that is still in a state of flux. Assuming the driver issues get sorted out, these additional features should also be available soon.

We are also looking at targeting additional backends. Currently, Sh supports ATI and Nvidia floating-point GPUs and the host CPU (stream functions can be dynamically compiled to CPU code). Game platforms and parallel distributed-memory clusters via MPI are also interesting targets that we are working on. The semantics of Sh have been kept intentionally simple to make an efficient mapping onto such platforms possible, but these machines have significantly different architectures than GPUs and so additional development work will be necessary.

Our ultimate goal is to provide a unified computational model for all platforms that supports both stream computing and shaders, and then to build a library of useful algorithms on top of that. We plan to keep as much of this development as possible in an open source form, except for components (such as game platform backends) that might require licensing fees to develop.

Further Reading

AK Peters will publish a book [2] on the Sh system in August 2004, called Metaprogramming GPUs with Sh. This book includes a detailed user tutorial, a reference manual, and a guide to the internals of the open-source distribution available from the Sh SourceForge web site. The website contains additional information, such as links to sample shaders, research papers [3,4], and videos.

The creators of Sh will be running a programming contest. Four high-end video cards, two from ATI and two from NVIDIA, will be offered as prizes. The winning entries of this contest will be the programs that best exploit the novel capabilities of Sh, and will not be limited to shaders. Details are available on the Sh web site [1].

References

1. Sh Web Site, http://libsh.org
2. Michael McCool and Stefanus Du Toit, Metaprogramming GPUs with Sh, AK Peters, 2004, http://www.akpeters.com
3. Michael McCool, Stefanus Du Toit, Tiberiu Popa, Bryan Chan and Kevin Moule, Shader Algebra, ACM Transactions on Graphics (Proceedings of SIGGRAPH), August 2004.
4. Michael McCool, Zheng Qin, and Tiberiu Popa, Shader Metaprogramming, Proceedings of SIGGRAPH/Eurographics Conference on Graphics Hardware, September 2002, pp. 57-68.

______________________________________________________

About the Author(s)

Michael McCool

Blogger

Michael McCool graduated in 1989 from the University of Waterloo with a B.A.Sc. in Computer Engineering (with a Mathematics Option) and received the Sir Sandford Fleming Medal for Academic Achievement. As an undergraduate he worked in the VLSI Group at the University of Waterloo on silicon compilers and at ISG Technologies on medical visualization and radiologist assistance systems. He completed his Ph.D. in 1994 with the Dynamic Graphics Project at the Department of Computer Science, University of Toronto, where he worked with video, antialiasing, and polyhedral spline mathematics. Michael is currently an Associate Professor in the Computer Graphics Lab at the University of Waterloo. He has published papers and articles in SIGGRAPH, Graphics Hardware, ACM Transactions on Graphics, Graphics Interface, the journal of graphics tools, the Eurographics Symposium on Rendering, and Game Developer Magazine. Research interests include high-quality real-time rendering, global and local illumination, shaders, general-purpose GPU programming, parallel computing, interval and Monte Carlo methods and applications, end-user programming and metaprogramming, and image and signal processing.
