For a few years now, many of us have been using Microsoft's OpenGL under Windows 95 and Windows NT. Sadly, the performance of this implementation has occasionally lagged behind Direct3D, and worse yet, hardware drivers for Windows 95 have been absent or delayed because of confusion about Microsoft's delivery of mini client drivers (MCDs) for Windows 95. The MCD is Microsoft's driver model for Windows NT and provides an easy-to-use (albeit low-performance) route to device driver development. At the 1997 CGDC, it became clear that plans for a Windows 95 MCD architecture had been indefinitely delayed. This was disappointing to many game developers, to the IHVs already developing MCDs and, I understand, to Silicon Graphics (SGI) as well.
This article provides an update on the current state of OpenGL, with specific emphasis on SGI's OpenGL. I will also examine the new SGI OpenGL DDK and the resulting hardware driver development process, and discuss how they affect the performance of your application code.
In the spring of 1995, SGI began an independent effort to produce its own OpenGL implementation for Windows. A key SGI Windows product, Cosmo Player, was suffering from slow OpenGL performance, which made VRML viewing sluggish. Because of this, SGI's original Windows implementation of OpenGL was called "Cosmo OpenGL", in the hope that it would significantly improve the performance of Cosmo Player. SGI also wanted OpenGL to be a viable API under Windows, which motivated it to provide a high-performance sample implementation for licensees. Tremendous energy went into developing this optimized version of OpenGL: run-time code generation (backed by massive amounts of assembly code), proposals to the OpenGL Architecture Review Board (ARB) for high-performance API extensions, and tricks previously reserved for high-end SGI workstations.
Over time, as the code was being developed, interest in the games market grew within SGI. This interest was amplified by Chris Hecker's open letter to Microsoft (see the April/May and June 1997 issues of Game Developer for more information), public commentary regarding OpenGL, and the ensuing Direct3D vs. OpenGL debate.
In February of 1997, SGI released a software-only renderer, the beta version of Cosmo OpenGL. This software-only renderer used an optimized rasterizer (more about this later) and outperformed Microsoft's OpenGL for triangle rendering in common rendering contexts. In July, SGI released MR1, which included MMX acceleration.
Understanding that software performance is never enough, in October SGI released a Windows 95/NT device driver kit (DDK) that lets independent hardware vendors (IHVs) quickly develop OpenGL drivers for their 3D accelerators. The DDK was designed to be easy to use and readily available at little or no cost, and it provides all the code necessary to produce a high-performance OpenGL driver. Now that this DDK is available, most major graphics card vendors are expected to begin shipping OpenGL drivers for Windows 95. SGI is currently developing a DirectDraw-compatible OpenGL, which will not only let game developers use DirectDraw alongside OpenGL, but will also make it easier to test code being ported from Direct3D.
Microsoft uses an OPENGL32.DLL file for run-time execution. In a somewhat confusing move, SGI delivers an OPENGL.DLL file for the same purpose. To use SGI's OpenGL for Windows, a game developer must explicitly link with SGI's libraries and use its header files. There are two reasons for this: SGI's OpenGL contains a set of performance extensions that don't exist in Microsoft's implementation, and the executable itself records the name of the DLL it requires at run time.
A useful trick comes out of this arrangement. Sometimes it is helpful to test SGI's OpenGL without relinking, especially when you do not have access to linkable code. Frankly, it also helps when you just don't want to take the time to relink, such as during benchmarking. OPENGL.DLL can simply be renamed OPENGL32.DLL and used with an application linked against the Microsoft OpenGL libraries. If the game uses any SGI-specific extensions, it should fall back gracefully when the SGI DLL is not detected.
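Falling back gracefully starts with asking whether an extension is actually present. The sketch below shows a common way to search the GL_EXTENSIONS string (returned by glGetString once a context is current) for a whole extension name. It is plain C so it can be tested without a GL context, and any extension names used with it are placeholders, not a list of SGI's actual extensions.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a complete, space-delimited token in
 * `extensions`, e.g. the string from glGetString(GL_EXTENSIONS).
 * A plain strstr() is not enough: "GL_EXT_foo" would falsely match
 * inside "GL_EXT_foo_bar". */
bool has_extension(const char *extensions, const char *name)
{
    size_t len;
    const char *p;

    /* Names are single tokens, so reject empty names or names with spaces. */
    if (extensions == NULL || name == NULL || *name == '\0' || strchr(name, ' '))
        return false;

    len = strlen(name);
    p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == extensions) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
        p += len; /* safe: a token cannot start inside a space-free match */
    }
    return false;
}
```

In a game, the result decides between the extension path and a core OpenGL 1.1 path, so the same executable still runs when only Microsoft's OPENGL32.DLL is loaded.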
From this observation, it becomes apparent that SGI could have simply installed OpenGL for Windows on top of Microsoft's OpenGL. However, SGI took a more Microsoft-friendly approach. Another benefit of this approach is that the game developer can be assured that the SGI-specific extensions are available whenever OPENGL.DLL is. As an aside, your game can determine which OpenGL it is rendering through by calling glGetString with GL_VENDOR, GL_RENDERER and GL_VERSION after you successfully make a context current. If you query these values before a context is created and bound, there is no context to answer the query and the results are not meaningful.
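The vendor check can be wrapped in a small helper that works on the strings glGetString returns. The sketch assumes Microsoft's software renderer reports GL_VENDOR "Microsoft Corporation" and GL_RENDERER "GDI Generic". The SGI test is an assumption: the exact string SGI's OpenGL for Windows reports is not given here, so verify it against the DLL you actually ship with. The enum and function names are hypothetical.

```c
#include <string.h>

typedef enum {
    IMPL_MICROSOFT_GENERIC, /* Microsoft software-only renderer */
    IMPL_SGI,               /* assumed SGI vendor string (unverified) */
    IMPL_OTHER              /* typically an IHV's ICD */
} GLImpl;

/* Classify the implementation from the GL_VENDOR and GL_RENDERER strings.
 * Call glGetString() only after wglMakeCurrent() has bound a context. */
GLImpl classify_impl(const char *vendor, const char *renderer)
{
    if (vendor == NULL || renderer == NULL)
        return IMPL_OTHER;
    if (strstr(vendor, "Microsoft") != NULL && strstr(renderer, "GDI Generic") != NULL)
        return IMPL_MICROSOFT_GENERIC;
    /* Assumption: SGI's OpenGL for Windows identifies itself by name here. */
    if (strstr(vendor, "SGI") != NULL || strstr(vendor, "Silicon Graphics") != NULL)
        return IMPL_SGI;
    return IMPL_OTHER;
}
```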
ICD (Installable Client Driver): a driver that contains the entire OpenGL rendering pipeline. This solution, while providing the highest possible performance, was also daunting to IHVs; the Microsoft ICD Kit required considerable effort to turn into a driver. SGI's DDK also produces an ICD, but it comes with a sample driver for the S3 ViRGE/GX as an example.
Co-Residence of OpenGL Implementations
The Microsoft and SGI OpenGL implementations may co-reside on a single PC. Which OpenGL rendering pipeline is used depends on how the application was linked and what (if any) 3D hardware is installed.
Let's briefly examine the OpenGL rendering pipeline architecture, shown in Figure 1. OpenGL may be present in up to three different rasterizing configurations: software only, installable client driver (ICD) and MCD. Further, two vendors (Microsoft and SGI) now offer ICD kits for IHVs as well as software-only rasterizers. As a result, your rendering may traverse up to five different paths (at most four on any one machine; the MCD path exists only under Windows NT). For Windows 95, we need to concern ourselves with three common paths: Microsoft software-only rendering, SGI software-only rendering, and ICD. IHVs may choose to offer alternate paths, such as an ICD-less path directly to hardware (path D in Figure 1), but I doubt it will be commonplace.
Figure 1. Various OpenGL Rasterization Paths. (Note that the window and context management contained in the wgl functions is independent of the rendering pipeline and remains so regardless of the specific configuration.)
The software rasterizer is present in all configurations and is shown as path A in Figure 1. This is required, even when 3D hardware is present, in order to handle pixel formats and states which are not supported by the 3D hardware. For example, the 3D hardware may not support per-pixel fog. In this case, rendering would revert to software rasterization, even though 3D hardware is present.
The software rasterizer in OpenGL for Windows has been aggressively tuned for maximum triangle rendering performance. The actual rasterization code is produced at run time using a code generator and takes advantage of special resources (like MMX) when they are present.
The ICD is the standard solution for 3D hardware support using OpenGL (path B in Figure 1). While it is possible to produce a hardware driver without ICD support, this is not recommended for shipping products. During driver development, it is natural to add ICD loading last, after the code for context creation and triangle drawing. Once the driver includes ICD loading, applications linked with the Microsoft OpenGL library run properly on the vendor's hardware.
The MCD operates only in the Windows NT environment (path C in Figure 1). MCD was designed as an abstraction of the rasterization layer: a couple dozen functions provide support for pixel format management, texture handling and the drawing of primitives. A special structure, the MCD command buffer, is used to pass data across the I/O layer from user mode to kernel mode. Most of the actual MCD code runs in kernel mode, presumably to make the best use of the limited data bandwidth across the I/O layer.
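The command buffer's purpose is to amortize the expensive user-to-kernel transition over many primitives. The following is a hypothetical model of that batching, not the MCD's real buffer layout: commands are queued until the next one will not fit, and only then is the buffer "submitted", which here is merely counted rather than performed. The type, function names and buffer size are all invented for illustration.

```c
#include <stddef.h>
#include <string.h>

#define CMDBUF_BYTES 64 /* hypothetical size, kept tiny for illustration */

typedef struct {
    unsigned char data[CMDBUF_BYTES];
    size_t used;      /* bytes queued but not yet submitted */
    unsigned flushes; /* simulated user-to-kernel transitions */
} CmdBuffer;

/* Stand-in for the expensive transition into kernel mode. */
static void cmdbuf_flush(CmdBuffer *cb)
{
    if (cb->used > 0) {
        cb->flushes++;
        cb->used = 0;
    }
}

/* Queue one command; submit the buffer only when the command won't fit.
 * Returns -1 if a single command is larger than the whole buffer. */
int cmdbuf_put(CmdBuffer *cb, const void *cmd, size_t size)
{
    if (size > CMDBUF_BYTES)
        return -1;
    if (cb->used + size > CMDBUF_BYTES)
        cmdbuf_flush(cb);
    memcpy(cb->data + cb->used, cmd, size);
    cb->used += size;
    return 0;
}

/* Submit whatever is still queued, e.g. at the end of a frame. */
void cmdbuf_finish(CmdBuffer *cb)
{
    cmdbuf_flush(cb);
}
```

With 24-byte commands, ten primitives cost five transitions instead of ten; a real buffer is far larger, so the savings grow accordingly.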
Since multiple software renderers (Microsoft and SGI) and multiple OpenGL configurations may exist on a machine, we need to consider which paths will be executed under differing environments. Which OpenGL is executed depends on the flags in the pixel format descriptor (PFD) for OpenGL 1.1, which the hardware manufacturer sets for each format its driver exposes. In general, the following results are expected when both OpenGL implementations are resident on the machine.
- When no ICD or MCD is present, the Microsoft implementation is used unless the application is explicitly linked with the SGI library (or the DLL-renaming trick described above is used). The SGI import library is called OPENGL.LIB instead of Microsoft's OPENGL32.LIB, and the OPENGL.DLL it loads contains the optimized rasterization software.
- When an ICD is present, OpenGL will defer all rasterization to the ICD hardware driver for any accelerated pixel format and state. When a pixel format is chosen that the hardware does not accelerate, the Microsoft software rasterizer is used as the fallback. If a state (or primitive) is chosen which is not hardware accelerated, the software rasterizer is again used as the fallback. At this time, if an ICD is present, the fallback software rasterizer will be the Microsoft rasterizer, regardless of how the application is linked. Let me repeat this because it is not intuitive: even if you have linked with the SGI OpenGL, the Microsoft software rasterizer is used when the hardware defers rendering. This means that SGI-specific extensions will not be supported in this case.
- When an MCD (Windows NT) is present, OpenGL will defer all rasterization to the MCD hardware driver. As with an ICD, hardware acceleration occurs only when both the pixel format and the current rendering state are supported by the hardware. Otherwise, drawing takes the software rasterization path, again via Microsoft's rasterizer, regardless of how the application is linked.
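The three rules above can be restated as a small decision function, which makes the non-intuitive ICD case easy to see. This is a model of the behavior described in this article, with invented type and field names, not code from either implementation.

```c
#include <stdbool.h>

typedef enum {
    RENDER_HW_ICD,
    RENDER_HW_MCD,
    RENDER_SW_MICROSOFT,
    RENDER_SW_SGI
} RenderPath;

typedef struct {
    bool icd_present;
    bool mcd_present;        /* Windows NT only */
    bool linked_with_sgi;    /* linked against OPENGL.LIB (or DLL renamed) */
    bool format_accelerated; /* chosen pixel format is hardware accelerated */
    bool state_accelerated;  /* current state and primitive are accelerated */
} GLEnvironment;

RenderPath select_path(const GLEnvironment *env)
{
    bool hw_driver = env->icd_present || env->mcd_present;

    /* No driver: the linked library's own software rasterizer is used. */
    if (!hw_driver)
        return env->linked_with_sgi ? RENDER_SW_SGI : RENDER_SW_MICROSOFT;

    if (env->format_accelerated && env->state_accelerated)
        return env->icd_present ? RENDER_HW_ICD : RENDER_HW_MCD;

    /* With a hardware driver installed, the fallback is Microsoft's
     * rasterizer even for applications linked with SGI's library. */
    return RENDER_SW_MICROSOFT;
}
```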
All of this creates confusion as to which contexts, primitives and states are hardware accelerated. Examining the registry is often suggested as a solution, but the registry does not contain sufficient information. A library like isfast, written by SGI, provides a dependable way to determine optimal rendering paths.
The OpenGL Source
Because an ICD contains the entire OpenGL rendering pipeline, it is worth some study. Both Microsoft and SGI license their ICD source code to hardware vendors, but the two code bases differ considerably. For example, the SGI code contains software optimizations to the pipeline (discussed below) that do not exist in the Microsoft code. I will discuss the SGI implementation here, because SGI is more liberal about disclosure. Both Microsoft's and SGI's source may be built either as an ICD or as a standalone OpenGL DLL. When built as a standalone DLL, the hardware-specific driver code and the ICD mechanisms are absent, producing a pure software rendering pipeline.
The SGI pipeline code is divided into eight basic modules:
- Online-generated (OG) code uses run-time code creation to speed software rasterization, clipping, transforms, lighting, state changes, and projections. At run time, the OG code writes vectorized Intel machine code into a memory buffer, based on the context, primitive and state. The buffer is then pointed to and executed. This avoids loop overhead (among other things) and provides very fast execution. Naturally, not all rendering conditions can be handled, and in those cases the generic, slower software rasterizer is used as a fallback. The OG code hooks in much as a hardware device driver does, by replacing entries in the rendering procedure tables with its own function pointers.
- Software renderer code is responsible for generic rendering, including both the geometry pipe (lighting, transforms, clipping, etc.) and a generic rasterizer. This is where the fallback rasterizer resides for when the OG code cannot be used. You may recall from our earlier discussion that if an ICD (hardware) is present, the fallback is Microsoft's rasterizer, not SGI's. So this generic software rasterization code is not called as a hardware fallback.
- Autogen code comprises a significant percentage of the OpenGL pipeline and focuses on the rasterizer and display lists. This code is generated automatically at compile time by a set of UNIX tools such as awk and Perl. Since many operations are similar (for example, span fills for 4-, 8-, 16- and 32-bit pixels), the compile-time code generator produces all the variations for each state and context from more generic code. The actual API function interface code is produced here as well.
- Pixel code is composed of a large set of routines that perform all pixel operations: storing, reading, scaling, zooming, blending, packing, lookup table (LUT) management, arithmetic operations and some span operations.
- Display list management code is responsible for the display list calls, heap management, list sharing, list compilation, validation and list optimizations. In fact, device drivers can perform implementation-specific display list optimizations by using their own list management code in place of the generic routines in the OpenGL procedure table.
- Vertex caching is used to accumulate point, line, triangle and polygon primitives between glBegin and glEnd calls. All the related memory management, validation and primitive-specific processing occurs in this module.
- Wgl code provides context management, mutex synchronization, memory management of drawables, thread management, and procedure table initialization based on registry validation and PFD checks. This is where the code decides which procedures are loaded into the dispatch tables and whether Microsoft's OpenGL or SGI's OpenGL is used.
- ICD code is specifically designed to implement the Microsoft ICD mechanism. This code interfaces with the 2D driver and provides a variety of services required for OpenGL to share the display with Windows.
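The Autogen idea of writing a routine once and expanding it for every pixel depth can be imitated in plain C. SGI's build uses awk and Perl for this; the sketch below swaps in the C preprocessor instead, uses hypothetical function names, and uses a switch in place of a real procedure table to pick the routine for the current depth. 4-bit packed pixels are left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* One generic template, expanded once per pixel depth. Each expansion
 * writes `width` pixels of `color` starting at pixel `x` of `row`. */
#define DEFINE_SPAN_FILL(BITS, TYPE)                                   \
    void span_fill_##BITS(void *row, int x, int width, uint32_t color) \
    {                                                                  \
        TYPE *p = (TYPE *)row + x;                                     \
        TYPE c = (TYPE)color;                                          \
        for (int i = 0; i < width; i++)                                \
            p[i] = c;                                                  \
    }

DEFINE_SPAN_FILL(8, uint8_t)
DEFINE_SPAN_FILL(16, uint16_t)
DEFINE_SPAN_FILL(32, uint32_t)

typedef void (*SpanFillFn)(void *row, int x, int width, uint32_t color);

/* Select the specialized routine for the drawable's depth, as a
 * procedure table would; unsupported depths return NULL. */
SpanFillFn span_fill_for_depth(int bits)
{
    switch (bits) {
    case 8:  return span_fill_8;
    case 16: return span_fill_16;
    case 32: return span_fill_32;
    default: return NULL;
    }
}
```

The payoff is the same as in the real pipeline: the inner loop contains no per-pixel test of the depth, because that decision was made once, when the routine was selected.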
Although massive energy has been spent optimizing software-only OpenGL, it is with 3D hardware acceleration that performance really shines. Accelerators currently implement only the rasterization process, but this portion of the pipeline consumes the vast majority of cycles. For the hardware manufacturer, implementing OpenGL on top of its hardware is a somewhat complex process, and because of how this is done, your game's performance may vary with each vendor's implementation. Let's review the steps involved to better understand where we can get tripped up.
Hardware-accelerated OpenGL on Windows 95 can be provided in two ways: via an ICD or as a standalone OpenGL DLL. That is, the hardware vendor may implement Microsoft's standard ICD mechanism, or it may choose to produce its own OpenGL DLL that is attached directly to the accelerator. By producing an ICD, the hardware vendor follows the method Microsoft prescribes. However, this requires licensing the ICD Kit from Microsoft and modifying the 2D display driver to support a set of ICD-specific extensions. These 2D modifications are in kernel-mode software, and their mechanisms are subject to disclosure restrictions. SGI will probably provide an isolating layer that eliminates the need to access this proprietary Microsoft code, yet still produces a Microsoft-compliant ICD. Because of this required 2D driver modification, it is not possible to produce a software-only ICD. Further, an ICD cannot support secondary display devices such as the 3Dfx Voodoo. Remember, however, that the code needed to produce an ICD is largely independent of the code required to accelerate OpenGL. As a result, many vendors will ship a non-ICD accelerated OpenGL first (path D in Figure 1), followed later by an implementation of the ICD mechanism.
Let's review the steps a hardware vendor must go through to support accelerated OpenGL applications on its hardware. As discussed, a vendor that wishes to implement an ICD usually licenses the Microsoft ICD Kit (even if it is also using the SGI isolation code). An IHV producing an ICD must also choose a name for its DLL (such as MY_DRV.DLL) and add a reference to it in the registry. The remaining ICD-specific work is implementing register mapping and modifying the 2D driver. Everything else is non-ICD specific and is required for both ICD and non-ICD implementations.
The hardware vendor must supply code that grants exclusive access to the hardware (locking and unlocking). This prevents multiple threads, each with its own rendering state stored in Thread Local Storage (TLS), from accessing the hardware asynchronously. Currently, exclusive access is managed by a mutex-handling function that may have to wait for the FIFO to empty. (The FIFO holds 3D commands pending execution by the rasterization hardware.) Depending on a vendor's implementation, toggling exclusive access can therefore consume significant amounts of time and lead to long periods of hardware locking. These locked periods vary from driver to driver, and their effect on overall rendering performance can be significant, especially when many threads compete for the 3D hardware and lots of lock thrashing occurs. In practice, the hardware is locked whenever a render starts and unlocked when it ends, so this cost is buried inside ordinary drawing calls; this makes measuring with isfast an attractive option for performance profiling, as opposed to relying on the registry.
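Why lock thrashing hurts can be shown with a hypothetical device model in which every lock acquisition waits for the FIFO to drain. The names and the simulated FIFO counter are invented; a real driver would poll hardware status instead of zeroing a variable. The point is the ratio: one lock (and one drain) per batch rather than one per primitive.

```c
#include <pthread.h>

/* Hypothetical device state: a mutex guarding the accelerator and a
 * counter standing in for commands still queued in the hardware FIFO. */
typedef struct {
    pthread_mutex_t mutex;
    int fifo_pending;
    unsigned lock_count;
} Device;

static void wait_for_fifo_empty(Device *dev)
{
    /* Real hardware would poll a status register here; this is the
     * potentially long wait described in the text. */
    dev->fifo_pending = 0;
}

void device_lock(Device *dev)
{
    pthread_mutex_lock(&dev->mutex);
    wait_for_fifo_empty(dev);
    dev->lock_count++;
}

void device_unlock(Device *dev)
{
    pthread_mutex_unlock(&dev->mutex);
}

/* Submits `triangles` under a single lock. Calling this once per
 * triangle pays the lock-and-drain cost every time. */
void draw_batch(Device *dev, int triangles)
{
    device_lock(dev);
    dev->fifo_pending += triangles;
    device_unlock(dev);
}
```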
Next, the hardware vendor must modify the wgl functions to recognize its hardware. This is where the IHV specifies which primitives, states and pixel formats the hardware accelerates. When your application creates a context, implementation-dependent values are checked against the specifications of the IHV's device driver. If your code requests a context that is out of range for the specific implementation, hardware context creation will fail and the system will fall back to software rendering without telling you.
The actual process of modifying wgl begins with the IHV providing a glDevice structure filled with pointers to its driver functions. Through this structure, the IHV specifies the functions to use for process and thread attachment management, pixel format control, device and buffer initialization, and locking and unlocking. In a software-only implementation, these pointers refer to generic procedures. As an aside, this is also where modifications are made to use Direct3D surfaces instead of forcing the creation of a dedicated OpenGL surface.
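A structure of function pointers like the one described might look roughly like the following. Every field and function name is hypothetical, since the DDK's actual glDevice layout is not reproduced here. The sketch only illustrates the pattern of generic procedures being replaced by the IHV's own, and the "NULL keeps the generic entry" rule is a design choice of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver entry points grouped the way the text describes. */
typedef struct {
    const char *name;
    bool (*attach_process)(void);     /* process/thread attachment */
    int  (*pixel_format_count)(void); /* pixel format control */
    bool (*init_buffers)(void);       /* device and buffer initialization */
    void (*lock)(void);               /* begin exclusive hardware access */
    void (*unlock)(void);             /* end exclusive hardware access */
} GLDeviceSketch;

/* Generic procedures, as a software-only build would use. */
static bool generic_attach(void) { return true; }
static int  generic_format_count(void) { return 1; } /* placeholder count */
static bool generic_init(void) { return true; }
static void generic_lock(void) { /* nothing to lock in software */ }
static void generic_unlock(void) { }

const GLDeviceSketch generic_device = {
    "generic", generic_attach, generic_format_count,
    generic_init, generic_lock, generic_unlock
};

/* Start from the generic table and take each entry the IHV supplies;
 * entries the IHV leaves NULL keep the generic procedure. */
GLDeviceSketch build_device(const GLDeviceSketch *ihv)
{
    GLDeviceSketch d = generic_device;
    if (ihv == NULL)
        return d;
    if (ihv->name)               d.name = ihv->name;
    if (ihv->attach_process)     d.attach_process = ihv->attach_process;
    if (ihv->pixel_format_count) d.pixel_format_count = ihv->pixel_format_count;
    if (ihv->init_buffers)       d.init_buffers = ihv->init_buffers;
    if (ihv->lock)               d.lock = ihv->lock;
    if (ihv->unlock)             d.unlock = ihv->unlock;
    return d;
}
```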
The IHV also specifies the available pixel formats for rendering. These formats are returned to the application in priority order, with hardware-accelerated formats earlier in the list. Your application can examine each pixel format descriptor (PFD) with the DescribePixelFormat function.
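The PFD flags also reveal which driver path a format uses. By the convention of Microsoft's OPENGL32.DLL, PFD_GENERIC_FORMAT marks a format handled by the generic (software) implementation, and PFD_GENERIC_ACCELERATED set as well marks an MCD-accelerated one; a format with neither comes from an ICD. Whether SGI's OPENGL.DLL follows the same convention for its own software formats is an assumption not covered here. The sketch defines the two constants itself only so it compiles without windows.h.

```c
#include <stdint.h>

/* Values as defined in the Win32 SDK's wingdi.h. */
#ifndef PFD_GENERIC_FORMAT
#define PFD_GENERIC_FORMAT      0x00000040
#endif
#ifndef PFD_GENERIC_ACCELERATED
#define PFD_GENERIC_ACCELERATED 0x00001000
#endif

typedef enum { PF_SOFTWARE, PF_MCD, PF_ICD } PixelFormatKind;

/* Classify PIXELFORMATDESCRIPTOR.dwFlags as filled in by
 * DescribePixelFormat():
 *   generic, not accelerated -> generic software renderer
 *   generic and accelerated  -> MCD
 *   not generic              -> ICD */
PixelFormatKind classify_pixel_format(uint32_t dwFlags)
{
    if (dwFlags & PFD_GENERIC_FORMAT)
        return (dwFlags & PFD_GENERIC_ACCELERATED) ? PF_MCD : PF_SOFTWARE;
    return PF_ICD;
}
```

In a Win32 program, DescribePixelFormat(hdc, 1, sizeof(pfd), &pfd) returns the maximum pixel format index, so you can loop over indices 1 through that value and classify each pfd.dwFlags.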
The last wgl task is for the IHV to provide a "create context" function, in which the IHV describes (among other things) the pipeline's limitations. This means game developers cannot assume that OpenGL's upper limits are the same across hardware implementations. For example, a hardware vendor may set the maximum allowable texture size or the maximum number of lights within this function. During context creation these limits are checked, and context creation fails if the requested values are out of range for the implementation.
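Rather than assuming defaults, an application can query the limits once a context is current, using glGetIntegerv with GL_MAX_TEXTURE_SIZE and GL_MAX_LIGHTS, and compare them with what the game needs. The struct and helper below are a hypothetical way to organize that check; the GL query itself is left to the caller so the sketch runs without a GL context.

```c
#include <stdbool.h>
#include <stddef.h>

/* Limits an application might depend on. After wglMakeCurrent(), fill
 * `have` from glGetIntegerv(GL_MAX_TEXTURE_SIZE, ...) and
 * glGetIntegerv(GL_MAX_LIGHTS, ...) rather than assuming defaults. */
typedef struct {
    int max_texture_size;
    int max_lights;
} GLLimits;

/* Returns true if every requirement fits within the implementation's
 * limits; otherwise names the first shortfall through *what. */
bool limits_satisfied(const GLLimits *have, const GLLimits *need, const char **what)
{
    if (need->max_texture_size > have->max_texture_size) {
        if (what) *what = "GL_MAX_TEXTURE_SIZE";
        return false;
    }
    if (need->max_lights > have->max_lights) {
        if (what) *what = "GL_MAX_LIGHTS";
        return false;
    }
    if (what) *what = NULL;
    return true;
}
```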
The IHV is now ready to provide hardware-accelerated functions. One by one, the IHV adds code to accelerate primitives, state by state. For example, a hardware-accelerated point function may be added. However, separate point functions are required for aliased, anti-aliased, fogged, flat-shaded, color-index and other states. Just because a given pixel format is accelerated and you know that points are accelerated, it does not follow that every version (state) of the primitive is accelerated. To make things a bit more complex, the state your application sets is not exactly the state the IHV's code checks. Which routine is called is determined by modeFlags, which may be modified by a number of internal routines that you do not have access to. This is because your application may set an inconsistent state. For example, you may turn texture mapping on but not specify a texture; in that case, modeFlags will not have texturing enabled. So, to determine whether your pixel format and state are both accelerated by the current hardware, you must understand which primitives can be accelerated and, more importantly, in which states that acceleration can take place.
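The gap between requested state and effective state can be sketched with a few bit flags. The flag values and helpers are hypothetical, since the real modeFlags layout is internal to the driver. The sketch only shows why "texturing enabled" without a texture does not count as texturing, and how a primitive routine's supported states might be compared against the effective ones.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real modeFlags layout is internal. */
enum {
    MODE_TEXTURE = 1u << 0,
    MODE_FOG     = 1u << 1,
    MODE_SMOOTH  = 1u << 2,
    MODE_BLEND   = 1u << 3
};

typedef struct {
    bool texture_enabled; /* glEnable(GL_TEXTURE_2D) was called */
    bool texture_defined; /* a valid texture image has been specified */
    bool fog_enabled;
    bool smooth_shading;  /* glShadeModel(GL_SMOOTH) */
    bool blend_enabled;
} AppState;

/* Derive the effective flags the driver sees: requested but meaningless
 * state (texturing with no texture) is dropped. */
unsigned effective_mode_flags(const AppState *s)
{
    unsigned flags = 0;
    if (s->texture_enabled && s->texture_defined) flags |= MODE_TEXTURE;
    if (s->fog_enabled)    flags |= MODE_FOG;
    if (s->smooth_shading) flags |= MODE_SMOOTH;
    if (s->blend_enabled)  flags |= MODE_BLEND;
    return flags;
}

/* A primitive is accelerated only if every effective flag is one the
 * driver's routine for that primitive handles. */
bool primitive_accelerated(unsigned mode_flags, unsigned hw_supported)
{
    return (mode_flags & ~hw_supported) == 0;
}
```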
The final step for the IHV (and the game developer) is testing. Although this seems obvious, there is some latitude in the interpretation of "correct" rendering. For example, one card (which shall remain nameless) has a maximum texture size of 256 pixels in any dimension. When an application uses textures that exceed this size, the card renders without the texture. You might expect the driver to fall back to the software renderer, but its authors decided it was better to render untextured triangles. In a more common example, some hardware renders mipmaps with nearest filtering even when linear filtering is requested. The driver authors assumed that the user does not really want the slower linear mode, so the driver substitutes the faster, but incorrect, nearest mode. Fortunately, these kinds of problems are beginning to be addressed through control panels within games, which let players choose a "fastest" or "most correct" rendering preference.
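An application can avoid the untextured-triangle surprise by checking GL_MAX_TEXTURE_SIZE itself and, when a texture is too large, starting its upload at a smaller mipmap level. The hypothetical helper below computes how many halvings are needed. It is a sketch only: it does not consider other constraints such as texture memory or internal format.

```c
/* Number of times a width x height texture must be halved (skipping
 * mipmap levels) so that neither dimension exceeds max_size, e.g. the
 * value from glGetIntegerv(GL_MAX_TEXTURE_SIZE, ...).
 * Returns -1 for invalid input. */
int levels_to_skip(int width, int height, int max_size)
{
    int skip = 0;

    if (width <= 0 || height <= 0 || max_size <= 0)
        return -1;
    while (width > max_size || height > max_size) {
        /* Mipmap levels halve each dimension, never going below 1. */
        width = width > 1 ? width / 2 : 1;
        height = height > 1 ? height / 2 : 1;
        skip++;
    }
    return skip;
}
```

For the card described above, a 1024 x 512 texture would be uploaded starting at level 2 (256 x 128) instead of silently vanishing.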
One of the more vexing aspects of OpenGL has been the inability to use DirectDraw surfaces. SGI is developing a version of OpenGL that will let the application pass in a DirectDraw surface to render into. This DirectDraw version, called dgl, will replace wgl as the windowing interface. DirectDraw simplifies full-screen rendering for games as well as high-speed 2D drawing outside the 3D pipeline. The dgl interface will allow the application to draw to the "OpenGL" window not only with DirectDraw but with Direct3D as well. Using both 3D APIs at once will not be efficient, and in fact the specific ways that Direct3D and OpenGL would interact (and synchronize) have not been worked out yet. However, having both APIs available on a single drawable could make it easier to port game code from Direct3D to OpenGL.
Even with blazing rasterization performance, the geometry pipeline can still fail to deliver data fast enough to keep the hardware busy. A number of hardware geometry solutions are under development by IHVs for the PC. SGI's OpenGL provides a modular mechanism for implementing transforms and clipping in hardware: using a procedure table loaded at context creation time, an IHV can install its own procedures for clipping and matrix operations once such hardware becomes available on the next generation of cards.
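A procedure table for geometry might look like the following hypothetical sketch: generic software routines for transformation and clip testing are installed at context creation unless the driver supplies replacements. The names are invented; the matrix is stored column-major, as OpenGL stores matrices.

```c
#include <stddef.h>

typedef void (*XformFn)(const float m[16], const float in[4], float out[4]);
typedef unsigned (*ClipFn)(const float v[4]);

/* Generic software transform: out = M * in, with M column-major
 * (translation in m[12], m[13], m[14]). */
static void sw_transform(const float m[16], const float in[4], float out[4])
{
    for (int r = 0; r < 4; r++)
        out[r] = m[r] * in[0] + m[4 + r] * in[1] + m[8 + r] * in[2] + m[12 + r] * in[3];
}

/* Generic clip test against the view volume -w <= x, y, z <= w;
 * returns one bit per violated plane, 0 if the vertex is inside. */
static unsigned sw_clip_code(const float v[4])
{
    unsigned code = 0;
    float w = v[3];
    if (v[0] < -w) code |= 1u;
    if (v[0] >  w) code |= 2u;
    if (v[1] < -w) code |= 4u;
    if (v[1] >  w) code |= 8u;
    if (v[2] < -w) code |= 16u;
    if (v[2] >  w) code |= 32u;
    return code;
}

typedef struct {
    XformFn transform;
    ClipFn  clip;
} GeomProcs;

/* At context creation: install hardware entry points when the driver
 * provides them, otherwise the generic software procedures. */
void init_geom_procs(GeomProcs *procs, XformFn hw_transform, ClipFn hw_clip)
{
    procs->transform = hw_transform ? hw_transform : sw_transform;
    procs->clip = hw_clip ? hw_clip : sw_clip_code;
}
```

Because the rest of the pipeline calls through the table, swapping in transform hardware requires no changes to the code that issues vertices.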
Although OpenGL is still somewhat complex, and some negative comments have been made about its lack of hardware support and the latitude it gives implementations, OpenGL can be a high-performance solution for games. SGI's recent commitment to support hardware vendors under Windows 95 and its planned support for DirectDraw solve the most serious problems of this 3D API. Combined with cross-platform development opportunities, geometry processing in hardware, and support for both low-cost consumer hardware and high-end professional hardware, this makes OpenGL a reasonable solution for fast-track game development and deployment.
Greg Passmore has over 20 years of experience writing and optimizing graphics pipelines, has received a patent in optical computing and is an avid OpenGL user. Greg may be reached at [email protected].