[In this sponsored feature, part of Gamasutra's XNA microsite, Microsoft's Kevin Gee explains in-depth the new features of DirectX 11, from improved multi-threading to Shader Model 5.0 and beyond.]
Recently, at its annual Gamefest conference, Microsoft announced the forthcoming DirectX 11 API set. This technology, whose key features and benefits are discussed in this article, enables developers to take advantage of the latest hardware developments across both CPUs and GPUs, all while easing development pain. Let's take a look at the rich set of DirectX 11 features:
- Down-level hardware and operating system support
- Improved multithreaded device
- New hardware stages for tessellation
- Improved texture compression
- Shader Model 5.0
- Compute shader
- Additional features
Down-Level Hardware and Operating System Support
Windows Vista and DirectX 10 were engineered to improve the underlying Windows Display Driver Model (WDDM) and create significant opportunities for driver performance improvement. In addition, the DirectX 10 API was designed to be cleaner and simpler, with the near-full removal of capability bits, thereby making client code easier to write and removing development pain. DirectX 11 brings enough new features to be a full version update. However, since it builds upon and extends DirectX 10, anyone familiar with DirectX 10 and 10.1 will feel immediately at home with DirectX 11. With DirectX 11, developers can target hardware feature levels 10, 10.1, and 11 by using a single set of functions.
The timing for the final release of DirectX 11 aligns with the next version of Windows, but the API will also be made available on Windows Vista. Thus, with DirectX 10-class and 10.1-class hardware already in consumers' machines, there will be a lot of hardware to target right from launch.
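As a rough illustration of targeting several feature levels from one code path, the following C++ sketch models the selection logic only. It is not the DirectX API: `FeatureLevel`, `pickFeatureLevel`, and `supportsTessellationStages` are hypothetical names invented for this example, and the assumption is that the application lists the levels it can handle in order of preference and receives the first one the hardware supports.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for hardware feature levels, ordered lowest to highest.
enum class FeatureLevel { Level10_0 = 0, Level10_1 = 1, Level11_0 = 2 };

// Walks the application's preference list (most preferred first) and writes
// the first level that the hardware can provide into *chosen.
// Returns false if none of the requested levels is supported.
bool pickFeatureLevel(const std::vector<FeatureLevel>& preferred,
                      FeatureLevel hardwareMax,
                      FeatureLevel* chosen) {
    assert(chosen != nullptr);
    for (FeatureLevel level : preferred) {
        if (level <= hardwareMax) {
            *chosen = level;
            return true;
        }
    }
    return false;
}

// The same rendering code runs on every level; only optional features are
// gated on the level that was actually obtained (an assumption of this sketch).
bool supportsTessellationStages(FeatureLevel level) {
    return level == FeatureLevel::Level11_0;
}
```

With a preference list of 11, 10.1, 10 on 10.1-class hardware, this sketch selects 10.1 and the application simply skips the features that require 11-class hardware.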
Improved Multithreaded Device
Earlier releases of Direct3D focused primarily on single-CPU configurations and as such had limited threading support. With DirectX 11, the API has been updated to enable developers to better drive the GPU from a multi-core CPU. DirectX 11 improves scaling on CPUs via changes to both the API model and the driver model. Asynchronous device access becomes possible through two key features of the Direct3D 11 device object.
- First, improvements in synchronization between the Direct3D device object and the driver enable asynchronous API calls, including resource allocations. Direct3D 11 allows developers more freedom when expressing parallelism by allowing such calls to occur across multiple threads.
- Second, the Direct3D device interface now supports multiple rendering contexts: 1) a primary immediate context, which dictates the timeline for work submission to the GPU, and 2) optional deferred contexts, created by the application developer as needed. Work associated with each deferred context can occur on a separate thread/core. This enables GPU commands to be accumulated in parallel with the main rendering work and then sent to the GPU later, when the main context is ready to submit a new task to the GPU.
The following figure shows rendering tasks being queued in parallel to the main immediate context, and being submitted as they become complete.
This feature of DirectX 11 supports Direct3D 10-class and 10.1-class hardware, too, so changes made in the way applications render will benefit existing hardware.
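The immediate/deferred split described above can be sketched in plain C++ with standard threads. This is a simplified model, not the Direct3D 11 interface: `DeferredContext`, `ImmediateContext`, `CommandList`, and `renderFrame` are hypothetical names, and commands are represented as strings purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

using CommandList = std::vector<std::string>;

// Hypothetical deferred context: records commands but never submits them.
class DeferredContext {
public:
    void draw(const std::string& what) {
        assert(!what.empty());
        recorded_.push_back("draw " + what);
    }
    // Hands the recorded commands over and leaves the context empty.
    CommandList finish() {
        CommandList out;
        out.swap(recorded_);
        return out;
    }
private:
    CommandList recorded_;
};

// Hypothetical immediate context: the single place where work is submitted,
// so it alone defines the order in which the GPU sees the commands.
class ImmediateContext {
public:
    void execute(const CommandList& list) {
        submitted_.insert(submitted_.end(), list.begin(), list.end());
    }
    const CommandList& submitted() const { return submitted_; }
private:
    CommandList submitted_;
};

// Records each pass on its own thread, then submits the finished lists
// from the calling thread in a fixed order.
CommandList renderFrame(const std::vector<std::vector<std::string>>& passes) {
    std::vector<CommandList> lists(passes.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < passes.size(); ++i) {
        workers.emplace_back([&, i] {
            DeferredContext ctx;  // one deferred context per worker thread
            for (const std::string& object : passes[i]) ctx.draw(object);
            lists[i] = ctx.finish();  // each thread writes only its own slot
        });
    }
    for (std::thread& worker : workers) worker.join();

    ImmediateContext immediate;
    for (const CommandList& list : lists) immediate.execute(list);
    return immediate.submitted();
}
```

Recording happens in parallel, but submission order stays deterministic because only the immediate context touches the final stream. A real renderer would likely submit lists as they complete rather than waiting for all of them, as the figure suggests.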
New DirectX 11 Hardware Features
Next, let's take a look at some of the hardware-specific features DirectX 11 brings.
New Hardware Stages for Tessellation
DirectX 11 brings three new stages (hull shader, tessellator, and domain shader) to the rendering pipeline. These stages enable flexible, programmable hardware support of tessellation. The hull and domain shaders are programmable parts; the tessellator is fixed function but supports a number of insertion settings providing control over the generated position data.
The hull shader is a programmable unit that runs at the frequency of the source control mesh, so it is the place to perform transforms on the input data. When discussing applications of the pipeline, we often mention performing a basis change in this shader from one surface representation to another, for example from a Catmull-Clark quad mesh to Bezier patch controls.
The tessellator is a fixed-function unit that can be thought of simply as a data expander, and as a place where the IHVs can safely parallelize with the user-provided algorithms. It takes tessellation factors as input and inserts vertices in surface U,V space according to the chosen partitioning scheme.
The domain shader executes once for every generated vertex, and as such is the place where surface formulations are evaluated. The inputs to this stage are provided in the surface U,V domain, ready for parametric surface evaluation.
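To make the last two stages more concrete, here is a CPU-side C++ sketch under stated assumptions: the tessellator is modeled as a uniform integer partitioning of the quad domain, and the per-vertex evaluation is a bicubic Bezier patch with 16 control points in row-major order. The function names are hypothetical, and the hull shader stage is omitted by assuming the control points are already in Bezier form.

```cpp
#include <array>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };
struct UV { float u, v; };

// Stand-in for the fixed-function tessellator: expands one tessellation
// factor into a (factor + 1) x (factor + 1) grid of points in U,V space.
std::vector<UV> tessellateQuadDomain(int factor) {
    assert(factor >= 1);
    std::vector<UV> points;
    for (int j = 0; j <= factor; ++j) {
        for (int i = 0; i <= factor; ++i) {
            points.push_back(UV{static_cast<float>(i) / factor,
                                static_cast<float>(j) / factor});
        }
    }
    return points;
}

// Cubic Bernstein basis functions evaluated at t.
std::array<float, 4> bernstein3(float t) {
    const float s = 1.0f - t;
    return {{s * s * s, 3.0f * s * s * t, 3.0f * s * t * t, t * t * t}};
}

// Stand-in for the domain shader: runs once per generated U,V point and
// evaluates the parametric surface there.
Vec3 evaluateBezierPatch(const std::array<Vec3, 16>& controlPoints, UV uv) {
    const std::array<float, 4> bu = bernstein3(uv.u);
    const std::array<float, 4> bv = bernstein3(uv.v);
    Vec3 p = {0.0f, 0.0f, 0.0f};
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            const float w = bv[row] * bu[col];
            const Vec3& c = controlPoints[row * 4 + col];
            p.x += w * c.x;
            p.y += w * c.y;
            p.z += w * c.z;
        }
    }
    return p;
}
```

Calling `evaluateBezierPatch` for every point returned by `tessellateQuadDomain` yields the vertex positions of the tessellated patch; on hardware, these evaluations would run in parallel.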
The pipeline supports several input types (quad patch, triangle patch, or even poly-line), which allows developers to target almost any surface formulation. One usage scenario that has been strongly requested is support of sub-division surfaces for rendering.
Sub-Division Surface Approximation Schemes
Charles Loop and Scott Schaefer from Microsoft Research worked on a number of approaches for approximating sub-division surfaces that can be applied to the DirectX 11 pipeline. One of the approaches, provided as a DirectX 10 sample in the DirectX SDK, changes a quad patch basis mesh into Bezier surfaces of fixed tessellation. When applied to the DirectX 11 pipeline, this and other schemes can be used to deliver real-time rendering of sub-division surface meshes.
Improved Texture Compression
Textures in games are often the largest area of memory utilization, so it should be no surprise that further improvements to texture compression are needed to keep working set size and memory bandwidth consumption within the rates required for real-time rendering. DirectX 11 arms developers with new compression formats (BC6 and BC7) to help target high-quality rendering without sacrificing performance. Here we will focus on two specific examples of how DirectX 11 raises the bar for rendering quality. Some of you may be more familiar with the older DXT-style naming convention, which was changed to the block compressed (BC) naming convention for DirectX 10. The newer naming convention is used here.
Compression of High Dynamic Range (HDR) Image Sources
High dynamic range image sources are very common in games these days. When combined with intelligent tone map operators, HDR is often required to make titles look photorealistic. The new block compression (BC) scheme, BC6, has been designed to provide high-quality 6:1 compression of HDR image data with hardware support for decompression.
Here we can see a comparison image for the HDR format. On the left is the HDR original image tone-mapped to a given exposure, and on the right is the equivalent BC6 image. The absolute error image is in the center. Notice how the absolute error image contains no obvious blocking errors; the errors we are seeing are generally diffused noise errors. These are much less noticeable to the human eye than edges introduced by blocking.
Low Dynamic Range (LDR) / Normal Map Compression
The new BC7 scheme provides support for 8-bit/low dynamic range (LDR) data at 3:1 ratios. Here we compare the results of the new format with the existing block compressed approach, BC3.
You can clearly see the blocking artifacts in the BC3 image, which are drastically reduced in the BC7 image. With this feature, developers and artists can expect more from their linear texture content and normal maps for the same or lower cost in memory size.
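The 6:1 and 3:1 ratios quoted above can be checked with a little arithmetic. The sketch below assumes that both new formats store each 4x4 texel block in 128 bits, that the uncompressed HDR source uses three 16-bit channels (48 bits per texel), and that the uncompressed LDR source uses three 8-bit channels (24 bits per texel). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed block layout: 4x4 texels packed into 128 bits.
constexpr int kTexelsPerBlock = 16;
constexpr int kBitsPerBlock = 128;

// Ratio of uncompressed to compressed size for a given source texel size.
constexpr double compressionRatio(int sourceBitsPerTexel) {
    return static_cast<double>(sourceBitsPerTexel) * kTexelsPerBlock / kBitsPerBlock;
}

// Compressed size in bytes of one mip level; partial blocks at the edges
// still occupy a full block.
std::uint64_t compressedBytes(std::uint32_t width, std::uint32_t height) {
    assert(width > 0 && height > 0);
    const std::uint64_t blocksX = (static_cast<std::uint64_t>(width) + 3) / 4;
    const std::uint64_t blocksY = (static_cast<std::uint64_t>(height) + 3) / 4;
    return blocksX * blocksY * (kBitsPerBlock / 8);
}
```

Under these assumptions, `compressionRatio(48)` gives 6 for the HDR case and `compressionRatio(24)` gives 3 for the LDR case, and a 1024 x 1024 texture occupies 1 MiB at one byte per texel.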
Shader Model 5.0
DirectX 10 brought you Shader Model 4.0, which included full support for integers and bitwise operators among other features. Direct3D 10.1 added Shader Model 4.1, with support for direct MSAA sample access. DirectX 11 brings Shader Model 5, which utilizes object-oriented concepts to help reduce the pain of shader development and brings optional support for double precision. This update to HLSL enables you to bring the full power of the HLSL compiler to bear on the problem of shader specialization using interfaces, objects, and polymorphism. With dynamic shader linkage, developers can more easily author larger, flexible shaders and permute out specialized, optimized versions for use at run time during specific rendering.
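The idea of specializing one shader through interfaces and objects can be illustrated with a loose C++ analogy. This is not HLSL and not the dynamic shader linkage API; `ILight`, `AmbientLight`, `DirectionalLight`, and `shade` are hypothetical names, and virtual dispatch stands in for whatever mechanism the runtime uses to bind a concrete class.

```cpp
#include <cassert>

struct Color { float r, g, b; };

// Interface: the contract that every light implementation must satisfy.
struct ILight {
    virtual ~ILight() {}
    virtual Color illuminate(float nDotL) const = 0;
};

// Constant contribution, independent of surface orientation.
struct AmbientLight : ILight {
    Color color;
    explicit AmbientLight(Color c) : color(c) {}
    Color illuminate(float) const override { return color; }
};

// Lambert-style contribution, clamped so back-facing surfaces get none.
struct DirectionalLight : ILight {
    Color color;
    explicit DirectionalLight(Color c) : color(c) {}
    Color illuminate(float nDotL) const override {
        const float k = nDotL > 0.0f ? nDotL : 0.0f;
        return Color{color.r * k, color.g * k, color.b * k};
    }
};

// One shading routine written against the interface; the concrete light is
// chosen when the call is made instead of by authoring a separate variant.
Color shade(const ILight& light, float nDotL) {
    assert(nDotL >= -1.0f && nDotL <= 1.0f);
    return light.illuminate(nDotL);
}
```

The point of the analogy is the authoring model: the large shader is written once against interfaces, and specialized versions are obtained by selecting implementations rather than by hand-maintaining every permutation.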
Compute Shader
Anyone already familiar with general-purpose use of GPUs will be excited to hear about the new compute shader, which brings cross-hardware-vendor support for programming the GPU in general-purpose ways (GPGPU). Many advances have already been made in applying the huge number-crunching power of GPUs to large-scale computing problems in previously niche markets. With the addition of the compute shader in DirectX 11, Microsoft makes these algorithms possible on the client across a broad range of hardware. Look for exciting new ways that games and other application developers can take advantage of GPUs for tasks other than just rendering.
Key features include communication of data between threads, and a rich set of primitives for random access and streaming I/O operations. These features enable faster and simpler implementations of techniques already in use, such as imaging and post-processing effects, and also open up new techniques that become feasible on Direct3D 11-class hardware.
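Communication of data between threads is what makes patterns such as a parallel reduction practical. The following C++ sketch emulates a single thread group summing its inputs through a shared buffer. It runs sequentially on the CPU, and `groupReduceSum` is a hypothetical name; on a GPU, each inner-loop iteration would be a separate thread, with a barrier between the outer-loop steps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates one thread group performing a tree reduction in shared memory.
// The group size (input.size()) must be a power of two.
float groupReduceSum(const std::vector<float>& input) {
    const std::size_t n = input.size();
    assert(n > 0 && (n & (n - 1)) == 0);

    // Step 0: every "thread" copies its own element into the shared buffer.
    std::vector<float> shared(input);

    // Each step halves the number of active threads. Thread tid reads slots
    // tid and tid + stride and writes only slot tid, so the threads within
    // a step do not interfere with one another.
    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        for (std::size_t tid = 0; tid < stride; ++tid) {
            shared[tid] += shared[tid + stride];
        }
        // A real compute shader would place a group-wide barrier here.
    }
    return shared[0];
}
```

A group of n threads finishes in log2(n) steps instead of n - 1 sequential additions, which is the kind of gain that inter-thread communication enables for imaging and post-processing work.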
Additional Features
Even more exciting features are in store for DirectX 11 than can be covered in this basic introduction, but here are two last-minute things we simply couldn't finish this article without mentioning.
Conservative oDepth
Traditionally, IHVs have had to disable Z acceleration structures and algorithms when shaders write to the depth buffer via the oDepth register. The conservative oDepth feature in DirectX 11 lets a shader write to the depth buffer while guaranteeing that the written value stays within a specified region. The hardware can then keep its acceleration active outside the guaranteed region, avoiding the full loss in performance.
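One way to picture why a guarantee restores acceleration is the following C++ sketch. It assumes a less-than depth test and depths in the range 0 to 1, and it models the guarantee as a promise about the direction in which the shader may move the depth value. `DepthPromise` and `canEarlyReject` are hypothetical names, not part of the API.

```cpp
#include <cassert>

// Hypothetical description of what the shader promises about its output
// depth relative to the rasterized (interpolated) depth.
enum class DepthPromise { None, GreaterEqual, LessEqual };

// Assumes a less-than depth test: a fragment passes only if its final depth
// is strictly less than the stored depth.
// Returns true when the fragment can be discarded before the shader runs.
bool canEarlyReject(DepthPromise promise, float rasterDepth, float storedDepth) {
    assert(rasterDepth >= 0.0f && rasterDepth <= 1.0f);
    assert(storedDepth >= 0.0f && storedDepth <= 1.0f);
    // If the shader can only push depth farther away and the rasterized
    // depth already fails the test, the shader's output must fail as well.
    return promise == DepthPromise::GreaterEqual && rasterDepth >= storedDepth;
}
```

With no promise, the shader could write any value, so nothing can be rejected early. With a less-or-equal promise under this test direction, the shader might still move a failing fragment in front of the stored depth, so early rejection is not safe either.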
16K Texture Limits and Texture Clamps
DirectX 11 raises the maximum texture size from 4K to 16K and also provides MIP-LOD control clamps to limit the number of mipmap levels loaded to the GPU.
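To give a sense of scale, the sketch below counts the levels in a full mipmap chain and how many remain when the most detailed levels are clamped away. The clamp is modeled as an integer index of the most detailed level allowed, which is an assumption of this example, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Number of levels in a full mipmap chain, down to 1 x 1.
int fullMipCount(unsigned width, unsigned height) {
    unsigned largest = std::max(width, height);
    assert(largest > 0);
    int levels = 0;
    while (largest > 0) {
        ++levels;
        largest >>= 1;
    }
    return levels;
}

// Levels that remain in use when the chain is clamped so that level
// minLodClamp is the most detailed one (0 means no clamping).
int residentMipCount(unsigned width, unsigned height, int minLodClamp) {
    const int full = fullMipCount(width, height);
    const int clamp = std::min(std::max(minLodClamp, 0), full - 1);
    return full - clamp;
}
```

A 16K x 16K texture has a 15-level chain compared with 13 levels at 4K, and clamping away its two most detailed levels leaves 13 levels, the same count as a full 4K chain.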
We're excited to bring you this newest release of the DirectX API set. This version runs on Windows Vista as well as future versions of Windows, and it will work on your Direct3D 10-class and 10.1-class hardware while exposing the new features of DirectX 11-class hardware. Many of the features are intended to make developers' lives easier while enabling opportunities for new functionality and performance gains. Look forward to a community tech preview in the November 2008 release of the DirectX SDK and start working with this next step in the evolution of graphics technology.
For more information about sub-division surface approximations, see the Sub-Division Surface sample in the DirectX SDK. Also look for the Gamefest 2008 talks "Multithreaded Rendering for Games" and "DirectX 11 Tessellation," coming soon at http://msdn.microsoft.com/directx/presentations. Finally, see Graphics APIs in Windows Vista on MSDN.