Intel made a splash in the tech world when it unveiled its "Sandy Bridge" second-generation Core processors at the Consumer Electronics Show. Accompanied onstage by Valve's Gabe Newell, the company wanted the game development world in particular to know that there was plenty to be excited about in the new chip.
Now, in Gamasutra's latest Intel-sponsored feature, the company explains more about specific graphics functionality, which Intel claims is more tightly integrated with the CPU than before.
The feature focuses on a particular "onloading" technique called Onloaded Shadows. Its developer, Zane Mankowski, and his team explain how it works, and the article includes a link to the source code.
Mankowski describes the purpose of the Onloaded Shadows technique:
"Many games have outdoor scenes where the sun is often the primary light and changes direction slowly over time. Generating shadow maps for these outdoor scenes and for static objects isn't required every frame. They can be generated asynchronously to frame rendering, at a cadence of only a few times a second or even once every few seconds.
Using the GPU to generate these shadow maps synchronously, we can split the workload apart and distribute it across several frames. The CPU can perform this workload asynchronously with Microsoft's Windows Advanced Rasterization Platform (WARP) software rasterizer.
The Onloaded Shadows technique uses WARP to asynchronously generate shadow maps. Copying the data from the CPU to the GPU is the only synchronous work required. The overhead of the copy operation is distributed across several frames to reduce the impact."
According to Mankowski, the main thread renders scenes using shadow map data stored on the GPU, while the WARP thread generates the shadow map asynchronously:
"The WARP thread copies the shadow map to a staging buffer and maps it to a subresource. The GPU then updates its shadow buffer with the mapped subresource. The new camera data is utilized once a copy is complete, and then the WARP thread is signaled to once again begin shadow map generation.
Alternatively, asynchronous shadows can be naively implemented by generating shadows synchronously on the GPU every set number of frames.
In this way, a GPU technique which generates the same results can be used to compare with the Onloaded Shadows technique, and performance can be compared by looking at how much of a spike in frame time occurs during either the subresource copy (for Onloaded Shadows) or during the synchronous shadow processing (for the GPU technique)."
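The handoff between the main thread and the WARP thread described above can be sketched roughly as follows. This is a minimal illustration, not the code from the article: it uses plain Python threads instead of Direct3D and WARP, and every name in it (ShadowMapExchange, warp_worker, run_frames, generate) is hypothetical. It only shows the signaling order: the worker stages a finished shadow map, the main thread copies it, adopts new camera data, and then signals the worker to start again.

```python
import threading


class ShadowMapExchange:
    """Hypothetical stand-in for the staging buffer shared by both threads."""

    def __init__(self):
        self.staging = None             # written by the worker, read by the main thread
        self.ready = threading.Event()  # worker -> main: a new shadow map is staged
        self.start = threading.Event()  # main -> worker: begin the next generation
        self.camera = 0                 # camera/light data used for the next map
        self.stop = False               # set by the main thread to end the worker


def warp_worker(exchange, generate):
    # Stand-in for the WARP thread: wait for a signal, generate, stage the result.
    while True:
        exchange.start.wait()
        exchange.start.clear()
        if exchange.stop:
            return
        exchange.staging = generate(exchange.camera)
        exchange.ready.set()


def run_frames(frame_count, generate):
    exchange = ShadowMapExchange()
    worker = threading.Thread(target=warp_worker, args=(exchange, generate))
    worker.start()
    exchange.start.set()                # kick off the first generation

    gpu_shadow_map = None               # what the renderer would sample from
    history = []
    for frame in range(frame_count):
        # The main thread never waits on the worker; it only polls.
        if exchange.ready.is_set():
            exchange.ready.clear()
            gpu_shadow_map = exchange.staging  # the only synchronous step: the copy
            exchange.camera = frame            # adopt new camera data after the copy
            exchange.start.set()               # signal the worker to generate again
        history.append(gpu_shadow_map)         # "render" with the current map

    exchange.stop = True
    exchange.start.set()                # wake the worker so it can exit
    worker.join()
    return history
```

Calling run_frames(50, some_generator) returns one entry per frame. Early frames may render with no shadow map at all (None here) until the first one has been staged, and which frame picks up each new map depends on thread timing.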
To avoid the stalls that frequent, significant frame time spikes would cause, the work done during the synchronous frame is broken into small pieces to minimize its impact, a method called the Distributed Stall optimization.
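As a rough sketch of the idea, assuming the synchronous work is a plain buffer copy, the copy can be cut into fixed-size pieces with one piece performed per frame. The function name distributed_copy and the bytes_per_frame parameter are illustrative assumptions, not names from Intel's source code, and ordinary Python byte buffers stand in for the staging buffer and the GPU shadow buffer.

```python
def distributed_copy(source, destination, bytes_per_frame):
    """Copy source into destination a slice at a time.

    Each step of the generator copies at most bytes_per_frame bytes and
    yields the number of bytes copied so far; the caller advances it once
    per rendered frame.
    """
    if bytes_per_frame <= 0:
        raise ValueError("bytes_per_frame must be positive")
    if len(destination) < len(source):
        raise ValueError("destination is smaller than source")

    offset = 0
    while offset < len(source):
        end = min(offset + bytes_per_frame, len(source))
        destination[offset:end] = source[offset:end]  # this frame's share of the copy
        offset = end
        yield offset
```

A render loop would call next() on the generator once per frame until it is exhausted. With a 1024-byte buffer and bytes_per_frame set to 300, for example, the copy is spread over four frames instead of landing in one. A smaller bytes_per_frame means a smaller per-frame cost but a longer wait before the new shadow map is fully in place.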
"For the Onloaded Shadows technique, the synchronous copy can be easily subdivided as far as a single byte. For the GPU technique, because the work is not homogenous, breaking apart the shadow processing work becomes significantly more complicated."
For further explanation, please see Gamasutra's sponsored feature, Onloaded Shadows: Moving Shadow Map Generation from the GPU to the CPU.