Game developers want to deliver the most engaging and enjoyable experiences possible, but it's not an easy task. With gaming platforms and visual computing technologies expanding rapidly and competition intense, developers are under enormous pressure to create games that are not only innovative, immersive, and graphically impressive but, perhaps more importantly, able to perform well on a wide range of systems in a quickly expanding device ecosystem.
When it comes to optimizing game performance, the Intel Graphics Performance Analyzers tool suite (Intel GPA) can help. Used for years by many of the world's leading game houses and engines, Intel® GPA helps developers uncover and solve a wide range of performance issues so they can deliver the best game possible. Fast.
This paper highlights common challenges encountered during the development of three world-class games and how developers used Intel GPA to quickly solve them, resulting in significant performance and quality gains.
Download the latest version of Intel® GPA FREE. Visit intel.com/software/gpa.
XCOM: Enemy Unknown
Understanding Your Game Engine
THE GAME: A re-imagining of the 1994 PC title, XCOM: Enemy Unknown by Firaxis Games is a strategy game set in the near future during an alien invasion. Players direct the combat missions and operations of XCOM, an elite paramilitary organization tasked with defending Earth. Built on Epic's Unreal Engine, XCOM: Enemy Unknown is an acclaimed modern remake of the original, delivering tactically rich gameplay that keeps players engaged through tricky maneuvers and tension-fueled decisions.
CHALLENGE 1: Sudden FPS drop of nearly 3x. During a short scene involving character rendering and shading, frame rates unexpectedly dropped from 30 to 11 FPS, a loss of roughly 63%. Though the slowdown lasted only a few seconds, it noticeably affected game performance.
Image 1: Short scene where frame rates dropped 3x during shading executions.
SOLUTION 1: Intel GPA frame-capture analysis. A feature of Intel GPA allows developers to immediately capture a snapshot of any scene and then, using Intel GPA Frame Analyzer, visually drill down to uncover the root cause of performance issues. "In this case, the issue turned out to be a shader that was intended for offline rendering but was left in the production version by accident."
THE RESULT: Frame rate returned to 30 FPS. Although the impact of the non-production shader wasn't extensive, by using Intel GPA the development team was able to identify and fix the issue within minutes.
CHALLENGE 2: Even after the non-production shader was fixed, performance was still sub-optimal. While working with Firaxis, Intel engineers found that around 30-40% of the total frame time was being spent copying render targets. The cause was that Unreal was resolving the targets, a step that is required when using MSAA (multisample anti-aliasing). Because Firaxis was not using MSAA, the goal was to find a way to remove the copies.
A resolve takes a render target that was previously written to and copies it to a texture so that it can be used as input to another operation, such as a post-processing effect. A post-processing effect like blur needs a resolve after both the horizontal and the vertical step. The copy can be an expensive operation, so many of them can add up to a big chunk of the overall frame time.
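To make the cost concrete, the following minimal sketch counts the resolve copies in a small post-processing chain. It is a toy model, not Unreal or Intel GPA code: the pass names, the `resolve_copies` function, and the `copy_without_msaa` switch are all hypothetical, and it assumes that a non-MSAA render target can be sampled directly without a copy.

```python
# Toy model of resolve copies in a post-processing chain.
# All names here are hypothetical; this is not engine or Intel GPA code.

def resolve_copies(passes, msaa_enabled, copy_without_msaa=True):
    """Count render-target copies for a chain of passes.

    passes: list of (name, output_sampled_later) tuples.
    msaa_enabled: with MSAA, a target must be resolved before it is sampled.
    copy_without_msaa: models an engine that still copies when MSAA is off.
    """
    copies = 0
    for _name, output_sampled_later in passes:
        if not output_sampled_later:
            continue  # nothing reads this target as a texture, so no copy
        if msaa_enabled or copy_without_msaa:
            copies += 1
    return copies


# A blur needs its horizontal and vertical outputs as inputs to later passes.
blur_chain = [
    ("blur_horizontal", True),
    ("blur_vertical", True),
    ("final_composite", False),
]

copies_before = resolve_copies(blur_chain, msaa_enabled=False)
copies_after = resolve_copies(blur_chain, msaa_enabled=False,
                              copy_without_msaa=False)
print(copies_before, copies_after)
```

In this model the blur chain costs two copies per frame when the engine always resolves, and none once the copies are skipped for a non-MSAA title. A real frame has many such chains, which is how the copies can accumulate.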
SOLUTION 2: Intel GPA's "select all draw calls doing XYZ" capability. With the Intel GPA Frame Analyzer tool, a developer can right-click on any asset and see all of the draw calls that use it. By understanding where and to what extent copy jobs are being used, developers can strategically remove them to optimize performance.
Which is what Firaxis did. Using this feature in Intel GPA, they identified and removed all copy jobs from the game. It's important to note that copy-job removal in non-MSAA games should be done strategically and on a case-by-case basis because removal can sometimes cause noticeable visual artifacts. For XCOM, removing 100% of the copy jobs was optimal. But this may not be the case in other games, which is why Intel GPA Frame Analyzer is useful: it helps developers identify the most critical performance bottlenecks within a frame.
Could copy jobs have been found without Intel GPA? "Unlikely," says Jeff Laflam, Intel Senior Application Engineer, who advised Firaxis on this project. "A developer could manually uncover one or two instances where a copy job was happening, but to see copy jobs everywhere and see the extent to which the game is being affected? Without Intel GPA, it would take so long as to essentially be an impossible task."
THE RESULT: 1.5x increase in FPS. Frame rates increased from 21 to 31 FPS.
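As a rough sanity check, the reported frame rates can be converted to frame times and compared with the 30-40% of frame time attributed to render-target copies. The sketch below uses only the 21 and 31 FPS figures quoted above. It assumes, for illustration only, that the whole gain came from removing the copies; the helper name `frame_time_ms` is hypothetical.

```python
def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


before_ms = frame_time_ms(21)          # frame time before removing copies
after_ms = frame_time_ms(31)           # frame time after removing copies
saved_ms = before_ms - after_ms
saved_share = saved_ms / before_ms     # fraction of the original frame saved
speedup = 31 / 21

print(f"{before_ms:.1f} ms -> {after_ms:.1f} ms per frame")
print(f"saved {saved_ms:.1f} ms ({saved_share:.0%} of the original frame)")
print(f"speedup {speedup:.2f}x")
```

The frame time falls from about 47.6 ms to about 32.3 ms, a saving of roughly 32% of the original frame, which sits inside the 30-40% range measured for the copies.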
Wargame: European Escalation
Optimizing Vegetation Rendering
THE GAME: Wargame: European Escalation is a real-time strategy game developed by Eugen Systems. Set in Europe during the Cold War of the 1970s, the game lets players command forces from either NATO or the Warsaw Pact and engage in historical battles and tactical campaigns.
Wargame delivers highly detailed, vivid graphics and offers a robust camera-zoom capability that moves from tight close-ups to miles-wide battlefield views.
THE CHALLENGE: Significant FPS drop when zooming into certain scenes. Frame rate dropped significantly when the camera zoomed into battlefield scenes.
Intel GPA Frame Analyzer showed that the grass fields were affecting performance because they were being rendered with constant density; that is, grass elements that were far away from the camera were drawn with the same number of draw calls as grass that was in close-up view. This resulted in a geometry/overdraw bottleneck.
Image 2: Improved performance using level of detail (LOD) implementation.
Two solutions were used to fix the issue.
SOLUTION 1: LOD (level of detail) implementation on grass rendering. LOD is a technique in which the geometry of objects that are far away from the camera (beyond what the eye can really see in any detail) is rendered at a different, typically reduced, complexity than the geometry of objects in near view.
Using the Intel GPA Frame Analyzer tool, Intel engineers were able to examine hardware-level metrics and discovered inefficiencies when rendering the grass geometry. Eugen engineers applied LOD techniques to render the grass vegetation at different densities depending upon the distance from the viewer. They also optimized and simplified grass shaders for the distant chunks of grass fields and disabled Z-Write for these distant objects because it was faster and did not produce any visible artifacts.
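The distance-based approach described above can be sketched as a simple lookup table. This is an illustrative sketch only, not Eugen Systems' implementation: the distance thresholds, density values, and the names `GrassLod`, `LOD_TABLE`, `select_lod`, and `blades_drawn` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrassLod:
    density: float        # fraction of grass blades drawn (1.0 = full density)
    simple_shader: bool   # use a cheaper shader for this level
    z_write: bool         # whether depth writes stay enabled


# Hypothetical thresholds in world units; real values need per-game tuning.
LOD_TABLE = [
    (100.0, GrassLod(density=1.0, simple_shader=False, z_write=True)),
    (300.0, GrassLod(density=0.5, simple_shader=False, z_write=True)),
    (float("inf"), GrassLod(density=0.15, simple_shader=True, z_write=False)),
]


def select_lod(distance):
    """Return the LOD settings for a grass chunk at the given camera distance."""
    for max_distance, lod in LOD_TABLE:
        if distance <= max_distance:
            return lod
    return LOD_TABLE[-1][1]  # fallback, e.g. for a NaN distance


def blades_drawn(chunk_distances, blades_per_chunk):
    """Total blades submitted for a set of chunks after applying LOD."""
    return sum(round(blades_per_chunk * select_lod(d).density)
               for d in chunk_distances)


distances = [50.0, 200.0, 1000.0]
print(blades_drawn(distances, 1000), "blades instead of", 3 * 1000)
```

With these assumed values, three chunks of 1,000 blades each submit 1,650 blades rather than 3,000, and the farthest chunk also switches to the simplified shader with Z-Write disabled.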
SOLUTION 2: Shadow optimization. Shadow generation presents the user with an extra level of realism in top-tier games. However, engineers need to make tradeoffs between rendering time and the quality of the shadows.
Two distinct and complementary improvements were made, reducing the processing workload without sacrificing overall image quality. First, rendering performance was improved by shifting GPU processing -- which was heavy on pixel shader calculations -- towards better utilization of available vertex shader bandwidth. Second, shadow maps were limited to the rendering of objects that played a significant role in a scene, rather than the processing of shadows for pixel-sized objects that were too small for the human eye to recognize.
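The second improvement, skipping shadows for objects too small to notice, can be sketched as a filter on projected screen size. The source does not state which criterion was used, so this is one possible approach under stated assumptions: a perspective camera, a bounding-sphere radius per object, and a hypothetical 2-pixel cutoff. The function names are illustrative.

```python
import math


def projected_radius_px(radius, distance, fov_y_deg=60.0, screen_height_px=1080):
    """Approximate on-screen radius, in pixels, of a bounding sphere."""
    if distance <= 0:
        return float("inf")  # at or behind the camera: treat as very large
    half_fov = math.radians(fov_y_deg) / 2.0
    return radius * (screen_height_px / 2.0) / (distance * math.tan(half_fov))


def select_shadow_casters(objects, min_radius_px=2.0):
    """Keep only objects large enough on screen to deserve a shadow.

    objects: iterable of (name, bounding_radius, distance_to_camera).
    """
    return [name for name, radius, distance in objects
            if projected_radius_px(radius, distance) >= min_radius_px]


scene = [
    ("tank", 4.0, 100.0),    # large and fairly close
    ("crate", 0.5, 900.0),   # small and far away
]
print(select_shadow_casters(scene))
```

With these sample values the tank covers tens of pixels and keeps its shadow, while the crate projects to less than one pixel and is left out of the shadow map.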
Image 3: Intel GPA Frame Analyzer identified opportunities for shadow optimization.
THE RESULT: Better than 2x increase in FPS. Frame rates for a sample frame that used these two solutions increased from 12.8 to 27.7 FPS.
Sid Meier's Civilization V
Doing More per Draw Call
THE GAME: Civilization V is a critically acclaimed strategy game where players strive to become Ruler of the World by establishing and leading a civilization over thousands of years. Developed by Firaxis Games, Civilization V, the fifth in the award-winning series, uses a new game engine and delivers highly detailed, graphically rich visuals that fully immerse players as they wage war, conduct diplomacy, and build nations.
THE CHALLENGE: Increasing performance to support a great user touch experience. While Civilization V was being optimized for PC "touch" capability, frame rates were a sluggish 16 FPS in the worst-performing section of the game, far slower than what is needed for a compelling gaming experience. Intel GPA Frame Analyzer quickly revealed that frames were using up to 10,000 draw calls, a very high number.
Using GPA, Firaxis developers and Intel engineers could tell that each object was rendered as its own individual draw call. For example, if a frame had 12 objects -- buildings, soldiers, trees, etc. -- that were the same type and model, each object instance was rendered as a separate draw call. Furthermore, they could tell from GPA that these objects shared the same materials, which helped in moving through the solution phase.
Image 4: Civilization V scene showing multiple objects with the same geometry and materials. Each was originally rendered as its own individual draw call.
The game was CPU bound: the CPU was swamped processing 10,000 draw calls per frame. Because the CPU could not "feed" the GPU fast enough, frame rates were exceptionally low.
THE SOLUTION: Instancing to overcome excessive overhead. Instancing is the process of reducing draw calls by drawing multiple objects with the same geometry and materials -- i.e., the repeated buildings, soldiers, or trees -- in a single draw call. It can be done with objects that use the same mesh, even if they're positioned slightly differently in the scene. The continued use of GPA throughout the instancing phase showed what could and couldn't be instanced. That was done by inspecting the draw calls for similar materials and shaders.
Seeing immediate performance gains in initial experiments, Firaxis used instancing throughout the game to collapse similar models and textures into one draw call.
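The grouping step behind instancing can be sketched as follows. This is a simplified illustration, not Firaxis code: objects are reduced to hypothetical (mesh, material, position) tuples, and each group stands for one instanced draw call that would receive the collected per-instance transforms.

```python
from collections import defaultdict


def build_instance_batches(objects):
    """Group objects that share a mesh and material into instanced batches.

    objects: iterable of (mesh, material, transform) tuples.
    Returns a dict mapping (mesh, material) to a list of per-instance
    transforms; each key corresponds to a single instanced draw call.
    """
    batches = defaultdict(list)
    for mesh, material, transform in objects:
        batches[(mesh, material)].append(transform)
    return dict(batches)


# Hypothetical scene: transforms are reduced to 2D positions for brevity.
scene_objects = [
    ("house", "brick", (0, 0)),
    ("house", "brick", (5, 0)),
    ("tree", "leaf", (2, 3)),
    ("house", "brick", (9, 1)),
    ("tree", "leaf", (4, 4)),
    ("soldier", "cloth", (1, 1)),
]

batches = build_instance_batches(scene_objects)
draw_calls_before = len(scene_objects)  # one draw call per object
draw_calls_after = len(batches)         # one draw call per (mesh, material)
print(draw_calls_before, "->", draw_calls_after)
```

Here six per-object draw calls collapse into three instanced ones. Objects that differ in mesh or material stay in separate batches, which mirrors the check on materials and shaders described above.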
Image 5: Instancing: before and after. Draw calls were reduced more than 3x, from 10,000 to 3,000.
THE RESULT: Over 3x reduction in draw calls and 2.5x FPS increase. Draw calls were reduced from 10,000 to 3,000 per frame. Frame rates saw a 2.5x improvement, increasing from 16 to 40 FPS.
Get Started Today
Given the expanding platforms and capabilities of gaming devices, performance optimization will always be a top priority. Whether you're an indie developer or a large game house, Intel® GPA can help you analyze and optimize your games, and make them run faster, faster.
The Business Benefit
How is the Intel Graphics Performance Analyzers toolset different from other development tools?
- Deep-Dive Analysis. Obtain accurate CPU, driver, DirectX, and GPU metrics, quickly identify the location and magnitude of potential performance bottlenecks, and uncover hidden problems by performing "what-if" experiments within the product (all without modifying your code).
- Data Tailored to Your Application. Customize views and reports to fit your specific needs.
- Best Tool for Intel Hardware. Intel GPA is the only tool on the market that uncovers detailed Intel GPU metrics for all stages of the rendering pipeline.
- Free Download. Intel GPA is free to download and use. Download the latest version today at intel.com/software/gpa.