What can technical artists learn from testing 'shader myths'

Tried-and-tested tech art tricks for making splendid shaders could always use a little jolt of reality.

November 8, 2022


A screenshot of several shaders operating in Unreal Engine 5.

Technical Art is one of the video game industry's most in-demand professions, and for good reason. It's a highly specialized skill that draws on complicated math to render beautiful particles and shaders. To get the most out of such a demanding craft, tech artists often rely on standard tricks and assumptions to make the best use of their time and effort in an arduous game development cycle.

But should tech artists always be relying on those old assumptions? As in any profession, the answer is "not without checking them first." Epic Games developer relations technical artist Matt Oztalay recently delivered this message loud and clear in a presentation he gave at Game Developer Talks, a new webinar series coordinated by Game Developer and our colleagues at Game Developers Conference.

In his talk—titled "Investigating and Dispelling Shader Myths... with Science!"—Oztalay challenged tech artists to reassess their assumptions about this complicated craft. Should you use a LUT instead of a polynomial? Should you always pack your floats into a vector before you do the math on them?

Maybe! But if you've got the bandwidth, why not double-check your math before pushing "commit" on that code? Here are some quick lessons from Oztalay's talk:

Is instruction count equivalent to performance?

In any piece of software, instruction count is the total number of instructions a program contains. Oztalay admitted that it's a "tough" myth to crack because in Epic Games' Unreal Engine... instruction count is displayed in a number of key places.

"Unfortunately, these don't tell the whole story," he said. "HLSL instructions are just one part of a larger pipeline to get what you want showing up on the screen."

In the conventional pipeline, a tech artist builds a material graph, which is translated into HLSL and eventually compiled into generalized Assembly code. That code is then run through the driver for the specific graphics card, where it becomes hardware-specific bytecode to be executed on the GPU.

But according to Oztalay, GPUs "don't really like to backtrack" to run one set of instructions. As a result, HLSL instructions don't all compile to the same number of bytecode operations, and those operations don't all take the same number of cycles.
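To make that concrete, here is a toy sketch in Python. The expansion counts in the table are hypothetical and do not describe any real GPU, driver, or shader compiler; the divide and pow rows loosely follow the patterns Oztalay found in the bytecode (reciprocal then multiply, and log, multiply, exp). It shows how two shaders with the same HLSL instruction count can expand to different amounts of hardware work.

```python
# Illustrative only: the expansion counts below are hypothetical and do not
# describe any real GPU, driver, or shader compiler.
HYPOTHETICAL_HW_OPS = {
    "add": 1,
    "mul": 1,
    "div": 2,  # e.g. a reciprocal followed by a multiply
    "pow": 3,  # e.g. log2, multiply, exp2
    "sin": 4,  # placeholder value for a transcendental function
}


def hlsl_instruction_count(shader: list[str]) -> int:
    """What an instruction counter reports: one per HLSL-level instruction."""
    return len(shader)


def hardware_op_count(shader: list[str]) -> int:
    """Total ops after expanding each instruction with the hypothetical table."""
    return sum(HYPOTHETICAL_HW_OPS[op] for op in shader)


shader_a = ["mul", "add", "mul"]
shader_b = ["div", "pow", "sin"]

for name, shader in (("A", shader_a), ("B", shader_b)):
    print(name, hlsl_instruction_count(shader), hardware_op_count(shader))
```

Both shaders report three instructions, but under these made-up numbers shader B expands to three times as many operations. And a raw op count still ignores that individual operations can take different numbers of cycles.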

If your brain's a little broken by all these programming nouns (mine is, that's for sure), Oztalay had a helpful metaphor. "You can think of shaders like a recipe," he said in language friendly to dumb writers like me. He described a recipe for cookies that has six steps and takes one hour, and another recipe for a fancy French dish that also has six steps but takes six hours.

Take each of those recipes, break them down into their constituent parts (the steps below the steps) and you'll see that each step does not take an equivalent amount of time. "I don't know about you, but it takes me less time to dice an onion than it would [take] a toddler," he quipped. "But it takes a toddler and I the same amount of time to pour oil into a Dutch Oven."

While you picture that toddler pouring oil into a Dutch oven, consider Oztalay's bigger point: "Sometimes an instruction is as simple as 'draw some circles,' and sometimes an instruction is as complex as 'draw the rest of the dang owl,'" he said. Because instruction counts lump together very different kinds of instructions, they aren't a great way to measure performance.

So what is a good measure of performance? Why not the number of frames you can render each second? Keeping shaders fast without lowering the frame rate is your ultimate goal as a tech artist, and frame rate gives you a way to test other assumptions.
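Frame rate is a good headline number, but frames per second don't add up the way costs do. Converting to milliseconds per frame (1000 divided by the frame rate) makes the cost of a change easier to compare. A minimal Python sketch; the frame rates are made up purely for illustration:

```python
def frame_time_ms(fps: float) -> float:
    """Milliseconds spent on each frame at a given frame rate."""
    return 1000.0 / fps


def added_cost_ms(fps_before: float, fps_after: float) -> float:
    """Extra milliseconds per frame introduced by a change."""
    return frame_time_ms(fps_after) - frame_time_ms(fps_before)


# The same 10 fps drop costs a different number of milliseconds
# depending on the starting frame rate.
print(round(added_cost_ms(60, 50), 2))    # 3.33
print(round(added_cost_ms(120, 110), 2))  # 0.76
```

The same 10 fps drop works out to about 3.33 ms at 60 fps but only about 0.76 ms at 120 fps, which is why frame-time budgets are usually discussed in milliseconds.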

Is multiply more performant than divide?

To test more shader myths, Oztalay ran some experiments using custom expressions instead of Unreal Engine's built-in nodes. Because modern GPUs and graphics hardware do so much optimization on developers' behalf, he wanted to "unoptimize everything."

By deliberately checking for the "worst-case scenario" in all of his tests, Oztalay could better understand how his code was performing. This led him to compare the cost of multiplication versus division in materials. "I always understood that dividing in a material is a more expensive operation than multiplying, but I never questioned that assumption."

He described how he once wrote a material that took an input in degrees, which he needed to convert before doing any trigonometry on the value. "The trig operations in Unreal's material system use a period of one, which means an input of one is a full cycle, so one degree is the reciprocal of 360, or 0.002777 repeating. Because I always understood that divide was more expensive than multiply, I multiplied my degrees value by that nonsensical 0.002777 repeating number instead of just dividing it by 360, which would have been more readable."
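A small Python sketch of the two conversions. `sin_period_one` is a hypothetical stand-in that mimics a sine with a period of one (an input of 1.0 covers a full cycle); it is not Unreal's implementation. For standard radians you would instead multiply by pi/180, which Python exposes as `math.radians`.

```python
import math

DEGREES_PER_CYCLE = 360.0


def sin_period_one(x: float) -> float:
    """Hypothetical stand-in for a sine whose period is 1 (1.0 = one full cycle)."""
    return math.sin(2.0 * math.pi * x)


def degrees_to_cycles_divide(degrees: float) -> float:
    # Readable version: the intent is obvious from the code.
    return degrees / DEGREES_PER_CYCLE


def degrees_to_cycles_multiply(degrees: float) -> float:
    # "Optimized" version: multiply by the precomputed 0.002777... constant.
    return degrees * (1.0 / DEGREES_PER_CYCLE)


for deg in (0.0, 90.0, 180.0, 270.0):
    a = degrees_to_cycles_divide(deg)
    b = degrees_to_cycles_multiply(deg)
    print(deg, a, b, round(sin_period_one(a), 6))
```

The divide and the multiply agree to within floating-point rounding, so the readable version gives up nothing in correctness; the open question is cost, which is what Oztalay went on to measure.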

In a test shown during the presentation, Oztalay compared the results of dividing by 360 versus multiplying by 0.002777 repeating. The performance results were "pretty close." But why was that?

A chart showing the results of a multiply and divide equation—and divide's performance is not that far off from multiply.

Oztalay dove into the bytecode (he did all of this in Tim Jones' Shader Playground, if you'd like to do your own testing). What he found was that the GPU wasn't doing any sort of recursion or conditional operations for the divide. Instead, the compiled code first takes the reciprocal of the divisor, then multiplies the dividend by that reciprocal.

"Any sort of divide operation is just going to be a quick reciprocal, and then a multiply, and then you still get your division out the other end," he said. "It's a little more expensive—because it's two operations—but it's not dramatically more expensive."

Having run this test, Oztalay (and you) can be a little more confident in using division when building beautiful shaders.
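The reciprocal-then-multiply pattern can be emulated on the CPU to see why the two approaches land so close together. This Python sketch rounds every intermediate to 32-bit float and compares one correctly rounded divide against a rounded reciprocal followed by a rounded multiply. It is an assumption-laden model: real GPU reciprocal instructions are often approximate and vendor-specific, so this captures only the two-step shape, not any particular hardware's accuracy. `to_f32` is a hypothetical helper name.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def divide_direct(a: float, b: float) -> float:
    """A single float32 divide. Dividing in double and then rounding gives the
    same result as a correctly rounded float32 divide."""
    return to_f32(to_f32(a) / to_f32(b))


def divide_via_reciprocal(a: float, b: float) -> float:
    """Reciprocal of the divisor, then multiply, rounding to float32 each step."""
    reciprocal = to_f32(1.0 / to_f32(b))
    return to_f32(to_f32(a) * reciprocal)


worst = 0.0
for a in (1.0, 7.0, 45.0, 90.0, 123.456, 359.0):
    for b in (3.0, 7.0, 360.0, 0.1, 1e-3):
        direct = divide_direct(a, b)
        two_step = divide_via_reciprocal(a, b)
        worst = max(worst, abs(direct - two_step) / abs(direct))

print(f"worst relative difference: {worst:.2e}")
```

Each rounding step adds at most about one part in 2^24 of relative error, so the two routes can disagree only in the last bit or so of a float32 result: close enough that the readable divide is rarely worth avoiding on accuracy grounds.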

Is the cost of a power node exponential?

We'll wrap up this recap of Oztalay's talk with the myth that the cost of a pow operation (raising X to the Y power) grows exponentially with the value of Y.

"This sort of makes sense, right?" he asked the audience. "If you're given a limited series of bytecode operations, then it stands to reason that to raise a value to a [given] power, you would need to multiply it by [itself] that many times."

In his example, he took a modest power (2^8) and then looped the actual math behind that equation.
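It's worth checking the arithmetic behind that intuition. If pow really were lowered to repeated multiplication, the number of multiplies would grow in a straight line with the exponent, not exponentially. A minimal Python sketch for non-negative integer exponents; `pow_by_repeated_multiply` is a hypothetical name, not how any particular shader compiler implements pow:

```python
def pow_by_repeated_multiply(x: float, y: int) -> tuple[float, int]:
    """Raise x to a non-negative integer power y by repeated multiplication.

    Returns the result and the number of multiplications performed.
    """
    if y < 0:
        raise ValueError("this sketch only handles non-negative integer exponents")
    result = 1.0
    multiplies = 0
    for _ in range(y):
        result *= x
        multiplies += 1
    return result, multiplies


for y in (2, 4, 8, 16, 32, 64):
    _, count = pow_by_repeated_multiply(1.5, y)
    print(y, count)
```

Doubling the exponent doubles the multiply count, so even this naive model predicts linear rather than exponential growth.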

"If you look at the timing for this, it gets a little interesting because they're the same number." He said that outcome was "quite strange." On a graph, he showed how the value did raise exponentially, but flattened out at the 64th power. "True exponential graphs are a sum total—they don't ever flatten out, they only flatten out at infinity," he said.

An exponential Pow equation that should not be flat.

As you can see above, the results were dead flat. So what was going on?

Oztalay dug into the bytecode and inspected the resulting Assembly code in Shader Playground. What he found was that once his code ran to the 16th power, it used a mathematical trick for exponentiation: it calculated the log of the base value, multiplied that log by the exponent, and then raised "e" to that power.
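The identity behind that trick is x^y = e^(y * ln x) for a positive base x, and it holds in base 2 as well: x^y = 2^(y * log2 x). GPU instruction sets commonly expose the base-2 form through log2 and exp2 operations. A minimal Python sketch (the function name is hypothetical) suggests one plausible reason the timings go flat: the work is the same three steps no matter how large y gets.

```python
import math


def pow_via_log2_exp2(x: float, y: float) -> float:
    """Compute x**y for x > 0 with a fixed sequence of three operations:
    log2, multiply, exp2. The op count does not depend on y."""
    if x <= 0.0:
        raise ValueError("this sketch only handles a positive base")
    log_x = math.log2(x)   # op 1: log2 of the base
    scaled = y * log_x     # op 2: multiply by the exponent
    return 2.0 ** scaled   # op 3: raise 2 to that power (exp2)


for y in (2, 8, 16, 64):
    print(y, pow_via_log2_exp2(1.5, y), 1.5 ** y)
```

For integer exponents the result can differ from repeated multiplication in the last few bits, which is one trade-off of the fixed-cost route.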

If GPUs are already doing so much optimizing after these calculations take place, why did Oztalay care? He talked about how he'd recently been doing some broader optimization work and didn't take the time to think about how a specific pow operation was affecting overall shader performance.

"As with so many of these myths, it's something that I had heard or had always done, so of course I did it," he said in reflection. After showing off a buttload more math breaking down his personal investigation, he concluded that "in certain circumstances, if you've got a couple of floats that you need to do math operations on, it might be faster to just do the float multiplication instead of trying to pack everything together."

If you'd like to dive into Oztalay's math for yourself—and learn about more shader myths in need of busting—his full talk has been archived for your viewing here.

About the Author(s)

Bryant Francis

Senior Editor, GameDeveloper.com

Bryant Francis is a writer, journalist, and narrative designer based in Boston, MA. He currently writes for Game Developer, a leading B2B publication for the video game industry. His credits include Proxy Studios' upcoming 4X strategy game Zephon and Amplitude Studios' 2017 game Endless Space 2.
