If you did a double-take when reading the title of this piece, never fear -- it's not a typo. "Benchmarketing" was long used to describe the behavior of some graphics card vendors who would tout their benchmarks as always being faster than their competition's. But that's not what I mean here.
I'm using the term as it was coined by my colleague, Andy Fischer, formerly of Jon Peddie Associates and now at Metabyte. He suggested that game developers could promote their games by building into their titles the capability to do performance tests. In the past, many game programmers have been loath to do this. After all, if flight sim x runs at 40fps on a Pentium II 400MHz machine and flight sim y runs at 25fps, then users might feel that flight sim y is somehow inferior. (Or maybe it's just that the programmers might end up with an inferiority complex.)
Then came Quake, and after that, Quake II. id Software built into their titles the capability to test performance, but unlike a lot of other companies, id never removed the performance test capability. Soon you began to see timedemo results everywhere on the Internet, in print publications and even on the sides of graphics card boxes. This certainly made the Quake series of games highly visible. Even today, Quake II is still used as a performance metric, even though the game is well over a year old.
Synthetic versus Applications Tests
I do a lot of performance testing when reviewing products, be they systems, graphics cards, mass storage or even audio cards. At Computer Gaming World and Gamespot, the tools are both synthetic and applications-based.
A synthetic benchmark tries to create a "typical" workload, but also has the goal of increased granularity. By that, I mean that a synthetic benchmark allows you to examine vertically as well as horizontally. For example, 3D Winbench 99 allows you to enable or disable specific Direct3D features in order to see the effect of trilinear filtering. A good synthetic test also allows you to remove the effect of externalities. It's well known that the refresh rate of your monitor can have an odd harmonic effect on the frame rate of a 3D title that's double-buffered. We're always very suspicious when we see a game peg at 42.5fps when the refresh rate is 85Hz, for example. 3D Winbench allows you to either triple-buffer (the default) or render to the front buffer only.
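To see where a number like 42.5fps comes from, here is a minimal sketch of the harmonic effect. It assumes a simplified model that is mine, not part of any benchmark: every frame takes the same time to render, and each buffer flip waits for the next vertical retrace. The function name `double_buffered_fps` is hypothetical.

```python
import math


def double_buffered_fps(render_ms: float, refresh_hz: float) -> float:
    """Effective frame rate when every flip waits for the next vertical retrace.

    Simplified model: constant render time, double-buffering, vsync on.
    """
    refresh_ms = 1000.0 / refresh_hz
    # A frame that misses a retrace has to wait for the next one, so each
    # frame occupies a whole number of refresh intervals.
    intervals = max(1, math.ceil(render_ms / refresh_ms))
    return refresh_hz / intervals


# At 85Hz one refresh interval is about 11.76 ms.
for render_ms in (10.0, 12.0, 24.0):
    print(render_ms, "ms per frame:", double_buffered_fps(render_ms, 85.0), "fps")
```

Under this model, a card that needs 12 ms per frame at 85Hz just misses every retrace and drops to exactly half the refresh rate, which is why a result pegged at 42.5fps tells you more about the monitor than about the card.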
Another type of synthetic benchmark is 3D Mark 99 (www.3dmark.com). 3D Mark has the benefit of using a real game engine, the one that will be used in Max Payne. It also has a fair amount of vertical granularity, such as the ability to set specific texture sizes. However, it's still based on a single game engine.
Synthetic benchmarks have their place, but it's a truism that the best performance tests are real applications. It is also true that different applications will behave and perform differently. Just because graphics card y runs great in first-person shooters doesn't necessarily mean it'll run as well in a flight simulator or sports title.
3D GameGauge 1.0
Unfortunately, a lot of the game performance testing that's done out there isn't always done in a rigorous fashion, so it becomes difficult to compare results. Even if you ignore system differences (all Pentium III 500MHz machines are not necessarily equal), there are problems. If one user runs Quake II timedemos using version 3.14, how does that compare to 3.20? The short answer is, it doesn't.
At CGW, we wanted a standard way of determining performance of 3D graphics cards or systems using real games. So we came up with the idea of 3D GameGauge.
3D GameGauge is really very simple. The basic concept was to have fixed length demo loops that would just test the rendering capability of the 3D card. However, we did want to have audio active -- after all, few of us play with the sound turned off. With some early graphics cards, we saw performance actually decrease substantially when turning on audio. A 3D accelerator needs to be a good "systems citizen," after all.
3D GameGauge 1.0 consisted of six titles: Forsaken (a prerelease demo), Incoming, Turok, Quake II, GL Quake and F22 Air Dominance Fighter. We began using it in early 1998. As with any 1.0 product, there were some problems:
- The early demo of Forsaken was flaky. It didn't like some forms of audio acceleration, and there was a weird bug in the way it related to Windows 98 that would sometimes cause severe paging to virtual memory for no good reason.
- F22 ADF was frame-rate limited to 50fps. Early last year, that wasn't a big problem, but it became one later.
- F22 ADF's playback engine was limited to 640x400, despite DID's insistence that it was really running at 800x600.
- There was no real front end to launch all the titles, making it a manual -- and hence, tedious -- process.
- The whole test was pretty first-person shooter intensive. There wasn't a good genre spread.
- In retrospect, using the sum of the frame rates as the overall score wasn't a good scoring methodology. Forsaken would generate huge scores, and minimize the impact of other titles.
3D GameGauge 2.0
We took to heart the lessons we learned with 3D GameGauge 1.0. We worked with game developers and graphics card vendors to help define 3D GameGauge 2.0. (You can download the full spec in MS Word format).
- 3D GameGauge 2.0 requires that a title's maximum supported resolution be at least 1024x768x16. Running at a higher resolution minimizes the impact of the CPU.
- There can be no frame rate limiter.
- We wanted to see triple-buffering used, because testing with flip-on-vsync off is a pain. Triple-buffering eliminates the harmonic effect prevalent with double-buffering, but you still can't run faster than the refresh rate. Unfortunately, we've discovered with some of the newer cards that they can peg the frame rate at 85Hz. So it looks like we'll still have to test with flip-on-vsync disabled. It's not very useful if we tell people, "All cards run at 60Hz, so just pick one."
- Ideally, we'd like the AI turned off during playback. This minimizes variations in the playback. It wasn't possible in all cases. One of our titles is Jane's WWII Fighters. The playback engine still has the AI active. I put in a lot of time handcrafting a WWII Fighters cinema that would minimize the impact of the AI. Even now, if you see a plane crash in the first fifteen seconds, you need to discard that test.
So what is 3D GameGauge 2.0? On the surface, it's a series of seven titles that we run to test 3D graphics performance on graphics cards and systems. But it's really a spec that any game developer can use to build a standard performance test into their title. The entire spec is available at the end of this article, but here's a summary.
- The title must run at 1024x768x16 or higher.
- The minimum, maximum and average frame rates are calculated and written to a text file called FPS.TXT in the game's working directory. (Do the min/max test once per second, and no more often.) The format of FPS.TXT looks like:
87.04 FooBar v1.01
- The title must be able to be run from a command line, with appropriate parameters (resolution, render states, color depth, and so on) passed in. This allows easier batch automation.
- The game must not have a frame rate limiter.
- The game must support either OpenGL or Direct3D or both.
- The playback must generate at least 2,000 frames to enable better reproducibility.
- The demo should include some real action, not just a flythrough of the world.
Optional, High Wants
- When fps.txt is created, also write to it the running frame rate that was calculated once per second, so a histogram can be generated. The format would look like:
87.04 FooBar v1.01
91 Second 1
75 Second 2
102 Second 3
...
68 Second 305
- True color rendering support (32bpp)
- 32-bit source artwork (textures)
- Textures with resolutions higher than 256x256
- Resolutions up to 1600x1200 in all color depths
3D GameGauge 2.0 Titles
If you build this capability into your title, it won't necessarily become part of CGW's GameGauge tests. We need to keep the test suite fixed, for some period of time, so we can do historical comparisons. (We’ll be doing a new version in the fall, to 2.1, so if you do want your game included, contact me at [email protected].) However, it does give your users a useful tool, and you may see your title all over the Internet, if it's a good one. Here's the current list of confirmed titles.
Jane's WWII Fighters (Electronic Arts) - Direct3D
Madden 99 (EA Sports) - Direct3D
Expendable (Rage) - Direct3D
Descent III (Interplay / Outrage) - Direct3D
Half-Life (Valve) - OpenGL
Powerslide (Ratbag) - Direct3D
We're also very close to having Unreal. However, Epic is slaving away mightily at getting Unreal Tournament done, and we're sympathetic to their schedule requirements. The Unreal engine will be used in quite a variety of games in the future. If they don't make it this time, it will be in the fall refresh.
As you can see, the genre spread is wider than in GameGauge 1.0. It would be good to have a 3D-accelerated strategy game (such as Myth 2 or the upcoming Shogun) added to the list, but overall, we're quite pleased.
Even if you're not included in CGW's GameGauge suite, making it easier for your customers to test the performance of your title on their system really helps them make better buying decisions. I get e-mail daily from people asking "Which 3D card should I get?"
The question doesn't have a simple answer, because it all depends on what games they like to play. If you build 3D GameGauge into your title, you'll be helping them out. And there are a lot of web sites which aren't affiliated with Ziff-Davis out there doing testing, and that can give your title exposure as well. All in all, it's a small modification that can generate a lot of exposure.
Loyd Case is a freelance writer and frequent contributor to Computer Gaming World (print) and Gamespot (web). He spends altogether too much time up to his elbows inside computers trying to get yet another new graphics card working. His free time is spent downhill skiing, raising two daughters and hosting LAN parties in his basement office. He can be reached at [email protected].
Game Gauge 99 Specification
About the Game Gauge
Artificial benchmarks, as they exist today, are not sufficient to indicate accurately the performance of a complex 3D system such as Incoming or Quake. Consumers are tired of meaningless 3D benchmark scores that don’t correlate to real world performance gains on a given 3D card. Rather than artificially generate 3D benchmark tests, doesn’t it make more sense to have the game software report the real-world performance based on the actual game data?
The game gauge is just such a 3D benchmark. It’s based on the real world performance of games that we know and love. The game’s score isn’t artificial – it’s what you feel when you are playing – the frame rate! Each game contributes its individual "Average Frames Per Second" to an overall 3D card score, yielding what we call the game gauge.
Game Gauge Score
Each game contributes three numbers to the game gauge score. The first number is the Average FPS number after a fixed length demo loop has been run at the specified game option settings. The second number is the MIN FPS encountered. The third number is the MAX FPS encountered.
For example, if the following games were tested under the game gauge:

Game Title    Average FPS    Min. FPS    Max. FPS
Quake 1       20             5           30
Quake 2       20             5           30
Turok         20             5           30
Hexen 2       20             5           30
Forsaken      20             5           30
Incoming      20             5           30

Average FPS Score = 120
Minimum FPS Score = 30
Maximum FPS Score = 180
Notice that the Average FPS score is the sum of all the average frame rates. Scoring high on one game and low on another will not yield an overall higher score in the average FPS number. This promotes creating better and better 3D products, and not just tuning a 3D card to a single game.
The MIN FPS score and MAX FPS score are new additions to the game gauge for 1999. MIN FPS is the lowest number of frames per second recorded during the demo. MAX FPS is the highest number of frames per second recorded during the demo. The addition of these two numbers allows us to better examine the consistency of the game’s performance during the demo. The MIN FPS score is calculated by adding up the MIN FPS numbers from all the games. The MAX FPS score is calculated by adding up the MAX FPS numbers from all the games.
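The scoring arithmetic can be sketched in a few lines. This is only an illustration using the numbers from the example table; the function name `game_gauge_score` and the tuple layout are my own assumptions, not part of the spec.

```python
# (title, average FPS, min FPS, max FPS), copied from the example table.
results = [
    ("Quake 1", 20, 5, 30),
    ("Quake 2", 20, 5, 30),
    ("Turok", 20, 5, 30),
    ("Hexen 2", 20, 5, 30),
    ("Forsaken", 20, 5, 30),
    ("Incoming", 20, 5, 30),
]


def game_gauge_score(rows):
    """Sum each column across all games, as the scoring section describes."""
    return {
        "average": sum(row[1] for row in rows),
        "min": sum(row[2] for row in rows),
        "max": sum(row[3] for row in rows),
    }


print(game_gauge_score(results))
```

With the table's values this prints an average score of 120, a minimum score of 30 and a maximum score of 180, matching the totals listed under the table.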
Method of Calculating Average Frame Rate
While many methods exist to calculate the average frame rate of a demo loop, we require all game gauge participants to follow our guidelines. By having a single method, we guarantee all games will yield scores evenly across all hardware types.
The required method of calculating your game’s average frame rate is as follows:
NumFrames / Time = Average FPS
NumFrames = Number of frames in demo loop.
Time = Number of seconds the demo ran for. This should be carried out to a precision of at least 1/10 of a second.
Average FPS = Average number of frames per second in the demo loop.
Games should allow for the Average FPS to be numerically in the many hundreds or even thousands of frames per second range. If a quantum leap in technology is created, we don’t want the scores to be artificially low. Make certain that you don’t have a frame rate limiter in your code.
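As a rough sketch of the required calculation, the snippet below divides the frame count by the elapsed time and applies no cap to the result. The names `average_fps` and `run_demo`, and the `render_frame` callback, are hypothetical stand-ins for whatever a real game's playback loop looks like.

```python
import time


def average_fps(num_frames: int, seconds: float) -> float:
    """NumFrames / Time = Average FPS, with no upper limit on the result."""
    if seconds <= 0:
        raise ValueError("demo duration must be positive")
    return num_frames / seconds


def run_demo(render_frame, num_frames: int) -> float:
    """Time a fixed-length demo loop and return its average frame rate."""
    start = time.perf_counter()  # far finer than the 1/10 second the spec asks for
    for frame in range(num_frames):
        render_frame(frame)  # hypothetical: render and flip one frame
    elapsed = time.perf_counter() - start
    return average_fps(num_frames, elapsed)


# A 2,000-frame demo that took 25 seconds averages 80 frames per second.
print(f"{average_fps(2000, 25.0):.2f}")
```

Timing the whole loop once, rather than averaging per-frame rates, is what keeps the number equal to NumFrames / Time.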
Method of Calculating the Min and Max Frame Rate
It’s very important that each game calculate the min/max frame rate exactly the same way. The method we require is as follows :
startup : (done at startup)
Initialize START_TIME to the current time in milliseconds.
Initialize FRAMES_ONE_SECOND counter to 0
Initialize MIN_ONE_SECOND counter to 2147483647 (Something really big like 0x7fffffff)
Initialize MAX_ONE_SECOND counter to 0
Per game loop: (done every game loop)
After each page flip in the game, add 1 to the FRAMES_ONE_SECOND counter.
Get the time in milliseconds and subtract the START_TIME. If the result is over 1,000 milliseconds, go to minmaxcheck
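Minmaxcheck: (only done once every second)
If FRAMES_ONE_SECOND is less than MIN_ONE_SECOND, set MIN_ONE_SECOND to FRAMES_ONE_SECOND.
If FRAMES_ONE_SECOND is greater than MAX_ONE_SECOND, set MAX_ONE_SECOND to FRAMES_ONE_SECOND.
Reset FRAMES_ONE_SECOND to 0.
START_TIME = current time in milliseconds.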
In essence, every 1,000 milliseconds we compare the number of frames that have passed against our min and max numbers and update accordingly. Like Average FPS, games should allow for the min and max numbers to be numerically in the many hundreds or even thousands of frames per second range. Once again, if a quantum leap in technology is created, we don’t want the scores to be artificially low. Make certain that you don’t have a frame rate limiter in your code.
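Here is one way the min/max procedure might look in code. Treat it as a sketch, not the reference implementation: the class name `MinMaxTracker` is hypothetical, the caller supplies the clock in milliseconds, and resetting the frame counter after each check is my reading of the once-per-second comparison described above.

```python
class MinMaxTracker:
    """Tracks the lowest and highest frames-per-second seen during a demo."""

    def __init__(self, now_ms: int):
        self.start_time = now_ms
        self.frames_one_second = 0
        self.min_one_second = 0x7FFFFFFF  # something really big
        self.max_one_second = 0

    def on_page_flip(self, now_ms: int) -> None:
        """Call once after every page flip with the current time in ms."""
        self.frames_one_second += 1
        if now_ms - self.start_time > 1000:
            # Minmaxcheck: runs roughly once per second.
            self.min_one_second = min(self.min_one_second, self.frames_one_second)
            self.max_one_second = max(self.max_one_second, self.frames_one_second)
            self.frames_one_second = 0  # assumption: start a fresh count
            self.start_time = now_ms


# Simulated playback: a fast stretch (one flip every 10 ms), then a slow
# stretch (one flip every 50 ms).
tracker = MinMaxTracker(0)
now = 0
for step_ms, flips in ((10, 101), (50, 21)):
    for _ in range(flips):
        now += step_ms
        tracker.on_page_flip(now)
print(tracker.min_one_second, tracker.max_one_second)
```

Passing the time in makes the logic easy to check with simulated timestamps; a real title would feed it from whatever millisecond timer it already uses.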
Fixed Duration Demo Loop
Each game gauge game is required to have a demo loop of a fixed number of frames. This demo loop should be able to be run from a command-line option, or a single purpose .EXE. No user control or outside intervention should be allowed. At the conclusion of the demo loop, the Average FPS score, MIN FPS, and MAX FPS should be calculated and output to FPS.TXT.
Automation of testing with FPS.TXT
After the game or demo has run, we require an FPS.TXT file to be output to the root directory of the .EXE. This simple text file allows for automation when combined with a front end, and is a requirement for all games to be part of the game gauge.
Each game in the game gauge should be runnable using only command line options. To allow automation, we must be able to launch the game, have it run the tests, output FPS.TXT, and exit gracefully back to Windows 95/DOS. The "Game Gauge control software" will run all the games, collect all the FPS.TXT files, and output the final scores using the collected data.
The format of FPS.TXT is very simple. It consists of three lines of text. The first line is the average FPS in decimal, with a maximum precision of 1/100 of a frame per second, followed by a space and the name / version of the game. The second and third lines are the min and max FPS numbers with the text "min" or "max" following the correct number.
A game’s FPS.TXT should look exactly like this :
287.04 FooBar v1.01
We prefer that games keep this order to allow for future enhancements to the FPS.TXT file.
The FPS.TXT should be output in the same directory as the EXE file is located.
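A minimal sketch of writing the three-line file, following the description above: average FPS with the game name and version first, then the min and max lines. The function names are hypothetical, and the min and max values (5 and 30) are made-up illustration numbers, not figures from the spec.

```python
import tempfile
from pathlib import Path


def format_fps_txt(avg_fps: float, min_fps: int, max_fps: int, title: str) -> str:
    """Build the three lines: average + title, then min, then max."""
    return f"{avg_fps:.2f} {title}\n{min_fps} min\n{max_fps} max\n"


def write_fps_txt(exe_dir, avg_fps, min_fps, max_fps, title) -> Path:
    """Write FPS.TXT into the directory that holds the game's EXE."""
    path = Path(exe_dir) / "FPS.TXT"
    path.write_text(format_fps_txt(avg_fps, min_fps, max_fps, title), encoding="ascii")
    return path


# A temporary directory stands in for the game's install directory here.
with tempfile.TemporaryDirectory() as exe_dir:
    out = write_fps_txt(exe_dir, 287.04, 5, 30, "FooBar v1.01")
    print(out.read_text(encoding="ascii"), end="")
```

Keeping the formatting in its own function makes it easy to check that the output stays in the required line order.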
Optional Graphing Data
We also have an optional set of data that we encourage games to support so that a game’s performance can be graphed. Every second of the game, a number of frames are displayed to the monitor. The number of frames displayed in a second is a number easily captured in two bytes, and can be saved into a small amount of memory for performance graphing. One minute of game playback is the same as 120 bytes (60 seconds * two bytes each). A three minute demo needs only 360 bytes of data to show the game’s performance over time. Once the game has completed running, this data should be appended to the FPS.TXT file as the following example illustrates.
A game’s FPS.TXT with the Optional Graphing Data should look exactly like this:
287.04 FooBar v1.01
20 Second 1
10 Second 2
5 Second 3
34 Second 305
The automation program will import the FPS.TXT file and graph the data over time. By comparing graphs from two different configurations we can examine the relative performance differences during the game playback.
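To give a feel for what the importing side might do, here is a small parser sketch. It is not the actual Game Gauge control software: `parse_fps_txt` and the dictionary layout are my own assumptions, and the sample text reuses the per-second numbers from the example above with illustrative min and max lines added.

```python
import re

# Matches per-second lines such as "20 Second 1".
SECOND_LINE = re.compile(r"^(\d+)\s+Second\s+(\d+)$")


def parse_fps_txt(text: str) -> dict:
    """Parse FPS.TXT contents, including the optional per-second data."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    avg, _, title = lines[0].partition(" ")
    result = {"average": float(avg), "title": title,
              "min": None, "max": None, "per_second": {}}
    for line in lines[1:]:
        match = SECOND_LINE.match(line)
        if match:
            result["per_second"][int(match.group(2))] = int(match.group(1))
            continue
        value, _, label = line.partition(" ")
        if label in ("min", "max"):
            result[label] = float(value)
    return result


sample = """287.04 FooBar v1.01
5 min
34 max
20 Second 1
10 Second 2
5 Second 3
34 Second 305
"""
parsed = parse_fps_txt(sample)
# A crude text graph: one '#' per frame rendered in that second.
for second, frames in sorted(parsed["per_second"].items()):
    print(f"{second:>4} {'#' * frames}")
```

Two such parses from different configurations can then be compared second by second.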
Command Line Options
The automation of the game testing requires the use of command line options to select what would normally be decided using menus in the game. The following options are required for the testing to work across all manufacturers’ hardware, and must be supported.
- Selection of the graphics API to be used: D3D, OpenGL, or Glide as determined by your game.
- Board Selection for 3Dfx hardware: primary/secondary support.
- Resolution: All resolutions on all APIs up to 1600x1200 (or the highest your game supports)
- Double/triple-buffering: If triple-buffering is supported by your game, this switch would enable/disable.
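The spec lists what must be selectable but not how the switches are spelled, so the flag names below (`--api`, `--board`, `--resolution`, `--buffers`) and the program name are purely hypothetical. This sketch only shows one way the four required options could be parsed.

```python
import argparse


def parse_resolution(value: str):
    """Turn a string like '1600x1200' into a (width, height) tuple."""
    try:
        width, height = value.lower().split("x")
        return int(width), int(height)
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected WIDTHxHEIGHT, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="foobar", description="Run the demo loop")
    parser.add_argument("--api", choices=["d3d", "opengl", "glide"], required=True,
                        help="graphics API to use")
    parser.add_argument("--board", choices=["primary", "secondary"], default="primary",
                        help="board selection for 3Dfx hardware")
    parser.add_argument("--resolution", type=parse_resolution, default=(1024, 768),
                        help="screen resolution, for example 1600x1200")
    parser.add_argument("--buffers", type=int, choices=[2, 3], default=2,
                        help="2 for double-buffering, 3 for triple-buffering")
    return parser


args = build_parser().parse_args(
    ["--api", "opengl", "--resolution", "1600x1200", "--buffers", "3"])
print(args.api, args.board, args.resolution, args.buffers)
```

Whatever the spelling, the point is that every choice normally made in a menu has a switch, so a front end can launch the title unattended.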
Method of Testing Each Game
Each game that comprises the game gauge should have a detailed method for generating its game score. Preferably, the default option settings remain consistent across all 3D hardware cards, allowing for minimal errors in the process of testing games.
V-Sync. All game tests should be performed with vertical sync turned off and a refresh rate of 60Hz. If a 3D card doesn’t support disabling Vsync, then the game scores should be reported as returned with Vsync on. It’s up to the individual card manufacturer to create a publicly available driver that allows disabling Vsync and setting the refresh rate of the monitor. Higher refresh rates impact the available bandwidth for 3D graphics, so it was decided that 60Hz would be the test standard.
Audio. All game tests should be performed with audio enabled. We feel testing with audio enabled better represents the way consumers play games.