Optimizing CD-ROM Performance Under DOS/4GW
June 1, 1997
Author: Dan Teven
You're building the coolest, most realistic game with a half hour of gorgeous full-screen video and a phenomenal soundtrack featuring famous voices. You're using the Watcom compiler, with DOS/4GW, because you want the game to run on both DOS and Windows machines. There's just one problem: the video segments don't play smoothly on anything less than a 6X CD-ROM drive.
Let's look at some CD-ROM quirks you should know about, no matter which PC platform you're targeting, and how to read a CD-ROM drive efficiently from a DOS/4GW program. Then we'll discuss two games that exemplify intelligent use of a double-speed (2X) CD-ROM drive: Rebel Assault II: The Hidden Empire from LucasArts Entertainment, and Loadstar: The Legend of Tully Bodine from Rocket Science.
CD-ROM Fundamentals
In most games that read data off a CD-ROM in a continuous stream, the CD-ROM drive's data transfer rate will be the biggest bottleneck. Today, single-speed drives are obsolete, and double-speed (2X) drives are the norm. A 2X drive, at best, can read 300K of data per second; a quad-speed (4X) drive, 600K. Considering that movie-quality video runs at 30 frames per second and 30 frames of uncompressed, 640-by-480 pixel, 16-bits-of-color video plus one second of CD-quality audio require 18,176K per second, today's CD-ROM drives are woefully inadequate for movie playback. To beat the bandwidth problem, you'll have to cut a few corners.
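To make the bandwidth gap concrete, here is a minimal sketch of the arithmetic behind the uncompressed figure. The function names are ours, and we assume CD-quality audio means 44,100 samples per second, two channels, 16 bits per sample.

```c
#include <stdint.h>

/* Bytes needed for one second of uncompressed video. */
uint32_t video_bytes_per_second(uint32_t width, uint32_t height,
                                uint32_t bytes_per_pixel, uint32_t fps)
{
    return width * height * bytes_per_pixel * fps;
}

/* Bytes needed for one second of uncompressed PCM audio. */
uint32_t audio_bytes_per_second(uint32_t sample_rate, uint32_t channels,
                                uint32_t bytes_per_sample)
{
    return sample_rate * channels * bytes_per_sample;
}
```

Calling video_bytes_per_second(640, 480, 2, 30) gives 18,432,000 bytes (exactly 18,000K at 1,024 bytes per K), and audio_bytes_per_second(44100, 2, 2) gives 176,400 bytes. Together that is about 18,172K per second with K = 1,024 bytes; counting the audio as a round 176K gives the 18,176K quoted above. Either way it is roughly 60 times what a 2X drive can deliver.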
To reduce your game's data rate to sustainable levels, you must sacrifice some quality. For example:
Compress the data.
Play fewer video frames per second.
Reduce the displayed resolution (perhaps to 320-by-240 pixels or fewer).
Use only 8 bits of color per pixel.
Use lower-quality audio.
Some games copy data to the hard disk, which is capable of a higher (but still limiting) transfer rate. This technique works well for small amounts of frequently accessed data, but it increases the game's hard-disk space requirements and slows down the installation.
Even if your data rate is sustainably low, you must ensure that it's steady. Otherwise, your video will skip, and your sound will stutter. If you want to display 15 frames of video per second, you either have to read and process each frame in less than 1/15 of a second or use a buffer to smooth out spikes in the data rate.
If you read non-sequentially from the disc, the data rate will drop to zero whenever the laser moves to a new location. In designing your game, you must somehow work around this problem. It's possible to cover up the seeks by playing previously buffered sound and video, but you could also display a static screen (text, for example) to distract the player.
CPU and the MPC Spec
You can guess the data rate of a drive just by knowing if it's rated 2X, 4X, or 6X. Most 2X drives can deliver 300 ± 25K, most 4X drives can deliver 600 ± 50K, and so on. However, a few drives are optimized for a particular access pattern: they live up to their ratings when you use DOS copy on them, but fail miserably under the more demanding conditions of gameplay. We've also seen drives that didn't live up to their rating because they (or their controller cards) were defective.
Data rate isn't the only consideration in CD-ROM programming. After reading the data, you need enough CPU cycles left over to decompress it, display the video, play the audio, and execute your game's logic. Unfortunately, the percentage of CPU cycles consumed varies among CD-ROM models, largely due to the efficiency of the device drivers. Some move the data from the drive's hardware buffer to system memory using programmed I/O, while others use direct memory access (DMA). Some drivers transfer data a byte at a time, while others wait for an interrupt that signals a full sector is ready to be transferred.
Since DOS is inherently single-threaded, drivers monopolize the CPU even when they are simply waiting for a hardware event. Both the "waiting" algorithm and the "transfer" algorithm affect how the CD-ROM will respond to your efforts to reclaim spare CPU time. A computer that meets the MPC II specification is guaranteed to deliver 150K with at most a 40% CPU load, while an MPC III computer (running DOS) can deliver 550K at 40%. But 150K is too slow for most applications, and MPC III systems are not yet common. Further, a drive's rated CPU load is a laboratory average. If you take over the CPU at times which are inconvenient for the CD-ROM driver, your mileage may vary. Thus, for most uses, the MPC specification serves only as a vague indicator of probable system performance.
Expected Variation
The variance in CPU load is greatest among 2X drives. The best drives approach 300K using 10% of the CPU; the worst suck down 90% of the CPU. A game designed for a full 300K data rate will starve to death on the poorer 2X drives because it won't have any processor time to spend on decompression, output, and game logic. The practical limit for a game intended to run on a 2X drive is around 260K, while 225K is common. CPU load is more consistent among 4X systems. If your game needs a 4X drive, you can probably assume the performance guaranteed by MPC III.
We're not intending to embarrass drive manufacturers in this article, but Mitsumi deserves special recognition for selling a ton of 2X drives with truly awful device drivers. More than one best-selling game contains special workarounds for Mitsumi drives. The drives are O.K., but the MTMCDAS.SYS driver is truly brain-damaged. It synchronizes itself to the drive with delay loops, and it doesn't check the drive status within the loops. If you interrupt a delay loop (for example, because you're doing preemptive multithreading), the driver still ties up the CPU by counting down to zero after you switch back to it, even though the drive could already be done with its operation. The MTMCDAE.SYS driver is only slightly better: it synchronizes itself to the drive with an interrupt, but its interrupt handler leaves interrupts disabled for so long that other interrupts (timer, sound card, threading) get missed.
Variation in Seek Times
If you want to avoid a noticeable delay when you open a different file or jump around within a file, you should buffer up at least a half second of sound and video to be played while the seek takes place. MPC II guarantees an average seek time of 400ms or less; MPC III guarantees 400ms for a notebook computer, 250ms for a desktop. Your code should anticipate seeks that take longer than these averages.
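As a rough sizing aid, the sketch below converts a stream's data rate and the seek time you want to cover into a buffer size. The function name and the choice of milliseconds as the unit are our assumptions, not part of any API.

```c
#include <stdint.h>

/* Bytes of pre-buffered stream needed to keep playing for cover_ms
   milliseconds at rate_kb_per_s (K = 1,024 bytes), rounded up. */
uint32_t seek_cover_bytes(uint32_t rate_kb_per_s, uint32_t cover_ms)
{
    uint64_t bytes_per_s = (uint64_t)rate_kb_per_s * 1024u;

    /* 64-bit intermediate avoids overflow; +999 rounds up to a whole byte. */
    return (uint32_t)((bytes_per_s * cover_ms + 999u) / 1000u);
}
```

For a 225K stream, seek_cover_bytes(225, 500) asks for 115,200 bytes (112.5K) to cover a half-second seek. Covering the 1-second worst case mentioned below doubles that to 230,400 bytes.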
Seek times generally increase in proportion to the distance the laser has to move. Ironically, the worst seek times we've seen on any 2X drive (up to 1 second!) were for seeks of less than 256 sectors, because that particular drive used a different algorithm for very short seeks. Some CD-ROM drives defer seeks until they're forced to actually read data. This trick makes for faster servers, but it can cause delays in your game if you're not prepared for it.
Programming Interfaces
Now that you know what you're up against, it's time to get down to programming. There are up to four different layers of system software between you and the CD-ROM drive: DOS, MSCDEX, the MS-DOS device driver provided by the drive manufacturer, and possibly a lower-level driver (for instance, a SCSI driver).
DOS affords the simplest programming interface and one you already know how to use. You can open a file on the CD-ROM drive with a call to fopen() and perform random-access reads by combining calls to fseek() and fread(). Although DOS is a real-mode interface, the protected-to-real-mode translation is done automatically by DOS/4GW.
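A minimal sketch of such a random-access read, using only the standard C library, might look like this. The helper name read_at is ours, and it assumes the file has already been opened in binary mode.

```c
#include <stdio.h>

/* Read up to `count` bytes starting at byte `offset` of an open file.
   Returns the number of bytes actually read (0 if the seek fails). */
size_t read_at(FILE *fp, long offset, void *dest, size_t count)
{
    if (fseek(fp, offset, SEEK_SET) != 0)
        return 0;
    return fread(dest, 1, count, fp);
}
```

A short return value means either end of file or a read error, so a caller that cares can tell the two apart with feof() and ferror().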
DOS adds a little overhead to every CD-ROM request, and the DOS extender adds a little more. DOS/4GW has to copy the data read by fread() into extended memory, and it will break any read of more than 8K into multiple calls to DOS. You can't eliminate the DOS overhead. You can eliminate DOS/4GW's by allocating your own buffer in low memory and associating that buffer with the file pointer for the CD-ROM, like this:
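FILE *fp; // pointer to the file on CD-ROM
RealPtr p; // see listing for RealPtr, AllocateLow
fp = fopen (path, "rb"); // path (hypothetical name): the file to stream
p = AllocateLow (16 * 1024); // allocate 16K buffer in DOS memory
// setvbuf must be called after fopen and before the first read
setvbuf (fp, p.ptr, _IOFBF, 16 * 1024);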
The biggest remaining problem with this scheme is that DOS reads are synchronous; your program will have to sit and wait for the read to complete. Even if the CD-ROM drive has a low CPU load, your program won't be able to reclaim those spare CPU cycles for anything except its interrupt handlers.
The MSCDEX Interface
Since DOS calls MSCDEX when it's asked to read from or seek on a CD-ROM drive, you might ask how the MSCDEX interface is different. It's still a real-mode, synchronous interface, but you ask MSCDEX to read with INT 2Fh/AX=1508h instead of INT 21h/AH=3Fh. More significantly, MSCDEX deals with 2,048-byte sectors, not files. When you open a file in DOS, DOS calls MSCDEX to return the directory entry structure for the path you specify. This structure contains the sector location and size of the file. If you know this information beforehand, you can avoid rereading the directory entry (and eliminate an unnecessary seek or two). MSCDEX always reads entire sectors, so you must do your own buffering if your request isn't sector-aligned.
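The buffering arithmetic for an unaligned request can be sketched as follows. The struct and function names are ours, and the sector numbers are relative to the start of the file, so the caller still has to add the file's starting sector on the disc.

```c
#include <stdint.h>

#define CD_SECTOR_SIZE 2048u

typedef struct {
    uint32_t first_sector;  /* first sector to read, relative to file start */
    uint32_t sector_count;  /* number of whole sectors to read */
    uint32_t skip_bytes;    /* bytes to discard at the front of the buffer */
} SectorSpan;

/* Work out which whole sectors cover the byte range
   [byte_offset, byte_offset + byte_count). */
SectorSpan sector_span(uint32_t byte_offset, uint32_t byte_count)
{
    SectorSpan s;
    uint32_t end = byte_offset + byte_count;   /* one past the last byte */

    s.first_sector = byte_offset / CD_SECTOR_SIZE;
    s.skip_bytes   = byte_offset % CD_SECTOR_SIZE;
    s.sector_count = (byte_count == 0) ? 0
        : (end + CD_SECTOR_SIZE - 1) / CD_SECTOR_SIZE - s.first_sector;
    return s;
}
```

A request for 5,000 bytes at offset 3,000, for example, becomes a read of three sectors starting at sector 1, with the first 952 bytes of the buffer thrown away.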
Driver Level and Below
The next rung down on the ladder is the MS-DOS device driver, which is called by MSCDEX according to a very standard protocol. Even though a driver may be implemented in an idiosyncratic way, certain functions are required if the driver is to work with DOS. Hence, the device driver interface is the lowest level at which you can interact with a CD-ROM drive without knowing what kind of drive it is. Unfortunately, it's yet another real-mode, synchronous interface.
The device driver entry points are a pair of real-mode functions in the same code segment as the device driver header. The offsets to those functions are contained in the header. You make a far call to the first function, called the strategy routine, passing the address of a low memory data block in ES:BX. You then make a far call to the second function, known as the interrupt routine, and the driver performs the requested operation. It's a little tricky to set all this up from a DOS/4GW program, but MSCDEX provides a shortcut: put the data block address in ES:BX, put a drive identifier in CX, and issue real-mode INT 2Fh/AX=1510h. You can find an example of this technique within the source code.
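For orientation, here is a hedged sketch of the data block for a read request. The layout follows our understanding of the READ LONG request (command code 128) in Microsoft's CD-ROM device driver documentation; the field names are ours, and you should check the layout against that documentation and the accompanying listing before relying on it. The block must live in low memory, and the real-mode INT 2Fh call itself (made through the DOS extender) is left out.

```c
#include <stdint.h>
#include <string.h>

#pragma pack(push, 1)
typedef struct {
    uint8_t  length;          /* size of this whole request, in bytes */
    uint8_t  subunit;
    uint8_t  command;         /* 128 = READ LONG */
    uint16_t status;          /* filled in by the driver */
    uint8_t  reserved[8];
    uint8_t  addr_mode;       /* 0 = HSG (logical sector) addressing */
    uint32_t transfer_addr;   /* real-mode seg:off of the data buffer */
    uint16_t sector_count;
    uint32_t start_sector;
    uint8_t  read_mode;       /* 0 = cooked, 2,048 bytes per sector */
    uint8_t  interleave_size;
    uint8_t  interleave_skip;
} ReadLongRequest;
#pragma pack(pop)

/* Fill in a READ LONG request for sector_count cooked sectors. */
void build_read_long(ReadLongRequest *rq, uint16_t buf_seg, uint16_t buf_off,
                     uint32_t start_sector, uint16_t sector_count)
{
    memset(rq, 0, sizeof *rq);            /* HSG addressing, cooked mode */
    rq->length        = (uint8_t)sizeof *rq;
    rq->command       = 128;
    rq->transfer_addr = ((uint32_t)buf_seg << 16) | buf_off;
    rq->sector_count  = sector_count;
    rq->start_sector  = start_sector;
}
```

After the driver returns, the status word reports whether the request succeeded.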
By talking directly to the driver, you can bypass any disk caches and get the most consistent performance with the least overhead. You can also confuse the higher-level system software, so your game may not run correctly in a multitasking environment. If you decide to program to this level, it's a good idea to eject the disc at the end of your program to reset the state of the drive.
There's little to be gained from going below the MS-DOS device driver interface. Some device drivers might support asynchronous operation; some might even be callable from protected mode. Each one is different, and you'd have to support them all to have a game worth selling commercially.
Synchronous APIs
We've examined the sensible programming interfaces for CD-ROMs and found that, unfortunately, none of the choices is asynchronous. How, then, do we reclaim the spare CPU cycles, which are currently being used to turbocharge the wait loops in our device driver? On some drives, we can improve the situation considerably by reading small blocks from the disc on a regular basis, interleaving the reads with other work so the CD-ROM drive's internal buffer has time to refill before the next read. In fact, this technique is the key to getting reasonable throughput from the MTMCDAS driver. You can try varying the size of the blocks and the length of the delay between them to find the sweet spot of a drive, but you may not need to if you aren't after the highest possible data rate.
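The interleaving idea can be sketched in portable C as a loop that alternates small reads with a slice of other work. Everything here is illustrative: the names are ours, and count_work is a stand-in for whatever decompression or rendering your game would do between reads.

```c
#include <stdio.h>

typedef void (*WorkFn)(void *context);

/* Stand-in for real work: just counts how often it was called. */
void count_work(void *context)
{
    ++*(int *)context;
}

/* Read `total` bytes in pieces of at most `block_size`, calling do_work
   after each piece so the drive's buffer can refill in the meantime.
   Returns the number of bytes actually read. */
size_t read_interleaved(FILE *fp, void *dest, size_t total,
                        size_t block_size, WorkFn do_work, void *context)
{
    unsigned char *out = dest;
    size_t done = 0;

    if (block_size == 0)
        return 0;
    while (done < total) {
        size_t want = total - done;
        size_t got;

        if (want > block_size)
            want = block_size;
        got = fread(out + done, 1, want, fp);
        done += got;
        if (do_work != NULL)
            do_work(context);
        if (got < want)        /* end of file or read error */
            break;
    }
    return done;
}
```

Tuning then comes down to two knobs: block_size, and how long each call to do_work runs.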
Not all drives perform well with small reads. To maximize throughput on most drives and reclaim seek time, we need preemptive multithreading. Preemptive multithreading is the only way to reclaim the CPU (to, say, render a frame) during one of those synchronous calls into the CD-ROM driver.
The implementation of a multithreading system for DOS/4GW is beyond the scope of this article, but commercial libraries are available. Different drives respond differently to different access patterns, so you'll need to experiment with the thread duty cycle (percentage of time given to the CD-ROM driver), thread switch frequency (size of each time slice), and the size of the blocks and the length of the delay. Keep in mind that many CD-ROM device drivers are not reentrant, so only one thread in your program should access the CD-ROM.
Case Study: Rebel Assault II
Rebel Assault II: The Hidden Empire from LucasArts is the sequel to the action-arcade game Rebel Assault. Set in the Star Wars universe, it features 15 chapters of play and uses high-quality cinematic video sequences to advance the story and mood. The game play features various flying, dodging, and shooting sequences set in front of interactive streamed backgrounds.
The minimum platform for Rebel II is a 486/50 with a 2X CD-ROM drive. To achieve acceptable performance and image quality on this platform, LucasArts wrote a custom animation system. This system, the INteractive Streamed ANimation Engine (INSANE), is a collection of code libraries designed primarily to compress and play back video sequences. The system is modular, easily portable, and will be used in a majority of LucasArts's upcoming titles. In Rebel Assault II, noninteractive sequences are 320-by-200 pixels, while interactive sequences are rendered in 424-by-260 resolution. Both use 8-bit, 256-color imagery and appear full screen. For higher-end machines, optional interpolation up to 640-by-400 resolution is available. High resolution is more CPU-intensive, so this may result in a slower frame rate than low resolution, even on a moderately-powered system. To account for this, the system was designed to elegantly handle a less-than-optimal frame rate.
Each frame of video typically consists of 13K of video and 2K of audio. With a data rate of 225K per second, this allows a frame rate of 15fps. Due to the large quantity of video generated for the game, it would have been unreasonable to generate multiple copies of the video streams, each running at a different frame rate. Instead, all video sequences are designed to run at the machine's maximum speed, capping the rate at an optimal 15 frames per second. For high-end systems, the extra CPU time can be used to run in high resolution.
To account for possible synchronization problems due to variable frame rates, two approaches were taken. For sequences without on-screen speech, music and sound effects are linked to specific key frames and designed to accommodate up to a 15% variance in frame rate. For sequences with on-screen speech, rigid synchronization is used. For these sequences, every other frame of video can be optionally omitted, saving decompression and display time and allowing the animation engine to catch up to lip-synched audio.
For some interactive sequences, smooth branching must occur. To achieve this, the system allows video segments to be interlaced into the data stream and preloaded before a possible branch point. When the branch point is reached, the preloaded segment is played to cover up the seek delay to the new animation.
The INSANE library performs reads through DOS for portability. To achieve smooth, uninterrupted animation, it uses a hybrid preemptive/cooperative multitasking system: data reads are performed within a mainline DOS thread, while decompression and game logic run in time slices granted via the timer interrupt. Decompression time can vary from frame to frame depending on the layers of imagery and compression options used in a particular frame. To achieve the best overall performance on all video sequences, the system dynamically varies both CPU time-slice allocation and decompression frame rate based on CD-ROM read performance and decompression time.
Case Study: Loadstar
Rocket Science's game Loadstar: The Legend of Tully Bodine is a fast-paced arcade shooter set against a movie backdrop. Loadstar has received accolades for its speed and production values. Much of the action takes place within a network of tracks on the surface of the moon, and players must simultaneously navigate this maze, avoid damage to their ships, conserve power for shields, and shoot down attacking gunships.
The programming of Loadstar was guided by the following principles:
The game should be playable on any 486/25 with a 2X drive but take full advantage of faster machines.
It should play back as rapidly and smoothly as possible, without the player having to tweak it for his particular machine.
The video images are 320-by-200 pixels, with a 256-color palette. For best color reproduction, the palette is updated on every frame.
The narrative segments must play back at 24 frames per second, for smoothness. Some frame dropping is acceptable.
The interactive segments (which are more expensive because of sprites and sound effects) should play back at 24fps if possible, with 12fps being the minimum acceptable rate.
Video and audio must be perfectly synchronized.
Even though every branch in the maze represents a jump to a new movie segment on the disc, there must be no delays no matter which direction the player goes.
To accomplish these ambitious goals, Rocket Science developed a game compiler that ensures related data is grouped close together on the master CD-ROM, that the data rate required throughout the game is known, and that the data rate stays almost constant. Rocket Science also spent considerable engineering effort figuring out the fastest way to read an arbitrary CD-ROM drive.
Loadstar profiles the machine it's running on and scales the size of the video rectangle, the quality of the sound effects, and several aspects of gameplay to ensure that the game will play smoothly. On machines with very slow CD-ROM drives, it will select 12fps data streams instead of the usual 24fps streams. It always bypasses DOS and MSCDEX because there's no room for overhead when your data rate is 260K.
Once it figures out the optimal access pattern for a drive, Loadstar uses a background thread to read data into a big queue. The foreground thread empties that queue, decompresses the data, superimposes sprites, and copies the composited data to another queue in video memory. The balance of time between the threads is adjusted dynamically, based upon the amount of data in both queues.
An interrupt handler synchronized with vertical nondisplay empties the video frame queue and updates the palette. If a frame is ready before its predetermined time, the extra time is given to the CD-ROM drive. If a frame is ready late, it's displayed as soon as possible, but the next frame is thrown away to give the game a chance to catch up.
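The catch-up rule described above can be written as a small state machine. This is our own sketch of the policy, not Rocket Science's code; the type names, function name, and the use of abstract time ticks are all assumptions.

```c
typedef enum { FRAME_SHOW, FRAME_DROP } FrameAction;

typedef struct {
    int drop_next;   /* set when the previous frame was shown late */
} FramePacer;

/* Decide what to do with a frame that became ready at ready_time and
   was due at due_time. *spare_time receives the time left over when a
   frame is early, which can be handed to the CD-ROM reader. */
FrameAction pace_frame(FramePacer *p, long ready_time, long due_time,
                       long *spare_time)
{
    *spare_time = 0;
    if (p->drop_next) {             /* previous frame was late: skip this one */
        p->drop_next = 0;
        return FRAME_DROP;
    }
    if (ready_time > due_time)      /* late: show it now, drop the next */
        p->drop_next = 1;
    else                            /* early or on time */
        *spare_time = due_time - ready_time;
    return FRAME_SHOW;
}
```

Start with a FramePacer whose drop_next is 0 and call pace_frame once per frame. A late frame is still shown, but the call for the following frame returns FRAME_DROP.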
The background thread is also responsible for seeks. Rocket Science's game compiler arranges the data stream on the CD so the first half second of data for every possible branch is read into memory before the branch actually takes place. Then, no matter which branch the player takes, the action continues seamlessly until the laser reaches its new location and a new block of data gets read.
The Need for Speed
Now you know how to make efficient use of a CD-ROM. Remember: the easiest way to speed up some models is to speed up everything else. Getting a decent data rate out of the Mitsumi FX001D, for example, forces you to give up at least 70% of your CPU cycles. If the rest of your game needs only 30% of the power of a 486/25 or scales across a range of processors, your job will be easier.
Protected-mode, multithread-aware CD-ROM drivers are another advantage Windows 95 and OS/2 have over DOS. The MPC III specification guarantees 550K at just 7% of the CPU on those operating systems. We look forward to the day you can design a CD-ROM game without worrying too much about the low end of drive performance.
Vincent Lee is a project leader and designer at LucasArts Entertainment. He was project leader and lead programmer for Rebel Assault and Rebel Assault II.
Dan Teven specializes in 32-bit systems programming for extended DOS and Windows 95. He has consulted on threading and CD-ROM issues for numerous projects, including Loadstar.