DirectMusic For The Masses

Like other SDKs from Microsoft, DirectMusic tries to cover many bases, not all of them related to games. Most observers agree that it has some great solutions for background music on web sites. Are DirectMusic's approaches relevant to game development? This article is an overview of what DirectMusic is -- and what it isn't -- to help you get some idea of whether it fits your needs.

DirectMusic is a complete overhaul of the way that Windows plays music. It replaces the basic code that Windows applications use to get MIDI data out of a file, through the computer, and to the output device. It's a completely rewritten and rethought system, all the way down to what noises come out and how. Has it arrived too late? Now that so many games are using Red Book CD audio and other streaming mechanisms, is DirectMusic enough to make MIDI relevant again to game developers? I think so - no matter how you currently handle music in your games, DirectMusic is definitely worth checking out.

DirectMusic starts out by addressing the major problems of Windows' old MidiOut API, such as shaky timing and limited real-time control. It offers consistent playback of custom sound sets using an open standard, Downloadable Sounds Level 1 (DLS1). On top of that, DirectMusic opens more than one door to achieving adaptive musical scores in games.

Like other SDKs from Microsoft, DirectMusic will try to cover many bases, not all of them related to games. Most observers agree that it has some great solutions for background music on web sites. Are DirectMusic's approaches relevant to game development? Its depth makes for a massive API, with hundreds of pages of documentation. Is it too complex to use on a project with a deadline?

This article is an overview of a big piece of work that is still in alpha, so don't look at it as a review. It's more of a look at what DirectMusic is and what it isn't, to help you get some idea of whether it fits your needs. The SDK should be in beta as you read this, and will be released as part of DirectX 6.1 late in the year.

History: Way Back When Down South...

Back in the late 1980s and early 1990s, there lived in Atlanta, Ga., a team of imaginative, talented music programmers called the Blue Ribbon Soundworks. They made a MIDI sequencer for the Amiga called Bars & Pipes which was so innovative that some people still keep an Amiga around just to run it.

By 1994, Blue Ribbon's main focus was a technology called AudioActive, part of which saw the light of day in music-generating programs such as SuperJam and Audiotracks Pro. AudioActive was an API and toolset that generated MIDI music performances on the fly by using data types called styles and personalities. At AudioActive's heart was a toolset for breaking compositions into their component parts and an engine for putting them back together.

To design this system, Blue Ribbon examined the way that real performers in various musical genres make the decisions that affect the progress of a piece. The system bore some conceptual similarity to musical Markov chains, in which each note has a weighted probability of going to each other note. But AudioActive was quite a bit more complex, subtle, and, in true musician form, more subjective. In some cases, it was able to create very convincing performances.

From Atlanta to Redmond

This pedigree was the first thing that I, and many other developers, heard about Microsoft's new music system. It made us skeptical from the outset. Microsoft's developer-hype literature still emphasizes DirectMusic's real-time music generation aspects to such a degree that it looks like AudioActive: The Sequel. Despite automatic music's enormous gee-whiz factor for us computer music types, I couldn't help but feel that its real-world use would simply be one more way for a skinflint producer to avoid paying for professional composition.

But over the past three years, plenty has happened. The trademark "AudioActive" is now used for an MPEG-2 audio player from Germany. Blue Ribbon Soundworks was purchased by Microsoft in late 1995, and its principals moved from Atlanta to Redmond. Its development lead, Todor Fay, continues to spend many days in trade group meetings small and large, listening to what people who make and support music for interactive products want and discussing his team's ideas.

Fay apparently doesn't like to say no. DirectMusic incorporates a truly frightening number of features, including some of the features that developers have been asking for in a music API. It includes an evolved, 100-percent rewritten version of what used to be AudioActive. It also includes hooks for replacing, adding, or modifying any component in the entire system with whatever music generator or filter you or a third party might come up with on your own. It gives applications access to MIDI and other control data in real time.

The fact remains that this open architecture was written with a certain approach to adaptive music at its core. It's an odd thing to find in a Microsoft API: a highly involved music recombiner and regenerator - a neat thing, but not the right solution for everybody. Sometimes, in poking through the SDK with a different need in mind, a developer will be mystified and frustrated at some of the approaches and some of the omissions. However, DirectMusic does try to offer solutions for those who don't wish to use the System Formerly Known as AudioActive. Thus far, Microsoft's publicists have done themselves and developers a disservice by giving the impression that the interactive music engine is the core reason to look at DirectMusic. This is definitely not true.

In its alpha stage, DirectMusic is such a big package that many people's first impression is that it's just too complex to use on a typical project. One of the things I set out to do in researching this article was to see if there were reasonably simple paths to solutions for common problems buried in the more than 300 pages of API documentation. Microsoft needs to provide such paths if it wants to sell DirectMusic to the game development community; as DirectMusic approaches beta (early July), the company is rewriting the documentation with this approach in mind.

DirectMusic's Innards

DirectMusic's headlines for most people who make games are DLS support for hardware acceleration and MIDI with over a million channels and rock-solid timing. It's a big package, consisting of several major parts that operate on different levels. It's not necessary to use or even understand all of the components to make good use of the parts you need.

For starters, DirectMusic replaces Windows' MidiOut technology with a new model. DirectMusic's MIDI support has subsample timing accuracy, allows flexible selection of output ports (including third-party creations), and lets applications inspect, filter, and modify MIDI data as it comes out. The release version will also multiply MIDI 1.0's 16 channels by a healthy 65,536, for a total of 1,048,576 discrete channels (called pChannels within DirectMusic).
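The channel arithmetic is easy to check with a few lines of code. The sketch below is only an illustration: the flat numbering (group times 16, plus MIDI channel) and the function names are my assumptions, not something the alpha documentation confirms.

```python
from typing import Tuple

MIDI_CHANNELS = 16        # channels in a MIDI 1.0 stream
CHANNEL_GROUPS = 65_536   # multiplier quoted for the release version
TOTAL_PCHANNELS = MIDI_CHANNELS * CHANNEL_GROUPS


def to_pchannel(group: int, midi_channel: int) -> int:
    """Flatten (group, MIDI channel) into one index (hypothetical layout)."""
    if not 0 <= group < CHANNEL_GROUPS:
        raise ValueError("group out of range")
    if not 0 <= midi_channel < MIDI_CHANNELS:
        raise ValueError("MIDI channel out of range")
    return group * MIDI_CHANNELS + midi_channel


def from_pchannel(pchannel: int) -> Tuple[int, int]:
    """Invert to_pchannel and recover (group, MIDI channel)."""
    if not 0 <= pchannel < TOTAL_PCHANNELS:
        raise ValueError("pChannel out of range")
    return divmod(pchannel, MIDI_CHANNELS)


print(TOTAL_PCHANNELS)  # 16 * 65,536
```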

The biggest single claim that DirectMusic has to making MIDI relevant again is its support for DLS. According to its developers, the bundled Microsoft Software Synthesizer was using 0.12 percent of the CPU per voice on a Pentium II 266MHz MMX as of late June. These numbers will get a bit worse when reverberation is added (reverberation wasn't included in the API as of this writing, but is scheduled to happen before final release). Under the Win32 Driver Model in Windows 98 and Windows NT, this is open to hardware acceleration by PCI-bus sound cards.
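For a rough feel of what that per-voice figure means for a voice budget, here is a back-of-the-envelope sketch. It assumes cost grows linearly with voice count and ignores reverberation, both simplifications of mine; only the 0.12 percent figure comes from the developers.

```python
CPU_PERCENT_PER_VOICE = 0.12  # quoted figure, Pentium II 266MHz MMX


def synth_cpu_percent(voices: int) -> float:
    """Estimate software-synth CPU load, assuming linear scaling per voice."""
    if voices < 0:
        raise ValueError("voice count cannot be negative")
    return voices * CPU_PERCENT_PER_VOICE


for voices in (16, 32, 64):
    print(f"{voices} voices: about {synth_cpu_percent(voices):.2f}% CPU")
```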

DirectMusic includes a Roland-made General MIDI/GS sound set. However, the really great thing about DLS is that it opens up MIDI in games to a variety of techniques for using samplers that electronic musicians have built up over the years. These range from basic wavetable-style techniques (but with any choice of sample data) to sampling entire musical phrases and triggering them via MIDI commands.

If MIDI is a dirty word for many game developers, it's not because of MIDI itself, which is simply a control mechanism and has no intrinsic sonic quality, good or bad. It's because of the inconsistent, usually low quality, fixed sample sets in the built-in synthesizer ROMs on most sound cards. DLS lets MIDI go back to being a timing, control, and note-triggering mechanism, as opposed to being a synonym for crappy-sounding game music. When MIDI is freed up to do its stuff, it can provide the granularity, malleability, and reaction time needed to make music react to what goes on in an interactive world.

As I mentioned, DirectMusic was conceptually built up from its specialized music-digesting system, the most controversial and confusing part of the SDK. This system is good at its original purpose, but that's not the whole story. It has a big side-benefit: a way to play and control segments of MIDI data, apply tempo maps and data filters, and concatenate them into other segments at musically appropriate junctures (Figure 1). APIs such as the Miles Sound System, HMI's Sound Operating System, and DiamondWare's STK have already been doing this sort of thing (and more) under Windows despite MidiOut's limitations. All of these SDKs' developers are likely to be able to do more interesting things more reliably under DirectMusic.

Figure 1: DirectMusic Screen Shot

Segments, Tracks and Tools


Figure 2: Track Segment Structure


DirectMusic's essential playback unit is the track. Tracks are contained inside segments (Figure 2). Typical examples of tracks and segments would include:

  • an imported MIDI file, which the segment object splits into three tracks containing notes, tempo, and time signature information (Figure 3);
  • a style playback segment that points to one or more styles, which are compositions that have been abstracted to some level and can change based on real-time input;
  • groove, chord map, and signpost segments that the interactive music engine can use to generate style playback segments;
  • special-purpose segments, such as a mute segment, that can automate playback by turning channels on and off.


Figure 3: MIDI Segment Structure
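To make the segment-and-track idea concrete, here is a toy model of the MIDI import case: a flat list of events is sorted into three tracks held by one segment. The class, the event tuples, and the track names are all hypothetical; the real objects are COM interfaces with their own data formats.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str, tuple]  # (tick, kind, data); a made-up representation

TRACK_KINDS = ("notes", "tempo", "time_signature")


@dataclass
class Segment:
    """A container of named tracks, each a time-ordered list of events."""
    tracks: Dict[str, List[Event]] = field(default_factory=dict)


def import_midi_events(events: Iterable[Event]) -> Segment:
    """Split a flat event list into notes, tempo, and time signature tracks."""
    segment = Segment({kind: [] for kind in TRACK_KINDS})
    for event in sorted(events, key=lambda e: e[0]):
        kind = event[1]
        if kind not in segment.tracks:
            raise ValueError(f"unknown event kind: {kind}")
        segment.tracks[kind].append(event)
    return segment


song = import_midi_events([
    (480, "notes", (64, 100)),
    (0, "tempo", (120,)),
    (0, "time_signature", (4, 4)),
    (0, "notes", (60, 100)),
])
```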


For those who wish to do complex things with music that can't be done with the built-in generation system, DirectMusic is built to be extended. For starters, tracks and segments are an extensible data type. Because they are the core playback unit, they will let Microsoft and third-party vendors address any fundamental complaints from developers.

DirectMusic also incorporates objects called Tools, which are intended to be easy for developers or third parties to write. These sit in what's called a tool graph, which makes all tools present cooperate with one another. A tool can operate on just one logical chunk of music (a segment) or can process the entire output. If DirectMusic catches on, expect to see scads of tools written to plug its holes, such as a MIDI channel and note mute mask, a MIDI echo, a velocity modifier, a quantizer/dequantizer, and so on.
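The tool graph idea can be sketched as a chain of small functions, each of which passes an event along, changes it, or drops it. This is a conceptual model only: the event type, the two example tools, and the way the graph runs are my inventions, not the Tool interfaces in the SDK.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List, Optional, Set


@dataclass(frozen=True)
class NoteEvent:
    time: int       # ticks
    channel: int
    note: int
    velocity: int


# A tool returns a (possibly modified) event, or None to drop it.
Tool = Callable[[NoteEvent], Optional[NoteEvent]]


def channel_mute(muted: Set[int]) -> Tool:
    """Build a tool that drops every event on a muted channel."""
    def tool(event: NoteEvent) -> Optional[NoteEvent]:
        return None if event.channel in muted else event
    return tool


def velocity_scale(factor: float) -> Tool:
    """Build a tool that scales velocity, clamped to the MIDI range 1-127."""
    def tool(event: NoteEvent) -> Optional[NoteEvent]:
        scaled = max(1, min(127, round(event.velocity * factor)))
        return replace(event, velocity=scaled)
    return tool


def run_graph(tools: List[Tool], events: Iterable[NoteEvent]) -> List[NoteEvent]:
    """Pass each event through the tools in order; stop early if one drops it."""
    output = []
    for event in events:
        current: Optional[NoteEvent] = event
        for tool in tools:
            current = tool(current)
            if current is None:
                break
        if current is not None:
            output.append(current)
    return output


graph = [channel_mute({9}), velocity_scale(0.5)]
result = run_graph(graph, [NoteEvent(0, 0, 60, 100), NoteEvent(10, 9, 36, 120)])
```

In this run the event on channel 9 is muted and the surviving note has its velocity halved.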

For hardware vendors who want to extend the API to include new capabilities, DirectMusic provides a mechanism called the property set. Each property set is tied to a Globally Unique Identifier (GUID), and each gets its own list of individual properties, indexed from 0. A given property index for a given GUID always means the same thing. For example, let's say that a developer has built an interface and drivers to hook a real siren to the parallel port. In order to integrate the device's API into DirectMusic, the developer would publish the GUID of the "DirectSiren," along with its indexed property set. An application supporting DirectSiren could then use DirectMusic's IKsPropertySet interface to see whether or not the DirectSiren's DeafeningAirRaid property is available.
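In outline, the DirectSiren lookup amounts to a two-level table: GUID first, then property index. The sketch below models that in plain Python. The GUID value, the class, and its method names are made up for illustration; this is not the IKsPropertySet interface itself.

```python
import uuid
from typing import Any, Dict

# Made-up GUID for the imaginary DirectSiren property set.
DIRECT_SIREN_GUID = uuid.UUID("12345678-1234-5678-1234-567812345678")
DEAFENING_AIR_RAID = 0  # property indices start at 0


class PropertyStore:
    """Toy stand-in for a device that advertises property sets."""

    def __init__(self) -> None:
        self._sets: Dict[uuid.UUID, Dict[int, Any]] = {}

    def publish(self, guid: uuid.UUID, index: int, value: Any) -> None:
        """Register one property under a property-set GUID."""
        self._sets.setdefault(guid, {})[index] = value

    def query_support(self, guid: uuid.UUID, index: int) -> bool:
        """Report whether this device knows the given property."""
        return index in self._sets.get(guid, {})

    def get(self, guid: uuid.UUID, index: int) -> Any:
        """Read a property, failing loudly if it is not supported."""
        if not self.query_support(guid, index):
            raise KeyError(f"property {index} of {guid} is not supported")
        return self._sets[guid][index]


port = PropertyStore()
port.publish(DIRECT_SIREN_GUID, DEAFENING_AIR_RAID, True)

if port.query_support(DIRECT_SIREN_GUID, DEAFENING_AIR_RAID):
    print("DeafeningAirRaid available:", port.get(DIRECT_SIREN_GUID, DEAFENING_AIR_RAID))
```

The point of asking first is that an application can run on machines without the siren and simply skip the feature.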

Programming: A Smorgasbord of COM objects

DirectMusic consists of 24 distinct COM objects. This lets developers use only the portions they need. For example, if you just want MIDI output, you don't need to incur the overhead of DLS or the learning curve of any interactive music code.

It also means that developers can replace entire sections of the system with ones that meet their needs. The idea is to make an architecture robust enough that third-party vendors of related products and tools will have a much easier time, and won't need to reinvent the wheel in order to support the code they really want to provide. For example, Headspace is making a version of its web-based music player/generator Beatnik that integrates the DirectMusic API.

The DMS Loader

At DirectMusic's technological core lies the Loader, responsible for locating, loading, and registering objects. It was designed with low-bandwidth applications in mind, so it strives for efficiency.

To use the Loader, generally the first step is to set a search directory. This isn't required; an object can be referenced by full path name. URLs are not yet supported. Once a search directory is set, the Loader can search for objects using its ScanDirectory method and enumerate them in a database of their names and GUIDs.

The Loader's caching system relies upon this database: if an application asks for the same object twice (even in different locations), and if that object is in the database, it doesn't need to be loaded a second time. Caching is enabled for all objects by default, but can be turned off and on with the Loader's EnableCache method. For a balance between conserving RAM and avoiding repeated loads of the same object, an application must make smart use of the CacheObject, ReleaseObject, and EnableCache methods. Of course, there will be cases where caching is not good - browsing through tons of instruments in a DLS editing application, for example.

Once this database exists, an application can use the Loader's EnumObject method to show all objects of any class or classes in the database and then make an instance of the object (without duplicating data) using the GetObject method.
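The caching trade-off is easier to see in a toy version. The class below borrows the spirit of the method names mentioned above, but it is a hypothetical model: the reader callback, the load counter, and the choice to empty the cache when caching is switched off are my assumptions.

```python
from typing import Any, Callable, Dict


class ToyLoader:
    """Minimal model of load-once caching keyed by object name."""

    def __init__(self, read_object: Callable[[str], Any]) -> None:
        self._read = read_object          # hypothetical: fetches data by name
        self._cache: Dict[str, Any] = {}
        self._cache_enabled = True
        self.loads = 0                    # counts real (uncached) loads

    def enable_cache(self, enabled: bool) -> None:
        """Turn caching on or off; turning it off also empties the cache."""
        self._cache_enabled = enabled
        if not enabled:
            self._cache.clear()

    def get_object(self, name: str) -> Any:
        """Return the cached object if present, otherwise load it."""
        if name in self._cache:
            return self._cache[name]
        self.loads += 1
        obj = self._read(name)
        if self._cache_enabled:
            self._cache[name] = obj
        return obj

    def release_object(self, name: str) -> None:
        """Drop one object from the cache to give its memory back."""
        self._cache.pop(name, None)


loader = ToyLoader(lambda name: f"data for {name}")
loader.get_object("piano.dls")
loader.get_object("piano.dls")    # served from the cache, no second load
loader.release_object("piano.dls")
loader.get_object("piano.dls")    # loaded again after release
```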

Output API: Instruments and Ports

The basic means by which DirectMusic makes actual sounds come out of the digital-to-analog converters of a game player's PC is DLS. The API represents DLS instruments with the DirectMusicInstrument object, and sets of instruments (Collections and Bands) with the DirectMusicCollection and DirectMusicBand objects. The way any of this gets out of the box is via a Port object.

To use DLS in an application, first you must have one or more files full of DLS instruments, comprising both sample data and associated control (articulation) data. This industry-standard (not just Microsoft's) file type is known as a DLS Collection. As a simple enhancement to General MIDI, you can use the General MIDI/GS DLS collection bundled with DirectMusic. This way, all of your users will hear the same sounds, and you won't play sound-card roulette.

Of course, using the stock GS set sort of misses the point of using custom sounds. The better way is to have your crack team of composers and sound designers deliver custom DLS collections comprising instruments specifically developed to go along with your game's music. Sound designers can also supply DirectMusic bands, which are detailed references to DLS data in one or more collections (Figure 4).


Figure 4: DLS Device: Band / Instrument Relationships

Basic Playback with the Performance API

DirectMusic's playback objects include Port, Performance, Track, and Segment. The Performance object is the music playback überobject. It adds and removes Ports, downloads Instruments, attaches graphs of Tools, deals with event notification, and plays Segments. Segment objects contain data in one or more Tracks, which is where the actual music data resides.

A DirectMusic Track is not the same thing as a track within a type 1 MIDI file. In fact, a DirectMusic Segment can contain all of the data from an entire imported MIDI file.

About the simplest thing an application can do with DirectMusic is to create a Performance, create a Loader, and tell it to load a single MIDI file. The Loader returns the MIDI file in the form of a Segment. To play the Segment, call the Performance's PlaySegment method.

While it's possible simply to play a MIDI file without invoking any more of the Performance API than I just described, an application can easily bring to bear more of DirectMusic's control mechanisms. For example, you could create a Performance containing Segments, each comprising a single, complete MIDI file. Using the Performance API, an application can then queue, layer, and modify these Segments. Each Segment can get a delay value when queued for playback; this value can then adjust itself to match tempo changes. The PlaySegment method also accepts a parameter telling it what type of rhythmic juncture - beat, bar, note, and so on - to jump in on. By playing multiple Segments simultaneously, an application can add and subtract musical elements.
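The "rhythmic juncture" idea boils down to rounding a requested start time up to the next boundary of the chosen size. Here is a minimal sketch that works in ticks, so it stays valid across tempo changes. The 480 ticks per beat, the fixed 4/4 meter, and the resolution names are assumptions for the example, not the actual PlaySegment flags.

```python
TICKS_PER_BEAT = 480  # assumed resolution for this sketch


def next_boundary(now_ticks: int, resolution: str, beats_per_bar: int = 4) -> int:
    """Return the first tick at or after now_ticks that lands on the boundary."""
    if resolution == "immediate":
        return now_ticks
    if resolution == "beat":
        unit = TICKS_PER_BEAT
    elif resolution == "bar":
        unit = TICKS_PER_BEAT * beats_per_bar
    else:
        raise ValueError(f"unknown resolution: {resolution}")
    return -(-now_ticks // unit) * unit  # ceiling division, then scale back up


# A cue requested at tick 1000 waits for the next beat or bar line.
print(next_boundary(1000, "beat"), next_boundary(1000, "bar"))
```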

The next logical item on many game music composers' current wish lists is a way to manage segment playback based on game state inputs using an authoring-level scripting scheme or something similar. Using a set of variables shared between the music engine and the host application, this type of system could emulate what a music editor does for a film: watch what's going on and select, mix, and match existing musical elements accordingly.

High-level scripting isn't part of DirectMusic. Some developers think this should have been the fundamental thrust of any music system from Microsoft, and that DirectMusic misses the point. Its authoring-level logic doesn't go beyond the single-segment level; to do more requires application code.

On the other hand, I can't see anything in DirectMusic's architecture that would preclude a higher-level system for real-time rendered music editing. Its low-level code should make this sort of thing easier and more reliable than it was under the old MidiOut system.


DirectMusic's sharp timing comes courtesy of its core layer. This layer also supports the software synthesizer and other DLS-related services. It supports buffered, time-stamped MIDI input and output, letting the system do things such as play multiple sequences with completely independent timing. Normally, DirectMusic itself sequences the MIDI data, but others can write their own sequencers and plug them into DirectMusic. All of the higher-level stuff - loading and playing files and the interactive music engine - is part of the performance layer.
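Time-stamping is what makes independent timing possible: once every event carries an absolute time, sequences at different tempos can be merged into one output stream. The sketch below shows the idea with two sequences. The tuple format and function names are hypothetical, and a real core layer would of course work with much finer clocks than floating-point milliseconds.

```python
import heapq
from typing import List, Sequence, Tuple

TimedEvent = Tuple[float, str]  # (timestamp in ms, message label)


def to_milliseconds(events: Sequence[Tuple[int, str]], bpm: float,
                    ticks_per_beat: int = 480) -> List[TimedEvent]:
    """Convert (tick, message) pairs to absolute millisecond timestamps."""
    ms_per_tick = 60000.0 / (bpm * ticks_per_beat)
    return [(tick * ms_per_tick, message) for tick, message in events]


def merge_sequences(*sequences: Sequence[TimedEvent]) -> List[TimedEvent]:
    """Merge already-sorted event streams into one, ordered by timestamp."""
    return list(heapq.merge(*sequences, key=lambda event: event[0]))


fast = to_milliseconds([(0, "a0"), (480, "a1")], bpm=120)  # a beat every 500 ms
slow = to_milliseconds([(0, "b0"), (480, "b1")], bpm=90)   # a beat every ~667 ms
merged = merge_sequences(fast, slow)
```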

Composing with DirectMusic Producer

DirectMusic has two audiences that it must please: audio creative types and programmers. The face of DirectMusic for a musician or sound designer is an application called DirectMusic Producer. A tutorial, or even a decently comprehensive review, should be the subject of another full feature article once the tool is complete.

Producer's nature reveals itself with the "Insert File into Project" dialog, when it shows a list of the sorts of things it can open. These include all of DirectMusic's editable data types: Bands, DLS Collections, Chord Maps, Templates, Segments, and Styles. Each of these existing data types has its own interface within Producer (Figure 5). These almost behave as their own applications, except that they can pass data back and forth and can be built into unified projects.

Figure 5: DirectMusic's Producer Tool

One great thing about Producer is that it comes with an API that lets developers build new editing tools into the application. This means that if, for example, a vendor wanted to make its algorithmic music generator available as a DirectMusic component, it could build the editor right into the Producer, allowing it to talk with other components such as the DLS editor.

How Deep Do You Want To Go Today?

DirectMusic's Interactive Music engine can be used to varying degrees. The deepest levels are only going to be of interest to a few developers, as they get into rather specialized solutions. The simplest level should be of interest to plenty of developers: just import one or more MIDI files, each as a segment, and thus make them available to the API for queuing and scheduling.

The next level gets into a data type called a Style. Styles contain patterns (Figure 6), which are like MIDI sequence files in that they contain one or more parts, each with a single instrument, that can be set to play with no randomness or variation. So, conceptually, the simplest style is just like a MIDI file.

Figure 6: Pattern Screen Shot

Going one step deeper, you can add variations to individual parts (Figure 7). Typically, these are made by copying the contents of a part and adding or subtracting notes to change its feel and density. You can tie together variations in different parts within a style, so that they encompass more than one instrument. These variations can then be chosen either by the game's code or by parameters set within the music engine.

Figure 7: Adding a Part Variation...

A more basic parameter for selecting patterns is a number ranging from 1 to 100 called Groove Level. Groove Level can derive from a Groove Track, but it can also be set by your game based upon state variables. The more intense the state of things, the higher the Groove Level. This lets DirectMusic choose patterns based upon the groove ranges assigned to them by the composer.
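A sketch of how a game might drive this: map some intensity value onto the 1-to-100 scale, then see which patterns the composer made eligible at that level. The pattern names, the ranges, and the linear intensity mapping are invented for the example; how DirectMusic picks among several eligible patterns is not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pattern:
    name: str
    groove_low: int    # lowest Groove Level at which this pattern may play
    groove_high: int   # highest Groove Level at which this pattern may play


def groove_from_intensity(intensity: float) -> int:
    """Map a game intensity in 0.0-1.0 onto a Groove Level in 1-100."""
    clamped = min(1.0, max(0.0, intensity))
    return 1 + round(clamped * 99)


def eligible_patterns(patterns: List[Pattern], groove_level: int) -> List[Pattern]:
    """Return the patterns whose composer-assigned range covers the level."""
    if not 1 <= groove_level <= 100:
        raise ValueError("Groove Level must be between 1 and 100")
    return [p for p in patterns if p.groove_low <= groove_level <= p.groove_high]


patterns = [
    Pattern("calm", 1, 40),
    Pattern("tense", 30, 80),
    Pattern("battle", 70, 100),
]
choices = eligible_patterns(patterns, groove_from_intensity(1.0))
```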

More than one pattern can play at once. Unless a secondary pattern has tempo data associated with it, it will take its timing from the main pattern. A specialized type of pattern, called a motif, is intended to be triggered by events. Motifs generally consist of only one or two instruments and are short. The simplest example might be a single drum hit.

Up to this level, no actual notes are being generated or even bent by the music engine. It's simply been storing, playing, and combining musical elements that were fully composed by a human being. The most extensive use of this engine in a game to date, Monolith's SHOGO - MOBILE ARMOR DIVISION, went no deeper than this.

The next step, if you take it, starts automatically transposing some notes. This involves a segment track type called a chord progression (Figure 8). To use one, abstract the chord changes from your piece of music and use DirectMusic Producer to place them in a chord progression track within the segment. On playback, the style engine recreates the proper notes by mapping the notes in the style to the harmonic information within the chords. Each chord supports up to four subchords, called levels.
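To illustrate the principle, here is a deliberately tiny version of chord mapping: the part stores chord-tone positions instead of fixed pitches, and playback looks up the actual notes in whichever chord is current. The chord table and the (degree, octave) encoding are my own simplification; the style engine's real mapping, with its subchords and scale information, is far richer.

```python
from typing import Dict, List, Tuple

# MIDI note numbers for three triads (a made-up, minimal chord table).
CHORDS: Dict[str, List[int]] = {
    "C": [60, 64, 67],
    "F": [65, 69, 72],
    "G": [67, 71, 74],
}

# Each note in a part: (chord-tone index, octave offset).
Part = List[Tuple[int, int]]


def realize(part: Part, chord_name: str) -> List[int]:
    """Turn abstract chord-tone references into concrete MIDI notes."""
    tones = CHORDS[chord_name]
    notes = []
    for degree, octave in part:
        if not 0 <= degree < len(tones):
            raise ValueError(f"chord has no tone at index {degree}")
        notes.append(tones[degree] + 12 * octave)
    return notes


arpeggio: Part = [(0, 0), (1, 0), (2, 0), (0, 1)]
for chord in ("C", "F", "G"):
    print(chord, realize(arpeggio, chord))
```

The same four-note figure follows the harmony without the composer writing it out three times.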

Figure 8: Chord Progression Screen Shot

Using chord progressions requires building a bit more information into your parts and variations. Various attributes can be set at levels ranging down to the individual note to determine whether or not a musical element can be transposed by a chord, and if so how (Figure 9). Variations can be set to play only at certain scale positions or junctures. Parts can be assigned to different levels within the chords.

Figure 9: Note Attributes Screen Shot

Beyond this, DirectMusic includes templates and chord maps (formerly called personalities), which the composition engine can use to automatically generate segments (Figure 10). A template is a segment that has everything for style playback except the chord progression track. Instead, the template includes a sign post track, which defines a road map for how to place chords in the chord progression. A separate file known as a chord map defines the actual chords as well as rules for mapping them to the sign post track. These include sign post chord definitions and a tree graph of chord connections. The composer who creates the template can build in weights for the probability of choosing one chord over another at a given juncture in the music.
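The weighted-choice part of this can be sketched as a walk through a graph of chord connections, with each edge carrying a composer-assigned weight. Everything in the sketch is hypothetical: the graph, the weights, and the function are mine, and the sign post track, which constrains where particular chords must land, is left out entirely.

```python
import random
from typing import Dict, List, Tuple

# For each chord: the chords it may move to, with a relative weight.
CHORD_GRAPH: Dict[str, List[Tuple[str, int]]] = {
    "I":  [("IV", 3), ("V", 2), ("vi", 1)],
    "IV": [("V", 3), ("I", 1)],
    "V":  [("I", 4), ("vi", 1)],
    "vi": [("IV", 2), ("V", 1)],
}


def compose_progression(start: str, length: int, rng: random.Random) -> List[str]:
    """Walk the chord graph, picking each next chord by its weight."""
    if length < 1:
        raise ValueError("length must be at least 1")
    progression = [start]
    while len(progression) < length:
        names, weights = zip(*CHORD_GRAPH[progression[-1]])
        progression.append(rng.choices(names, weights=weights, k=1)[0])
    return progression


# A seeded generator makes a run repeatable; a new seed gives a new variation.
progression = compose_progression("I", 8, random.Random(1))
```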


Figure 10: Template / Segment Relationship

The composition engine combines the chord map and template to create a style playback segment with the resulting chord progression, along with the template's groove track and style playback track. By combining different chord maps and styles with a single template, an application freshly composes musical variations for each scene.

And then there are shapes, which actually generate templates: "Give me forty bars of music that rise, and then a snappy 12-bar coda." It's a bit more involved than that, but you get the idea.

By the time you're using shapes, the chord progression is truly generative, but the original composition work that went into the style still peeks through. For more information on templates and shapes, see the documentation on Microsoft's DirectX web site.

Templates and shapes can create style playback segments offline, for example during a game level load. This won't interrupt whatever playback DirectMusic is up to at the time.

The Future

A common thought about DirectMusic is that it should merge with DirectSound, combining the APIs' strengths and addressing their weaknesses. According to Kevin Bachus, DirectMusic's product manager, "DirectSound and DirectMusic are really siblings in the same audio organization. Seamless integration of DirectSound and DirectMusic is very important. Expect to see more and more convergence in future versions of DirectX."

DirectMusic's DLS output can benefit from DirectSound-specific features such as spatial positioning (a.k.a. 3D sound). In turn, DLS offers control over sound effects from within authoring tools, letting sound designers take some of the sound effects implementation out of the hands of programmers. It can also offer features of which DirectSound has no inkling - such as envelopes and midsample looping - which can be completely set up by a sound designer and triggered using standard MIDI commands.

As I mentioned, this is a big package. It attempts a great many things, and based on the alpha version, it seems to do many of them well. It doesn't include some things that many developers wanted, especially a track-based, scriptable system for controlling adaptive music playback. However, what it offers is closer to solving some of the same problems than might seem evident at first glance.

So, what is DirectMusic good for? Who is it good for? For those game developers who continue to use MIDI as their games' music output, the basic architectural improvements are long overdue. For those who can realize their music well using custom DLS1 sound sets and can afford some CPU hit on unaccelerated user machines, DLS will be a relief from being stuck with inconsistent playback and General MIDI's limited palette of sounds. If you have plenty of disc space, don't need run-time access to the CD drive, want dirt-simple programming, and don't care about adaptivity in your music, Red Book remains the way to go, even though DLS can do full CD quality (44.1kHz 16-bit stereo).

If you want to do adaptive music, first analyze what you want to happen. Perhaps videotape some game play and score music to this using traditional techniques. Once you've figured out how music should ideally operate in your game, look at both the controls your code needs to spit out and what the music engine needs to do in response. Once you've defined the task to this degree, you may or may not find your solution within DirectMusic. Take a look at third-party APIs such as Miles or DiamondWare; you're likely to find the codebase you need without writing it all yourself.

If you want an adaptive digital audio streaming engine, try doing this with large samples under DLS. With the software Microsoft has supplied as of this writing, I can't judge whether this will work well or not.

Will DirectMusic make MIDI relevant again for you? What am I, your mother? Set aside a day and check it out. If nothing else, it's brain candy. Enjoy.
