Run-Time Pixel Format Conversion

Today's programmers have to deal with a variety of graphics boards and resolutions. Learn how to code for the multitude of formats and convert images from one pixel depth to another. DirectX, Windows GDI, true color, high color, and alpha channels -- it's all in here.

Michal Bacik, Blogger

May 21, 1999

32 Min Read

Today’s graphics chips vary widely in performance and visual quality. As a result, computer users come into contact with varying graphics formats, resolutions, and color spaces. If developers want their applications to succeed in this situation, they must prepare these applications to work on a wide variety of hardware. Although high-level technologies such as DirectX simplify the process of writing portable software, the programmer still faces the problem of writing for different hardware. DirectX can indicate the capabilities of the underlying hardware, but in certain cases, the API doesn’t perform emulation (such as when an application blits among different surface formats). Programmers need to know how to handle the graphics formats that are available for the hardware on which their programs are supposed to run. In this article, we’ll examine how to prepare graphics data in various formats so that it will display properly in a game.

Often, the application needs to convert graphics data from a storage format on the hard disc to another format that’s usable by the game itself. To make matters worse, background pictures, sprites, and textures in 3D games may contain an alpha channel, which doesn’t simplify the problem. In order to display all graphics properly, a game must be ready to handle conversions to whatever pixel format the current hardware is using. How can we achieve these conversions? Many programmers beginning to work on the PC platform tend to use Windows GDI functions for pixel format conversions. Unfortunately, this strategy has a few limitations. GDI functions don’t support formats with an alpha channel, they’re typically slow, and they simply can’t perform certain conversions.

Many programmers also think that the 5:6:5 ratio (corresponding to the number of bits for red, green, and blue components) is standard for high-color (16-bit) modes. However, these same programmers could be using a high-color mode with a 5:5:5 ratio, any one of several modes with an alpha channel, or some other mode with some other ratio that is specific to a certain hardware vendor.
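To make this concrete, here's a small sketch (my own illustration, not code from the article) showing that the same color packs to different 16-bit words under the two common high-color layouts:

```cpp
// Pack 8-bit components into a 5:6:5 word: RRRRRGGGGGGBBBBB.
unsigned short Pack565(unsigned char r, unsigned char g, unsigned char b){
   return (unsigned short)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Pack the same components into a 5:5:5 word: 0RRRRRGGGGGBBBBB.
unsigned short Pack555(unsigned char r, unsigned char g, unsigned char b){
   return (unsigned short)(((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3));
}
```

White, for instance, packs to 0xFFFF in 5:6:5 but 0x7FFF in 5:5:5, so code that hard-wires one layout displays wrong colors on hardware that uses the other.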

The key is to avoid the assumption that the order or width of a pixel’s RGBA components is fixed. We also need to determine what color formats the game can handle. The proper programming method is to read the capabilities of underlying hardware and perform tasks that make use of the hardware appropriately.

Looking for a Solution

So what could we do to avoid graphics hardware problems? We could create a C++ class that would satisfy the requirements of those graphics cards currently on the market or due out in the near future. We would need to determine the tasks required during conversion among pixel formats and then write an interface that would handle tasks and hide the implementation. Here are the minimum requirements:

• Perform format conversion of the most common pixel formats.

• Perform the conversion as quickly as possible, because a game can spend significant time loading all the textures and bitmaps for a scene.

• Ensure that the conversion works safely.

Pixel formats differ in many ways, notably the pixel depth. Today’s hardware uses 1-, 2-, 4-, 8-, 16-, 24-, and 32-bit wide pixels. Until games run in a 32-bit environment, we’ll be satisfied with a maximum of 32 bits per pixel. Pixel formats also differ in whether they are palletized or RGB, and in whether they carry an alpha channel.

We may want our format conversion class to be able to convert data from any format into any other format. However, it doesn’t need to be that flexible, at least on the source side. Because high-color data and 8-bit RGB formats typically aren’t the formats in which data is stored, things are a little simpler for us. When it comes to destination formats, the converter should be more flexible and able to handle most possible graphics formats. For the purpose of this article, however, we’ll be happy with a limited set of output formats.

At first, our task may seem complicated. How big must the library be? How will it handle all possible graphics formats, including palletized modes and alpha channels, as well as pixels that can be anywhere from 1-bit to 32-bits wide? Well, let’s see how we can simplify things.

First, we don’t need to handle all possible formats separately. Hardware must adhere to certain rules. Only formats that are 8 or fewer bits wide can be palletized. Furthermore, RGBA components always occupy contiguous bits within pixels.

Second, we can limit our converter to handle only RGB components with a maximum width of 8 bits per component. When hardware offering greater color precision becomes standard, we can add features to the class to accommodate higher pixel depths.

Finally, we can decide not to support graphics formats of fewer than 8 bits because they’re increasingly rare. If we must support these outdated formats, however, we can do so via less efficient general algorithms.

Let’s summarize the information that we need in order to perform pixel format conversion:

• We need to know the pixel format of our source data, including bit depth, RGBA bit masks, and palette (if any).

• We need to know which pixel input and output formats must be supported.

• We need some source data, usually loaded from disk. Because data is usually supplied as a bitmap, we can choose the rectangular memory scheme that is used by most graphics drivers and APIs. We need a pointer to the data, as well as the width, height, and pitch of the source surface. Using a rectangular memory system isn’t required - it’s possible to convert contiguous memory blocks, RLE-packed sprites, and so on, to our needs. Working with rectangular memory is just another simplification and will be all we need in most cases.

As bitmaps are loaded from disk, the source format is available via the bitmap header. The destination format depends on the data’s final purpose. Usually, the data should match the format of the selected video mode or the chosen texture format. The conversion takes the source file and goes through it pixel by pixel, performing the conversion on each one and storing the results in the destination format.

Functions, Initialization, Work

How should we design the class so that it is flexible and effective? The ideal approach would be to define a pure virtual class as an interface and derive the implementation class from this. The interface class shouldn’t contain any data members, so that we can make changes in the implementation of the converter without having to recompile the existing code. Until the interface changes, we’ll only need to relink the affected program.

Initialization of the converter

We usually use the converter in a many-to-one case, whereby multiple source files stored in various pixel formats are converted to the same destination format. Therefore, we should initialize the class to a certain destination format. For example, we could initialize and keep one converter interface with which we convert all of the data that we want to display on the screen. The initialization includes precalculating whatever values we’ll need for the fastest possible conversions. One common technique is to use look-up tables, which we’ll go into later.

For the purposes of conversion, convenience dictates that we define any pixel format by a simple structure. Let’s declare it the following way:


typedef unsigned long dword;

struct S_pixelformat{

dword bits_per_pixel;

dword red_mask, green_mask, blue_mask, alpha_mask;
//masks of single RGBA components within a pixel

S_palette_color *palette;
//pointer to array of 256 palette entries,
// or NULL for nonpalletized formats
};

We can declare a single palette entry as

struct S_palette_color { unsigned char red, green, blue, reserved; };

Finally, we can declare the initialization function as

bool C_converter::Init(const S_pixelformat *dest_format) PURE;

The implementation of the converter should save the destination pixel format and extract the additional values that we’ll need for later conversions (such as the positions of components for bit-shifting components into proper position in a pixel). Depending on the destination bit depth, we can create special look-up tables from given bit masks of RGBA components.
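As a sketch of what Init might precompute (the helper name is mine, not the article's), the bit position and width of each component can be derived from its mask like this:

```cpp
typedef unsigned long dword;

// Derive the bit position (shift) and width of a component from its
// mask. For example, green in 5:6:5 (mask 0x07e0) sits at shift 5 and
// is 6 bits wide.
void MaskToShiftWidth(dword mask, int &shift, int &width){
   shift = 0;
   width = 0;
   if(!mask) return;                        //component not present
   while(!(mask & 1)){ mask >>= 1; ++shift; } //skip low zero bits
   while(mask & 1){ mask >>= 1; ++width; }    //count the set bits
}
```

These derived shifts and widths are exactly what the conversion paths (and the look-up tables described later) need.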


Let’s declare the conversion method in the following way:


bool C_converter::Convert(const void *src_data, void *dst_data, size_t sx, size_t sy, size_t src_pitch, size_t dst_pitch, S_pixelformat *src_format, dword flags) const;

At this point, everything we might expect from our converter is spelled out in its initialization and conversion methods. The class should now perform its task and convert any source format into the specified destination format.

Though sometimes it’s convenient to choose system memory as the destination for the conversion, sometimes it’s better to write directly into video memory, which spares the redundant copying of data from a temporary buffer to the video card’s memory. In DirectX, for example, we could use Lock on a surface to obtain a pointer to the surface’s location in memory and its pitch, then use this pointer as the conversion destination.


Once the interface is finalized, we need to code the converter itself. To see the effect while we code the implementation, we can initialize a display mode at our system’s pixel depth, load a bitmap, convert it to the format of the primary surface, and blit it onto the screen. A DirectX primary surface is an ideal destination for this test.

The process involved in loading an image from disk is beyond the scope of this article. However, we could load a Windows device-independent bitmap or implement our own loader for our favorite bitmap format. All that we’d need for conversion would be a temporary buffer containing the uncompressed raw data and a surface into which the converted data will be placed.


A call to the Convert method branches the code into separate code streams. The initial branching is based on the source pixel depth and may look like this:


switch(src_format->bits_per_pixel){

case 8:
// handle conversion from 8-bit modes here...
break;

case 16:
// handle conversion from 16-bit modes here...
break;

case 24:
// handle conversion from 24-bit modes here...
break;

case 32:
// handle conversion from 32-bit modes here...
break;

default:
Error("unsupported pixel depth");
return false;
}



Each branch splits again into four branches of destination pixel depth. However, as we’ve already learned, keeping 16-bit data on disk is unusual, so we won’t implement conversion from high-color modes and will comment out that branch for now. Also, the difference between 24- and 32-bit modes is only a wasted byte in 32-bit mode, so we can process them in a single branch. By way of example, let’s suppose our source data is in 24-bit true-color format.


case 24:
//from true color ...

case 32:
//note that 24- and 32-bit modes
// are processed the same way

//branch again on the destination depth
// saved by Init (the name dest_bits is
// illustrative)

switch(dest_bits){

case 8:
// handle conversion to 8-bit modes here...
break;

case 16:
// handle conversion to 16-bit modes here...
break;

case 24:
// handle conversion to 24-bit modes here...
break;

case 32:
// handle conversion to 32-bit modes here...
break;

default:
Error("unsupported pixel depth");
return false;
}
break;




As soon as we’ve written the skeleton of the function, we can start to write the first conversion. All combinations that aren’t yet supported will return a false value.

So we’ll continue writing code in the branch after the case 16: label of the second switch statement. We branch by pixel depth because it’s convenient to use CPU registers of the same width as the processed pixel format: 8-bit registers for 8-bit modes, 16-bit registers for high-color modes, and so on. This convention also allows us to write a basic, unoptimized converter very quickly and then fine-tune the most frequently used paths later.

As we saw earlier, a rectangular memory system is frequently used for graphics data. In the case of bitmap data, the width, height, and pitch of a surface define the surface area. The pitch doesn’t necessarily equal (width*bits_per_pixel/8). For video-memory surfaces, the video driver may choose to set the pitch wider than the width in order to better utilize the available video memory. We’ll need to keep this in mind as we write the conversion code.
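The pitch bookkeeping can be captured in a tiny helper (illustrative, not part of the article's class): to reach pixel (x, y), step y whole lines of pitch bytes, then x pixels within the line.

```cpp
// Address pixel (x, y) in a rectangular surface whose lines are
// 'pitch' bytes apart. Note that pitch may exceed
// width * bytes_per_pixel, e.g. for video-memory surfaces.
unsigned char *PixelAddress(unsigned char *surface, int x, int y,
   int pitch, int bytes_per_pixel){
   return surface + y*pitch + x*bytes_per_pixel;
}
```

The conversion loops below follow the same rule: they advance the row pointers by the pitch, never by width * bytes_per_pixel.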

Various Conversion Modes

Let’s look at possible conversion situations that might arise:

Copy mode. If the source and destination pixel formats are the same, no conversion is necessary. We only need to copy data from the source to the destination.

Converting from RGB to RGB. RGB components are shifted within a pixel and may be expanded or contracted. If the depth of the destination format is less than the depth of the source format, we’ll lose color information. In such cases, a simple dithering algorithm improves the quality of the results after conversion. We’ll cover dithering later.

Converting from palletized to RGB. The source data consists of indices to a palette. The palette is chosen from the true-color space, so we might lose some color information when converting to modes less than 24-bits wide.

Converting from RGB to palletized. This conversion is sophisticated and CPU-intensive. We could create an optimal palette at run time - a time-consuming task - or we could map colors to a supplied palette, which requires a look-up operation for each pixel of source data. Though this conversion is a slow operation, it can be optimized.


Copy mode. We simply transfer data from the source to the destination. Note that matching pixel depths don’t always mean that the formats are the same. Differences might exist in the order of the RGB components between formats. If so, we need to handle these differences appropriately.

RGB to RGB conversion. Let’s look at the most common conversion, true color (24-bit) to high color (16-bit), and a possible converter implementation. With true color, we can simply extract single RGB components into 8-bit values. These 8-bit values are good candidates for use as indices into an array. Using a logical OR of look-up tables for each RGBA component, we can save the shifting and masking of components during conversion. During the initialization of the converter, we’ll allocate four WORD tables, each containing 256 entries, and fill in masks of pure components representing all the possible source values from 0 to 255. For instance,

WORD r_bits[256], g_bits[256], b_bits[256], a_bits[256];

Each table only contains the given component, which ranges from minimum intensity (a value of 0) to maximum intensity (a value of 255). During conversion, we combine the components from the tables together, so converting a single pixel might look like this (assuming we’ve already extracted RGB components r, g, and b from the source pixel):

unsigned char r, g, b;
// extract r, g and b

dest_pixel = r_bits[ r ] |
g_bits[ g ] |
b_bits[ b ] |
a_bits[ 255 ];

Note that we set the alpha channel (if any) to maximum intensity (full opacity). True color to true color conversion requires only that we copy source RGBA components into appropriate positions in the destination pixel.
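How Init might fill one such table for an arbitrary destination mask can be sketched as follows (the helper name and signature are my own, not the article's):

```cpp
typedef unsigned short WORD;

// Fill one 256-entry look-up table for a component with the given
// mask. Entry i holds intensity i (0..255) already scaled to 'width'
// bits, shifted into position, and masked - so converting a pixel is
// just an OR of four table entries.
void FillLookUp(WORD *table, unsigned long mask, int shift, int width){
   for(int i = 0; i < 256; i++)
      table[i] = (WORD)(((i >> (8 - width)) << shift) & mask);
}
```

For a 5:6:5 destination we'd call it once per component, e.g. FillLookUp(g_bits, 0x07e0, 5, 6); a format without an alpha channel simply gets an all-zero alpha table.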

Palletized to RGB conversion. We can handle this conversion similarly to the way in which we handled the earlier RGB-to-RGB conversion. The only difference is in how we extract the RGB components from the source data. Each pixel is an index into a palette entry, so extraction looks like this:

unsigned char r = src_format->palette[src_pixel].red;

unsigned char g = src_format->palette[src_pixel].green;

unsigned char b = src_format->palette[src_pixel].blue;

RGB to palletized conversion. We can implement this conversion in one of three ways. First, we could provide a fixed palette (that of the primary surface, for example) during initialization and always map data to fit into this palette. In this case, we can precompute some values in the Init method. Sorting the palette entries by one component (red, for example) helps to speed up the look-up operation. After we’ve extracted the RGB components from each source pixel, we must find the best index into a given palette. To find this index in an unsorted palette, we’ll need to check all 256 entries to find the closest match. If we’ve already sorted the palette, then we simply find the closest red component (or whichever component by which the palette is sorted) and compute the sum of the deltas of all RGB components for this index. This delta is an error value. We then traverse the palette up and down to find a better match until the delta of the red component itself is less than or equal to the sum of the deltas of the best match.
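As a baseline for the search described above, here is a sketch (my own, not the article's code) of the unsorted case - scan all 256 entries for the smallest sum of RGB deltas; the sorted-palette search merely prunes this loop and returns the same index:

```cpp
#include <cstdlib>

struct S_palette_color{ unsigned char red, green, blue, reserved; };

// Unsorted-palette nearest match: return the index whose entry has the
// smallest sum of RGB deltas (the "error value" from the text).
int FindNearest(const S_palette_color *pal, unsigned char r,
   unsigned char g, unsigned char b){
   int best = 0, best_err = 0x7fffffff;
   for(int i = 0; i < 256; i++){
      int err = abs(pal[i].red - r) + abs(pal[i].green - g) +
         abs(pal[i].blue - b);
      if(err < best_err){ best_err = err; best = i; }
   }
   return best;
}
```

With the palette sorted by red, the up-and-down traversal can stop as soon as the red delta alone exceeds the best total error found so far.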

A second method uses a uniform palette. We create a uniform 3-3-2 palette so that the destination format is similar to the RGB format, with a very narrow bit depth for each component. The conversion then becomes a standard RGB-to-RGB conversion.
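Building the uniform 3-3-2 palette itself is a few lines; the sketch below (function name my own) expands each narrow component back to the full 0..255 range so that index i decodes directly to its RGB color:

```cpp
struct S_palette_color{ unsigned char red, green, blue, reserved; };

// Build a uniform 3-3-2 palette: bits 7..5 of the index are red,
// bits 4..2 green, bits 1..0 blue, each expanded to 0..255.
void BuildUniformPalette(S_palette_color *pal){
   for(int i = 0; i < 256; i++){
      pal[i].red   = (unsigned char)(((i >> 5) & 7) * 255 / 7);
      pal[i].green = (unsigned char)(((i >> 2) & 7) * 255 / 7);
      pal[i].blue  = (unsigned char)(( i       & 3) * 255 / 3);
      pal[i].reserved = 0;
   }
}
```

With this palette, the "RGB to palletized" path reduces to the ordinary RGB-to-RGB machinery with component widths of 3, 3, and 2 bits.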

The final method involves computing an optimal palette. As I’ve never used this method, I won’t go into it too deeply. Once we’ve created an optimal palette, the algorithm is the same as for the first solution.

Of these three algorithms, the second, the uniform palette, is the most efficient. Unfortunately, its visual quality is also the worst. In any case, this conversion method should only be considered a fallback. For a game application, we’re better off converting RGB graphics to palletized mode using a professional graphics program, rather than performing an on-the-fly conversion with the graphics engine itself. Converting images on the fly gives us quick but ugly bitmaps. If this kind of conversion is what you really want, you may want to dither the bitmaps as well, as that slightly improves the quality of the result.


Dithering is a way of distributing coloring errors during conversions that have dropped color information from the original image. Dithered images look much better than undithered ones, and the technique is suitable for textures as well. Of course, the price is additional CPU usage, as well as the time that it will take for us to rewrite certain parts of the converter to support dithering.

It’s worth dithering when converting to 16- or 8-bit modes. In the converter we’re building, a single private method in the implementation class performs dithering from an 8-, 24-, or 32-bit source format to either an 8- or 16-bit destination format. This method simplifies the process. We could use any one of various dithering algorithms, but in this case we’ll use a simple error-diffusion algorithm. In this implementation, half of the error of a given pixel is distributed to its right neighbor, and half to its bottom neighbor. The error value is the difference between what color the pixel should be and what color it really is. Thus, if we have an ideal color value in RGB space (8 bits per component) and the value after conversion (with the lower bits of the components cut off), we compute the difference between the ideal and real component values (the error value) and distribute it. When processing the next pixel, we add the error values from its left and upper neighbors to that pixel’s ideal value and use this sum as if it were the source value. (This approach assumes that we’re processing pixels in left-to-right and top-to-bottom order.)

The implementation may use an RGB error buffer with as many elements as the width of the converted image (for error distributed to the bottom) and a single RGB error value for error distributed to the right. We need to clear the error buffer before converting the first line of the bitmap; for subsequent lines, the buffer holds the errors distributed down from the previous line. We clear the right-distributed error at the beginning of each line.
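The scheme above can be sketched for a single component (a simplified illustration of the idea, not the converter's actual method - here the output stays as a quantized 8-bit value instead of being packed into the destination format):

```cpp
// One line of half-right/half-down error diffusion for one 8-bit
// component quantized to 'width' bits. 'below' carries the error
// distributed to the next line; 'right' the error to the next pixel.
void DitherLine(const unsigned char *src, unsigned char *dst, int count,
   int width, int *below){
   int right = 0; //right-distributed error, cleared each line
   for(int x = 0; x < count; x++){
      //ideal value = source plus errors from left and upper neighbors
      int ideal = src[x] + right + below[x];
      if(ideal < 0) ideal = 0; if(ideal > 255) ideal = 255;
      //real value = ideal with the low bits cut off
      int real = (ideal >> (8 - width)) << (8 - width);
      dst[x] = (unsigned char)real;
      int err = ideal - real;
      right = err / 2;          //half goes to the right neighbor
      below[x] = err - err/2;   //half goes to the bottom neighbor
   }
}
```

Calling this once per line, reusing the same 'below' buffer (cleared before the first line), reproduces the diffusion pattern described in the text.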

Let’s look at an example of dithering. Figure 1 shows the source image in true-color bitmap format.


The image in Figure 2 was converted without dithering. The image in Figure 3 is the same image converted with dithering enabled. You can see the difference. The destination format is 565 high color.


Figures 4 and 5 show the same image converted to 8-bit RGB format (or palletized format with a uniform palette). The difference is even greater.



Now that we’ve written most of the required parts of the converter, we need to fine-tune it and improve its performance. Using a profiler, we can load a bitmap 100 times in a loop - the bottleneck in the conversion becomes obvious. For each pixel, we’re performing an array access, a shift, and a multiplication. These steps are easy to optimize using addition and pointer manipulation. Once we’ve identified and completed these kinds of optimizations, the entire processing loop can be rewritten in assembly language. Listing 1 shows an example of the original source in C++ for converting palletized data into high-color data, and the same code after optimization.

By way of illustrating the efficiency that we gained by optimizing our converter, when loading a standard 3D scene, the total time spent in the Convert function is now less than five percent (this example converted palletized bitmaps into high-color texture format with dithering enabled).

Blending in Alpha Channels

Alpha channels come in handy mainly for 3D rendering. The alpha channel is actually an opacity value for each pixel of a bitmap. In 3D graphics, the alpha channel is stored with each pixel’s color information. On disk, however, the situation is different. Very few graphics file formats are capable of storing an alpha channel in the same file as the RGB data. A common solution is to store the alpha channel as a separate grayscale bitmap, using the lightness of its pixels as the alpha channel for another RGB bitmap. 3D Studio Max and several other rendering systems support this approach.

Now that we’ve used our converter to convert RGB data, we can try assigning it to other tasks. Heck, it’s dealing with components’ bit masks, positions, and all that stuff, so why not put it to further good use? Let’s declare the following function in the class:

bool AlphaBlend(void *src_data, void *dst_data, size_t sx, size_t sy, size_t src_pitch, size_t dst_pitch, S_pixelformat *src_format, dword flags) const;

Now let’s assume that the destination buffer already contains properly converted RGB data in the same pixel format for which the converter is currently initialized. The source data is an opacity bitmap that we want to mix into the alpha channel of the destination bitmap. Our implementation uses a different approach than the Convert method - speed is less important here, so all possible conversions are done in one loop. For each pixel, we must extract the individual RGB components from the opacity bitmap and compute the overall lightness of that pixel. The formula is

lightness = red * 0.3 + green * 0.6 + blue * 0.1,

based on the sensitivity of the human eye to single components of the spectrum. The computed lightness value is in the range from 0 to 255. Note that it isn’t necessary to use floating-point math for this computation; rather, we can use integer math:

lightness = ( (int)red * 77 + (int)green * 154 + blue * 25 ) / 256

where the compiler may replace the division by a shift. We might also decide to extract alpha information from only one component. In this phase, it’s trivial to add the option to invert the alpha channel by performing the following calculation:

lightness = 255 - lightness

We could specify this option with the function's input parameter flags. 3D Studio Max offers us the ability to use inverted alpha channels, so why not aim for maximum compatibility? After this step, we can add a few lines to enable dithering. Alpha channels are often narrow in depth; the common mode is 4444, where only 4 bits are reserved for an alpha channel. Dithering an alpha channel really improves image quality, especially if the RGB bitmap has very low contrast.
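The lightness computation and the optional inversion can be sketched in a few lines (the helper name is mine, not the article's):

```cpp
// Integer lightness as in the text: the weights 77, 154, and 25 sum
// to 256, so a compiler can turn the divide into a shift.
int Lightness(unsigned char red, unsigned char green, unsigned char blue){
   return ((int)red * 77 + (int)green * 154 + (int)blue * 25) / 256;
}

// Optional inverted-alpha variant, selectable via a flag.
int InvertedLightness(unsigned char r, unsigned char g, unsigned char b){
   return 255 - Lightness(r, g, b);
}
```

The result stays in the 0..255 range for any input, so it can be dithered and packed into the destination's alpha mask directly.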

Our final step is to encode the alpha component into a pixel. The code now branches based on the destination pixel depth. For each case, we first need to mask off the previous alpha value and add a new alpha value, which must be shifted to its proper position and masked by an alpha bit mask of the current pixel format. That’s all there is to blending alpha channels. It’s that simple! Figures 6 through 9 show an example of alpha mixing.


You may wonder why I suggested setting the alpha value of a pixel to the maximum during RGB conversion. If the loading or mixing of an opacity map fails for whatever reason, it’s better to see the RGB bitmap in full opacity, rather than in zero opacity (which you can’t see at all).

Color-Key Substitution

Color-keying is a technique wherein a certain color in a bitmap (or a range of colors) is assumed to be transparent and is not copied to the destination surface during blitting or rendering. Many developers use color-keying in 2D graphics for sprites, for example, as well as for textures in 3D, where one could render a fence merely by defining what pixels in a texture are transparent.

Most hardware supports color-keying in some form. But color-keying presents some problems - the edges of color-keyed textures are tricky. Most 3D cards filter color-keyed textures so that their edges blend toward the transparent color. So, for example, the blended edges of a green leaf texture might look black or white (or whatever other color is supposed to be transparent). We can mitigate this effect by choosing a color-key that is similar to the other colors contained in the texture. Achieving exactly the right transparent color is difficult due to the fact that color information is sometimes lost during certain conversions. When color information is lost during conversion, some pixels may become identical, which can cause unexpected parts of a texture to become transparent (Figure 10). A better approach is to replace color-keying with alpha channels. All decent 3D hardware offers texture formats with a 1-bit alpha channel (for example, 1555 high-color mode). The edges of alpha-blended textures fade out nicely to translucency (Figure 11).

We can define transparent pixel information in a bitmap in the same way as with color-keying - a special color defines the transparent parts of an image. All we need is a function that will scan an entire bitmap and set a pixel’s alpha value to transparent if the pixel equals the color-key value and to opaque otherwise. We don’t need to limit this function to 1-bit alpha channels - we’re better off making it more flexible - so for opaque pixels we’ll set the alpha value to its maximum intensity. We can declare the new method that we’re adding into our converter as

bool CKeyToAlpha(void *dest_data, size_t sx, size_t sy, size_t pitch, unsigned long color_key) const;

This function only operates on an RGBA surface that is already initialized in the same pixel format for which the converter is currently initialized. We need to scan every single pixel of the surface and compare it with the color_key parameter. Note that color_key must be in the surface’s pixel format. If the pixel matches the color-key, its alpha value is set to zero; otherwise it’s set to maximum. Because we’re only dealing with minimal and maximal alpha values, all we need to do is either clear or set all of the alpha bits in a pixel.
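The per-pixel operation can be sketched for a 32-bit surface (an illustration under my own naming, not the class's actual code): compare the non-alpha bits against the key, then clear or set all alpha bits.

```cpp
typedef unsigned long dword;

// Color-key pass for one 32-bit pixel: if the RGB bits match the key,
// clear every alpha bit (transparent); otherwise set them all (opaque).
dword ApplyColorKey(dword pixel, dword color_key, dword alpha_mask){
   dword rgb_mask = ~alpha_mask;                 //the non-alpha bits
   if((pixel & rgb_mask) == (color_key & rgb_mask))
      return pixel & ~alpha_mask;                //transparent
   return pixel | alpha_mask;                    //opaque
}
```

Because only the minimum and maximum alpha values are involved, no shifting or scaling is needed - masking alone does the job.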


Generating MIP-maps is another task suitable for our converter. Though we could generate MIP-maps in one of several ways, I’ll describe one simple method. This implementation is not particularly fast, however, because it uses conversions and memory transfers that could be avoided. Anyhow, using our existing conversion function, we can generate a MIP-map chain by writing only a few lines of code.

We need to filter each MIP-map level down to one-quarter of the size of the previous one. We average every neighboring 2x2 block of pixels in the source bitmap; the result is a single pixel of the new MIP-map level. The most convenient way to generate MIP-maps is to use a 32-bit pixel format containing ARGB values at 8 bits of depth each. Our first task is to allocate memory in which to hold the original bitmap in the 32-bit format. Then we call the Convert function to convert the source bitmap from whatever format it may be in to the 32-bit ARGB format.

We’ll need to add a function to filter the pixels. To avoid allocating more memory, we can put the results of our filtering operation into the same buffer that holds our source data. Listing 2 shows an example of such a function. We can create MIP-maps down to the smallest bitmap resolution that the current hardware supports. Most hardware supports MIP-map surfaces in which the smaller side is one pixel wide.

After filtering one level, we need to convert the data back to the original bitmap format. We can use our converter’s Convert() function, giving the memory buffer as the source bitmap and a locked surface of a MIP-map level as the destination buffer.

It should be noted that combining MIP-map generation and color-keying has cumulative effects - the color-keyed areas in the bitmap shrink after each filtering pass. The smaller the MIP-map level, the less transparent it usually appears. Also, formats with a 1-bit alpha channel can produce unwanted artifacts: the average alpha value of four pixels may be quantized to only two values, 0 or 1. If we’re going to generate MIP-map levels from color-keyed textures, we’d be better off choosing a texture format with a wider alpha channel.


The previous paragraph hints at some additional tasks that we could implement in our converter class. Let’s call all the other operations on a bitmap "filtering." We can define filtering as any operation over existing bitmap data (or part of it). Photoshop filters offer an easy-to-visualize example - adjusting the brightness level of an image or its color saturation, converting a bitmap to black and white, or combining two bitmaps together. Another example of filtering is shrinking a bitmap to one-quarter of its original size to generate the next MIP-map level, as described in the previous paragraph. We can create whatever filter we need, but more filters require more lines of code, and good performance requires even more code. For this article, however, let’s simply create a filter that converts textures to grayscale at run time. Furthermore, let’s define an expandable interface so that we can add more effects in the future, rather than writing a specialized function that only makes a bitmap black and white.

The filtering function takes arguments such as a pointer to bitmap memory, its resolution and pitch, the filter ID (a value identifying the requested filter), and additional data that depends on the kind of filter we’re creating (brightness level, for example, or a pointer to a palette). First, we need to initialize the converter to the pixel format of the bitmap in question.

The implementations of individual filters vary. For the grayscale filter, we need to extract the RGB components from each pixel, average them, and write the result back in the proper pixel format. The principle is similar to that of MIP-map generation: extract the RGB components from the source pixel, perform the operation on them, and encode the modified components back into the pixel’s format. Most filters (such as those manipulating brightness, grayscale, and so on) will have a single code path, traversing all pixels of the bitmap in a loop, extracting the RGB components, performing the required operation, and encoding the data back.

For palletized modes, most filters can simply perform their operations on the palette rather than the pixels of the bitmap. We can consider the palette to be an array of values in RGB format. Thus, we simply assign the palette to be the buffer on which we want to do the operation. We assign the palette a width of 256 and a height of 1 and then jump to the branch where we deal with true-color pixel formats.
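The palette trick can be sketched directly (my own illustration, not the article's code): the grayscale filter touches 256 palette entries instead of every pixel of the bitmap.

```cpp
struct S_palette_color{ unsigned char red, green, blue, reserved; };

// Grayscale filter applied to the palette instead of the bitmap:
// treat the 256 entries as a 256x1 true-color buffer and average
// the RGB components of each one.
void GrayscalePalette(S_palette_color *pal){
   for(int i = 0; i < 256; i++){
      unsigned char l =
         (unsigned char)((pal[i].red + pal[i].green + pal[i].blue) / 3);
      pal[i].red = pal[i].green = pal[i].blue = l;
   }
}
```

The bitmap's index data never changes; every pixel simply resolves to the grayed-out version of its former color.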

A Very Handy Tool

A proper pixel format converter is an essential part of any complex graphics library. Its interface can be defined very simply (six or eight public functions are enough) and, if written correctly, it quickly becomes a very handy tool that saves a lot of time and allows effects that would otherwise have to be hand-coded again and again for each concrete situation. What if we were converting the output of .AVI playback from Video for Windows to the primary surface format in order to display it? I’d never dreamed of such a situation before I initially wrote the converter. But now, performing this conversion is as easy as telling the class to put the data from the 888 RGB .AVI buffer into whatever format the back buffer is in. And whichever format that is doesn’t matter to me at all….

Listing 1:

Original C++ source:

//unsigned size_x, size_y = resolution

//unsigned src_pitch, dst_pitch = source
// and destination pitch, in bytes

//unsigned char *src = pointer to source

//unsigned short *dest = pointer to destination

//S_palette_color *palette = pointer to palette

//unsigned short *r_look_up, *g_look_up, *b_look_up,
// *a_look_up = pointers to look-up tables

for(; size_y--; src += src_pitch, dest += dst_pitch/sizeof(short)){

for(unsigned x = 0; x < size_x; x++){

const S_palette_color &pal_entry = palette[ src[x] ];
//logically OR 4 look-up tables to
// get destination pixel

dest[x] = r_look_up[ pal_entry.red ] |
g_look_up[ pal_entry.green ] |
b_look_up[ pal_entry.blue ] |
a_look_up[ 255 ];
}
}



Here’s the same code fragment rewritten in assembly and optimized for Pentium instruction pairing. For additional speed, all look-up tables are allocated in a contiguous memory block.


mov esi, src
mov edi, dest
push palette
push ebp
xor edx, edx


ly:
mov ecx, size_x
mov ebp, r_look_up


lx:
mov dl, [esi]
inc esi
lea ebx, [edx+edx*2]
add ebx, [esp+4]
mov ebx, [ebx]
mov dl, bl
and ebx, 0ffffffh
mov ax, [ebp+edx*2]
mov dl, bh
or ax, [ebp+7feh]
shr ebx, 16
or ax, [ebp+200h+edx*2]
or ax, [ebp+400h+ebx*2]
mov [edi], ax
add edi, 2
dec ecx
jnz lx

mov ebp, [esp]
sub edi, size_x
sub esi, size_x
sub edi, size_x
add esi, src_pitch
add edi, dst_pitch

dec size_y
jne ly

pop ebp
add esp, 4


Listing 2.

struct S_rgba{
//our custom 32-bit ARGB pixel format

unsigned char r, g, b, a;
};


void ShrinkMipmap(S_rgba *src, dword size_x, dword size_y){

S_rgba *dst = src;
//dst points to the current destination

S_rgba *src1 = src + size_x;
//get pointer to the next line

size_x /= 2;
//halve the resolution of the bitmap

size_y /= 2;


while(size_y--){

for(int count = size_x; count--; ++dst, src += 2, src1 += 2){

//for each component compute the average
// of 4 neighboring pixels

dst->r = (src[0].r + src[1].r + src1[0].r + src1[1].r) / 4;

dst->g = (src[0].g + src[1].g + src1[0].g + src1[1].g) / 4;

dst->b = (src[0].b + src[1].b + src1[0].b + src1[1].b) / 4;

dst->a = (src[0].a + src[1].a + src1[0].a + src1[1].a) / 4;
}

//skip the next source line (already consumed as src1)

src += size_x*2;
src1 += size_x*2;
}
}



Michal Bacik is a lead programmer at Illusion Softworks (www.illusionsoftworks.com). Currently, he’s finishing programming work on Hidden and Dangerous, a 3D real-time action/strategy game set in the WWII era. He has been programming games since the days of the Commodore 64. You can reach him at [email protected].


About the Author(s)

Michal Bacik


Michal is the lead programmer at Lonely Cat Games, a small company based in the Czech Republic. Previously he was lead programmer for Hidden and Dangerous, a World War II title released two years ago. Recently he finished work on H&D Deluxe, an improved version of the original game. Michal can be reached at [email protected].
