
September 24, 2008

5 Min Read

Author: Lee Purcell

[In this Intel-sponsored feature, part of the Gamasutra Visual Computing microsite, Lightspeed Publishing's Lee Purcell lays out deferred mode image processing, a new addition to the Intel Integrated Performance Primitives Library, which speeds up complex image-processing tasks with up to 3X performance increases.]

Wherever you look, the graphical resolution of commonly used digital image formats is steadily increasing, resulting in larger file sizes and more intensive processing requirements. In several fields of image processing, including digital photography, high-definition digital moviemaking, medical diagnostics, and surveillance imaging, frame sizes are increasing substantially.

In the case of digital video formats, such as Cinema 2K and 4K, the color space is also being expanded, further increasing the file sizes. File sizes for Cinema 4K content can be as much as one terabyte per hour of video. On the other end of the scale, even mobile handheld devices routinely capture images that can be several megapixels in size. With image sizes of this magnitude, fresh approaches are needed to maintain performance when manipulating and processing image data.

In response to a requirement from a strategic Intel customer involved in large-scale computed tomography imaging, Intel software engineers began conceptualizing a framework for more efficiently using the extensive set of image-processing algorithms available in the Intel Integrated Performance Primitives (Intel IPP) library.

The resulting solution, which is featured in the Intel IPP version 6.0 release, is called deferred mode image processing (DMIP). DMIP effectively handles large image data arrays that don't fit entirely within the processor L2 cache.

DMIP, now an integral part of the Intel IPP package, performs pipelined sequences of fast functions to process image data in manageable portions, whether organized by tile, block, slice, or another element. This approach effectively combines the benefits of pipelined processing with the hand-optimized code of the Intel IPP library.

A directed acyclic graph (DAG) describes the overall task: its nodes correspond to inputs (image data sources), outputs (destination images or data written back to memory), and the image-processing operations that connect them. For operations that can be handled concurrently, DMIP generates parallel threads to enhance performance. Using DMIP can accelerate image-processing tasks by roughly 1.5x to 3x compared to a non-pipelined approach.
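To make the pipelining idea concrete, the following is a minimal, purely illustrative C++ sketch of slice-based processing; it is not the DMIP interface itself, and the 64-row slice height is an arbitrary assumption. The point is that each slice flows through the entire chain of operations before the next slice is touched, so intermediate data stays small enough to remain cache-resident, instead of each operation making a full pass over the whole image.

```cpp
// Illustrative sketch of slice-pipelined image processing (not the DMIP API).
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Slice = std::vector<std::uint8_t>;   // one horizontal strip of pixels
using Op    = std::function<void(Slice&)>; // one operation node in the chain

// Run the whole pipeline on each slice in turn, rather than running each
// operation over the whole image (which would evict data from cache between passes).
void processPipelined(std::vector<std::uint8_t>& image,
                      int width, int height, int sliceRows,
                      const std::vector<Op>& pipeline) {
    for (int row = 0; row < height; row += sliceRows) {
        const int rows = std::min(sliceRows, height - row);
        Slice slice(image.begin() + static_cast<std::ptrdiff_t>(row) * width,
                    image.begin() + static_cast<std::ptrdiff_t>(row + rows) * width);
        for (const Op& op : pipeline)      // every node sees this slice while it is hot
            op(slice);
        std::copy(slice.begin(), slice.end(),
                  image.begin() + static_cast<std::ptrdiff_t>(row) * width);
    }
}

int main() {
    const int width = 1024, height = 1024;
    std::vector<std::uint8_t> image(static_cast<std::size_t>(width) * height, 100);

    std::vector<Op> pipeline = {
        [](Slice& s) { for (auto& p : s) p = static_cast<std::uint8_t>(255 - p); }, // invert
        [](Slice& s) { for (auto& p : s) p = static_cast<std::uint8_t>(p / 2); }     // darken
    };
    processPipelined(image, width, height, /*sliceRows=*/64, pipeline);
}
```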


One key benefit: DMIP provides formula-level access to the vast library of Intel IPP functions. Within the Intel IPP version 6.0 release, developers can choose from among thousands of C functions that encompass a wide range of data operations.

Developers who are not familiar with the full range of options in the Intel IPP library can sometimes be discouraged from employing its algorithms in their applications. By removing the need to focus on the details of low-level programming, DMIP simplifies access to library functions, letting developers integrate advanced, proven routines into their code and take advantage of data-alignment performance gains, particularly on Intel processors. These gains typically result in significantly faster instruction processing for aligned data, commonly achieving speed increases of two to three times.
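The contrast below illustrates what "formula-level access" means in practice, assuming the Intel IPP headers and library are installed. The first routine uses an actual low-level IPP primitive, where the developer manages pointers, row strides, ROI sizes, and scale factors; the commented formula-style expression that follows uses hypothetical class names only, to sketch the kind of one-line expression DMIP aims to enable rather than its exact syntax.

```cpp
// Low-level route: a direct Intel IPP primitive call (IPP C API).
#include <ipp.h>

void addImagesLowLevel(const Ipp8u* srcA, int stepA,
                       const Ipp8u* srcB, int stepB,
                       Ipp8u* dst, int stepDst, IppiSize roi) {
    // ippiAdd_8u_C1RSfs adds two 8-bit, single-channel ROIs with saturation/scaling;
    // strides, ROI, and the scale factor are all the caller's responsibility.
    ippiAdd_8u_C1RSfs(srcA, stepA, srcB, stepB, dst, stepDst, roi, /*scaleFactor=*/0);
}

// Formula-level route (hypothetical names, shown only to illustrate the idea):
// the expression is recorded as a graph, and the framework chooses the slicing,
// threading, and underlying primitives when it executes.
//
//   Image A(srcA), B(srcB), D(dst);
//   D = (A + B) * 0.5;   // one formula instead of several primitive calls
//   D.Execute();
```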

Alexander Kibkalo, an Intel software engineer who worked on the development of DMIP, offered insights into the optimization advantages: "Fundamentally, DMIP optimizes the overall image-processing tasks within an application, while individual Intel IPP functions can be optimized without requiring knowledge of the environment or the conditions of the function call. DMIP provides the detailed descriptions of each task in DAG form, and the appropriate preferences can then be applied for optimizing the routines."

"In terms of parallelization on Intel processors," Kibkalo continued, "DMIP tries to maintain a balance of the slice size. Keeping it comparatively small lets the slice fit into L2 cache and enable efficient pipelining. Splitting the slices into comparatively large sub-slices allows them to be efficiently handled by the available processor cores (for example, by 4 lines for a quad-core processor or by 8 lines for an 8-core processor). For achieving the best performance, the actual method of splitting should be tailored to the individual processor on which the application is being run."

DMIP was designed for flexibility, so additional capabilities and extensions can be added for custom operations or to address specific customer requirements. DMIP excels at handling a repeated series of calculations performed across many images of the same frame size. Thus, it has potential applications in the areas mentioned elsewhere in this issue of Visual Computing Insight. DMIP's capabilities are a good match for many of the image-processing operations that are part of the digital workflow solutions that Silicon Imaging, IRIDAS, CineForm, and Wafian offer.

Image processing promises to continue growing in complexity as ever higher-resolution images proliferate. For those in the developer community looking for a better way to handle manipulation of large images, the addition of DMIP to the Intel IPP library will streamline many common image-processing sequences.

Other niceties bundled with the Intel IPP version 6.0 release include support for the Intel Atom processor, optimized Linux operating-system libraries, additional performance-tuned data-compression libraries, new static libraries with improved threading optimization, and cryptographic algorithms. Intel IPP functions go beyond the performance-boosting capabilities of optimized compilers alone by taking advantage of available processor features and optimized instruction sets.

For example, matching the Intel IPP function algorithms to the low-level optimizations for Intel Streaming SIMD Extensions (Intel SSE), from the original Intel SSE through Intel SSE4, can improve overall application performance substantially. Software developers and their application designs benefit from well-established, highly refined algorithms that address many of the most important programming operations. Intel's support network also adds value and utility to these libraries.
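The data-alignment point mentioned earlier ties directly into these SSE optimizations. As a small hedged example, assuming the Intel IPP headers and library are available, the IPP image allocator below pads and aligns each row so the SIMD-optimized primitives can work on well-aligned data; the 1920x1080 frame size is just an illustrative choice.

```cpp
// Hedged example: letting Intel IPP choose an SSE-friendly row pitch and alignment.
#include <ipp.h>
#include <cstdio>

int main() {
    const int width = 1920, height = 1080;
    int stepBytes = 0;                               // bytes per row, chosen by IPP
    Ipp8u* img = ippiMalloc_8u_C1(width, height, &stepBytes);
    if (!img) return 1;

    IppiSize roi = { width, height };
    ippiSet_8u_C1R(128, img, stepBytes, roi);        // fill the ROI with a constant value

    std::printf("row pitch chosen by IPP: %d bytes\n", stepBytes);
    ippiFree(img);
    return 0;
}
```

Compared with a plain malloc of width * height bytes, the library-chosen pitch keeps every row start aligned, which is one of the simpler ways applications pick up the aligned-data speedups described above.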
