
Technology
What is the 3DLABS Media Processor?
The 3DLABS Media Processor combines a highly parallel, fully programmable floating point array with dual ARM® processor cores and integrated peripheral support. The result is a balanced solution for accelerating both application and media intensive tasks with maximum MIPS per mW.

What is special about it?
The Media Processor has an unusually flexible, programmable media-processing engine. All processing is done with this engine and the ARM® CPUs; there are no special units dedicated to specific functions. This architecture allows us to squeeze the most performance from the lowest cost and the lowest power consumption.

How does it do that?

A fixed function unit can only do one thing. A 3D graphics unit can only do 3D graphics. If you don’t need 3D graphics, that part of the chip sits idle, and you are paying for it even though you don’t use it. With the Media Processor, if you don’t want 3D graphics you simply don’t load the program. You can use the whole chip for video decode if that’s what you want.

When a chip is designed with fixed function units, someone has to decide how much of the chip to devote to each one, and that limits its performance. For example, a chip designed for both video decode and 3D graphics has to assign some of the die to each function. The video decode performance will depend on how much area is given to it, so for a given cost, adding 3D graphics reduces the video performance. We don’t have that problem with our Media Processor; when you run video decode you get the whole chip, and when you run 3D graphics you get the whole chip.

What if I want to run video and 3D at the same time?

The engine is multi-tasking, just as the CPUs are, so video decode and 3D graphics (and audio, 2D graphics, physics engines and anything else you can think of) can be time-sliced on the engine. In that regard, we’ve designed the engine to be more like a CPU than a traditional graphics chip or DSP.

So is the engine just a very fast CPU?
No, it’s very different. General purpose CPUs are known to consume a lot of power on media tasks. Our engine is built from clusters of parallel units. The Media Processor has three clusters; future chips will have different numbers. Each cluster can work on a different task, and each cluster can multi-task. This gives us great flexibility in how work is distributed and lets us tackle very different types of problems. Multi-tasking is a great way to share a resource but can cause problems in real time systems. In those circumstances we can lock a job to a cluster while the other jobs multi-task on the remaining clusters.
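As an illustration only, the idea of locking one job to a cluster while the others are time-sliced can be sketched as a toy simulation. This is not 3DLABS code: the function `schedule`, the job names, and the round-robin policy are all assumptions made for the example. Only the cluster count of three comes from the text.

```python
from collections import deque

NUM_CLUSTERS = 3  # the text says this chip has three clusters


def schedule(jobs, locked, slices):
    """Return, for each time slice, the job that each cluster runs.

    jobs   -- names of jobs that may be time-sliced (hypothetical)
    locked -- dict mapping cluster index -> job pinned to that cluster
    slices -- number of time slices to simulate
    """
    shared = [c for c in range(NUM_CLUSTERS) if c not in locked]
    queue = deque(jobs)
    timeline = []
    for _ in range(slices):
        row = [None] * NUM_CLUSTERS
        for cluster, job in locked.items():
            row[cluster] = job           # a pinned job is never preempted
        for cluster in shared:
            if queue:
                row[cluster] = queue[0]  # next job in round-robin order
                queue.rotate(-1)
        timeline.append(row)
    return timeline


timeline = schedule(["3d", "audio", "physics"], {0: "video"}, slices=3)
for row in timeline:
    print(row)
```

In this sketch cluster 0 runs the pinned "video" job in every slice, so its timing does not depend on how many other jobs arrive, while the remaining two clusters rotate through the other three jobs.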

Isn’t that a challenge to program?
No, we provide different ways to use the engine. At its simplest, the programmer just writes a normal program to run on an ARM® CPU. The program calls libraries that we provide, and these control processing on the media engine. The programmer doesn’t even need to know the engine is there. At the other extreme, a programmer can take full control, hand code highly optimised routines for the media engine, and decide which clusters run which program. In between these extremes, programmers can mix their code with ours, queue tasks to be executed, and have the workload automatically distributed.
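The middle option, queueing tasks and letting the workload be distributed automatically, might look something like the following sketch. The text does not describe the real distribution policy or API, so everything here is hypothetical: the `distribute` function, the task names, the cost figures, and the greedy least-loaded policy are assumptions chosen to keep the example small.

```python
import heapq


def distribute(tasks, num_clusters=3):
    """Greedily assign each (name, cost) task to the least-loaded cluster."""
    heap = [(0, c) for c in range(num_clusters)]  # (accumulated cost, cluster)
    heapq.heapify(heap)
    assignment = {c: [] for c in range(num_clusters)}
    for name, cost in tasks:
        load, cluster = heapq.heappop(heap)       # cluster with least work so far
        assignment[cluster].append(name)
        heapq.heappush(heap, (load + cost, cluster))
    return assignment


# Hypothetical queue of tasks with made-up relative costs.
queued = [("h264_frame", 8), ("fft", 2), ("bayer", 3), ("audio_mix", 1)]
result = distribute(queued)
print(result)
```

The caller only builds the queue; which cluster runs which task is decided by the distributor, which is the property the paragraph above describes.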

What is in each cluster?
A cluster is a group of single instruction multiple data (SIMD) processing elements. SIMD is a well-known method for packing a lot of processing into a small space. The trick is to make that processing useful. There have been far too many SIMD designs that could do one thing very well but failed to deliver across a range of applications. Our design takes the efficiency of SIMD and combines it with new ideas to produce something uniquely flexible.

That’s a big claim, can you back it up?
Well, with the same engine we can decode high definition H.264 video, then process an FFT, then run Bayer interpolation on data from an image sensor and do it all with world-class efficiency. If you understand those algorithms you’ll know that they have very different compute loads and data access patterns.

What performance does the Media Processor have?
A summary of performance is given here.

Why do you support floating point?
Many engineers used to traditional DSPs think of floating point as something for bloated applications on power hungry CPUs. In fact, floating point can be quicker than integer because you don’t have to worry about overflow and renormalising. You write less code, and it’s easier to understand and quicker to get working. But that’s only true if the hardware is done well, and we have a long history of floating point design. For our Media Processor we put a lot of effort into designing a really tight, efficient unit that runs floating point as quickly as integer. There’s no drop in performance for floating point, so it can be used wherever it’s needed. And we use it all over the place: not only in obvious places like 3D graphics but also in image processing, physics simulations, audio processing, and general signal processing. We support IEEE single precision floating point, and a special 16-bit format that’s great for high dynamic range image processing.
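To make the point about overflow and renormalising concrete, here is a small sketch comparing the same calculation in Q15 fixed point, a format common on integer DSPs, and in floating point. The helper names `to_q15`, `q15_mul` and `q15_add` are invented for this example, and Python is used only to show the bookkeeping, not to model the hardware.

```python
Q15_ONE = 1 << 15          # 1.0 in Q15 fixed point
Q15_MAX = Q15_ONE - 1      # largest representable value, just under 1.0
Q15_MIN = -Q15_ONE         # smallest representable value, -1.0


def to_q15(x):
    """Convert a float to a saturated Q15 integer."""
    return max(Q15_MIN, min(Q15_MAX, int(round(x * Q15_ONE))))


def q15_mul(a, b):
    """Multiply two Q15 values: the product is Q30, so shift back to Q15."""
    product = (a * b) >> 15                       # renormalise
    return max(Q15_MIN, min(Q15_MAX, product))    # saturate on overflow


def q15_add(a, b):
    """Add two Q15 values, saturating instead of wrapping."""
    return max(Q15_MIN, min(Q15_MAX, a + b))


# Fixed point: every step needs a shift or a saturation check.
fixed = q15_add(q15_mul(to_q15(0.5), to_q15(0.5)), to_q15(0.9))

# Floating point: the same expression, written directly.
floating = 0.5 * 0.5 + 0.9

print(fixed / Q15_ONE, floating)
```

The true answer, 1.15, does not fit in Q15, so the fixed-point version saturates just below 1.0 unless the programmer rescales the data by hand. The floating point version needs no such care.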

Surely something like a colour processing pipeline for an image sensor is best done as a fixed function?
Show me two image processing engineers who agree on a pipeline. Everyone has different needs and if we hardcode the algorithms we please no one. Take Bayer interpolation; there is continuous research into the best way to extract data from a sensor. Even if the research stopped today, different algorithms are best for different circumstances. The algorithm you want to use when displaying a live preview of the scene should be most concerned with using minimum power. The algorithm you use when capturing an image to save should focus on quality. We just run a different program and get the best of both worlds.
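As a rough illustration of swapping programs, the sketch below estimates the missing green value at a red or blue site of an RGGB Bayer mosaic in two ways: a cheap neighbour copy that might suit a preview, and a bilinear average that costs more but is smoother. Both are textbook methods chosen for brevity, not the algorithms 3DLABS uses, and the function names and pixel values are made up.

```python
def is_green(row, col):
    """In an RGGB mosaic, green sits where row + col is odd."""
    return (row + col) % 2 == 1


def green_fast(raw, row, col):
    """Preview-style estimate: copy the green sample to the right or left."""
    if is_green(row, col):
        return raw[row][col]
    neighbour = col + 1 if col + 1 < len(raw[0]) else col - 1
    return raw[row][neighbour]


def green_bilinear(raw, row, col):
    """Capture-style estimate: average every in-bounds green neighbour."""
    if is_green(row, col):
        return raw[row][col]
    samples = [
        raw[r][c]
        for r, c in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1))
        if 0 <= r < len(raw) and 0 <= c < len(raw[0])
    ]
    return sum(samples) / len(samples)


# A 4x4 RGGB mosaic; values are made up for illustration.
raw = [
    [10, 100, 12, 104],
    [96, 50, 108, 52],
    [14, 120, 16, 116],
    [92, 54, 112, 56],
]

print(green_fast(raw, 2, 2), green_bilinear(raw, 2, 2))
```

The fast version reads one sample per pixel and the bilinear version reads up to four, which is the kind of power versus quality trade-off the paragraph above describes. On a programmable engine, choosing between them is a matter of which routine is loaded.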

Why do you have two ARM® 9 CPUs? Why not one ARM® 11?
One obvious benefit is that two ARM® 9 processors give more MIPS for fewer milliwatts than a single ARM® 11, but this is not the only benefit.

 

The twin CPUs allow separation of interrupt driven events and applications. In real time systems there is a conflict between the application that you want to run and all the events such as network traffic and disk activity that have to be handled on an interrupt basis. The randomness of the background tasks makes it difficult to guarantee the speed of the primary application so some headroom has to be left in the system.

Because we have two CPUs we can use one to handle the interrupt related tasks and one to run the real time application. When our customers ship a system and say it will run video at a particular rate, they can be sure it will keep doing that even if another event arrives part way through the movie. The system is more stable, tuning is easier, our customers have shorter time to market, and we don’t have to run the CPU faster than necessary so power consumption is lower.

What techniques have you used to reduce power consumption?
Energy efficiency starts with the fundamental architecture. How does data move through the system? How can it be kept on chip? How do you control power when the system is busy and when it’s idle? Our architecture has been shaped by the way we tackled these questions, and we didn’t only look at what is happening today but what we need to do for future silicon processes as their characteristics change.

One of the most important things we have done is to provide multiple voltage domains that allow us to set different voltages as demand for performance changes. We can do this dynamically without rebooting. We can also completely power off parts of the chip while the rest keeps running.
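The reason voltage scaling matters can be seen from the standard first-order model of CMOS dynamic power, which is proportional to switched capacitance times voltage squared times clock frequency. The sketch below applies that model to a few operating points. The operating points are hypothetical, not figures for the Media Processor, and the model ignores leakage power.

```python
def relative_dynamic_power(voltage_ratio, frequency_ratio):
    """Dynamic power relative to nominal, using P ~ C * V^2 * f."""
    return voltage_ratio ** 2 * frequency_ratio


# Hypothetical operating points as (V / V_nominal, f / f_nominal).
operating_points = {
    "full_speed": (1.0, 1.0),
    "half_speed": (0.8, 0.5),
    "powered_off": (0.0, 0.0),
}

for name, (v, f) in operating_points.items():
    print(name, round(relative_dynamic_power(v, f), 2))
```

Under these assumptions, halving the clock alone would halve dynamic power, but also dropping the voltage to 80% of nominal brings it down to roughly a third. That is why separate voltage domains help when demand for performance varies.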

© 3DLABS Inc. Ltd. All Rights Reserved