Green Gates, Graphics & Google

Last week’s reveal of the ARM Cortex-A15 processor got me thinking: since when did adding gates reduce power? Doesn’t that violate some fundamental law of physics?

Then I started looking deeper, and it turns out that a lot of designers are adding logic to reduce power. It’s a counterintuitive approach that’s clearly gaining traction. And it illuminates the interesting tradeoffs we make in engineering today versus those we made just a few years ago.

In the case of ARM’s latest processor design, one of the many little tweaks it includes is a special “loop cache.” It’s not a real cache, first of all. More like a simple FIFO buffer. It’s just big enough to hold about 32 instructions, or about 128 bytes all told. No big deal, in other words.

Its purpose is to store a copy of your most recently encountered code loop. Specifically, it looks for a sequence of maybe 5–20 instructions that ends with a conditional backward branch. Your basic small loop, in other words. When the processor gets to the bottom of this loop and prepares to jump back to the top again, it bypasses the CPU’s normal instruction cache and instead grabs the instructions out of this little FIFO.

The result isn’t any faster than using the cache (which is already pretty darned quick), but it is more power-efficient. You see, FIFOs are dead-simple circuits, whereas caches are comparatively complex. Powering up the FIFO takes a whole lot less energy than powering up the cache. If you already know the code you want is in both places, why not fetch it out of the simpler one? You get the same code and the same performance but save power. Not a bad little trick.
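To make the bookkeeping concrete, here is a toy Python model of the idea. It is only a sketch: the function name and the two energy costs are invented for illustration and are not ARM's figures. The 32-slot size is the only number taken from the description above.

```python
# Toy model of a loop buffer sitting in front of an instruction cache.
# All names and energy figures are hypothetical, chosen only for illustration.

LOOP_BUFFER_SLOTS = 32      # "about 32 instructions", per the article
ICACHE_FETCH_COST = 10.0    # assumed energy units per instruction-cache fetch
BUFFER_FETCH_COST = 1.0     # assumed energy units per loop-buffer fetch


def run_loop(body_len, iterations, use_loop_buffer=True):
    """Return (icache_fetches, buffer_fetches, energy) for one small loop.

    The loop body is body_len instructions long and ends in a backward
    branch. The first pass always comes from the instruction cache. Later
    passes come from the loop buffer if it is enabled and the body fits.
    """
    if body_len < 1 or iterations < 1:
        raise ValueError("body_len and iterations must be at least 1")

    fits = use_loop_buffer and body_len <= LOOP_BUFFER_SLOTS
    icache_fetches = body_len                  # first pass through the loop
    later_fetches = body_len * (iterations - 1)

    if fits:
        buffer_fetches = later_fetches         # served by the simple FIFO
    else:
        buffer_fetches = 0
        icache_fetches += later_fetches        # fall back to the cache

    energy = (icache_fetches * ICACHE_FETCH_COST
              + buffer_fetches * BUFFER_FETCH_COST)
    return icache_fetches, buffer_fetches, energy


with_buffer = run_loop(body_len=10, iterations=100)
without_buffer = run_loop(body_len=10, iterations=100, use_loop_buffer=False)
```

With these made-up costs, a 10-instruction loop run 100 times makes the same 1,000 fetches either way, so performance is unchanged, but the energy tally drops from 10,000 units to 1,090. A loop longer than 32 instructions gets no benefit in this model because it never fits in the buffer.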

The weird part is that you’ve added more circuitry but saved power. And it clearly works, as evidenced by the number of other chip companies working the same seam. The underlying assumption here is that you won’t power up both circuits at once, which would defeat the purpose. Instead, you build two more-or-less functionally identical circuits but use the simpler one when you can and the more complex one when you have to.

The other underlying assumption is that you’re saving enough dynamic current to make up for the added leakage current. All circuits leak when they’re turned off, but the amount depends largely on how your silicon is fabricated. In a high-speed, low-leakage semiconductor process you can get away with this. In low-cost bulk processes you might shoot yourself in the foot. Plenty of chips leak as much current in standby mode as they burn when they’re active. It’s all a matter of how you optimize.
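That break-even condition is simple arithmetic, and a few lines of Python can spell it out. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the units are arbitrary, and it ignores everything except per-fetch dynamic energy and the leakage of the added circuit.

```python
def net_energy_saved(fetches_redirected, cache_fetch_energy,
                     buffer_fetch_energy, buffer_leakage_power, elapsed_time):
    """Dynamic energy saved minus the leakage energy the extra circuit costs.

    A positive result means the added logic pays for itself; a negative
    result means leakage ate the savings. All inputs are hypothetical and
    must use consistent units (for example joules, watts, and seconds).
    """
    dynamic_saved = fetches_redirected * (cache_fetch_energy - buffer_fetch_energy)
    leakage_added = buffer_leakage_power * elapsed_time
    return dynamic_saved - leakage_added


# Same workload on two imaginary processes that differ only in leakage.
low_leakage = net_energy_saved(1000, 10, 1, buffer_leakage_power=2, elapsed_time=100)
high_leakage = net_energy_saved(1000, 10, 1, buffer_leakage_power=100, elapsed_time=100)
```

In this made-up example the low-leakage case comes out ahead by 8,800 units, while the leaky case ends up 1,000 units in the hole. The logic and the workload are identical; only the process changed.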

Anyway, the ultimate example of this is a multicore processor. Most high-end graphics chips, DSPs, and microprocessors today have multiple CPU, GPU, or DSP cores, and they can usually shut these cores off on demand. Sure, you get great performance when all the cores are humming along together, but you get better power efficiency if you shut them down from time to time. We’re even starting to see chips with duplicate or redundant CPU or GPU cores precisely to get the “loop cache effect.” They’ll have one fully featured CPU along with one dumb-stepbrother version that takes over when the software isn’t too complex. The redundant CPU uses less power because it’s less complicated, while still being able to perform, oh, about 75% of its partner’s tasks.
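As a rough illustration of how such a chip might decide which core to wake, here is a minimal Python sketch. Everything in it is an assumption made for the example: the core names, the power figures, and the idea of scoring a workload as a single "demand" number between 0 and 1. The 0.75 threshold loosely echoes the "about 75%" figure above and is otherwise arbitrary. Real chips and operating systems use far more elaborate policies.

```python
# Hypothetical two-core chip. Every number below is an illustrative assumption.
CORES = {
    "little": {"capacity": 0.75, "power": 1.0},  # simpler core, lower power
    "big": {"capacity": 1.00, "power": 4.0},     # fully featured core
}


def pick_core(demand):
    """Return the name of the lowest-power core that can handle the demand.

    demand is an abstract workload score from 0.0 (idle) to 1.0 (the most
    the big core can do). Only one core is meant to be powered at a time.
    """
    candidates = [
        (spec["power"], name)
        for name, spec in CORES.items()
        if spec["capacity"] >= demand
    ]
    if not candidates:
        raise ValueError("demand exceeds what any core can handle")
    return min(candidates)[1]


light_choice = pick_core(0.30)   # easy work goes to the simple core
heavy_choice = pick_core(0.90)   # demanding work needs the full core
```

The point of the sketch is the selection rule, not the numbers: use the simpler circuit whenever it can do the job, and fall back to the complex one only when you must.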

Imagine sticking an entire 32-bit CPU on a chip just to save power. That’s like carrying a spare engine in the trunk of your car for short trips. On second thought, that’s exactly what gas/electric hybrid cars do now. And the tradeoffs are the same: less energy consumed but at the price of increased cost and complexity. After all, whether it’s a four-cylinder diesel or a 32-bit RISC, that second engine isn’t free. You’re paying for the hardware but saving on fuel.

Once again, the underlying assumption is that the “fuel” is more precious than the hardware consuming it. Hybrid cars are more expensive than their conventional counterparts, and they never, ever pay off in reduced fuel costs. But with silicon chips the price/efficiency equation actually does work. Adding gates to a chip costs very little, whereas reducing its power consumption may pay handsome dividends. That’s especially true at the very high and low ends of the power spectrum. Rack-mounted Web servers consume ungodly amounts of electricity, to the point where power and air-conditioning bills start to rival the cost of the computers themselves. At the other extreme, handheld devices need to eke out as much battery life as they can, because consumers don’t like recharging. At both extremes, throwing gates at the problem (even to the point of building in duplicate or triplicate processors) is a fair tradeoff.

That’s a far cry from where we were a decade ago. It used to be that hardware was expensive and power consumption was irrelevant. Heat was almost never an issue, because relatively few chips gave off enough heat to be a concern. And for those that did, we glued on a heat sink and called it good. Now the heat sinks are bigger than the processors and almost as expensive. Waste heat, like exhaust pipe emissions, is becoming the tail that wags the design dog. Maybe we’ll be designing gas/electric hybrid chips soon. 
