Green Gates, Graphics & Google

Last week’s reveal of the ARM Cortex-A15 processor got me thinking: since when did adding gates reduce power? Doesn’t that violate some fundamental law of physics?

Then I started looking deeper, and it turns out that a lot of designers are adding logic to reduce power. It’s a counterintuitive approach that’s clearly gaining traction. And it illuminates the interesting tradeoffs we make in engineering today versus those we made just a few years ago.

In the case of ARM’s latest processor design, one of the many little tweaks it includes is a special “loop cache.” It’s not a real cache, first of all. More like a simple FIFO buffer. It’s just big enough to hold about 32 instructions, or about 128 bytes all told. No big deal, in other words.

Its purpose is to store a copy of your most recently encountered code loop. Specifically, it looks for a sequence of maybe 5–20 instructions that ends with a conditional backward branch. Your basic small loop, in other words. When the processor gets to the bottom of this loop and prepares to jump back to the top again, it bypasses the CPU’s normal instruction cache and instead grabs the instructions out of this little FIFO.

The result isn’t any faster than using the cache (which is already pretty darned quick), but it is more power-efficient. You see, FIFOs are dead-simple circuits whereas caches are comparatively complex. Powering-up the FIFO takes a whole lot less energy than powering the cache. If you already know the code you want is in both places, why not fetch it out of the simpler one? You get the same code and the same performance but save power. Not a bad little trick.

The weird part is that you’ve added more circuitry but saved power. And it clearly works, as evidenced by the number of other chip companies working the same seam. The underlying assumption here is that you won’t power-up both circuits at once, which would defeat the purpose. Instead, you build two more-or-less functionally identical circuits but use the simpler one when you can and the more complex one when you have to.

The other underlying assumption is that you’re saving enough dynamic current to make up for the added leakage current. All circuits leak when they’re turned off, but the amount depends largely on how your silicon is fabricated. In a high-speed, low-leakage semiconductor process you can get away with this. In low-cost bulk processes you might shoot yourself in the foot. Plenty of chips leak as much current in standby mode as they burn when they’re active. It’s all a matter of how you optimize.

Anyway, the ultimate example of this is a multicore processor. Most high-end graphics chips, DSPs, and microprocessors today have multiple CPU, GPU, or DSP cores, and they can usually shut these cores off on demand. Sure, you get great performance when all the cores are humming along together, but you get better power efficiency if you shut them down from time to time. We’re even starting to see chips with duplicate or redundant CPU or GPU cores precisely to get the “loop cache effect.” They’ll have one fully featured CPU along with one dumb-stepbrother version that takes over when the software isn’t too complex. The redundant CPU uses less power because it’s less complicated, while still being able to perform, oh, about 75% of its partner’s tasks.

Imagine sticking an entire 32-bit CPU on a chip just to save power. That’s like carrying a spare engine in the trunk of your car for short trips. On second thought, that’s exactly what gas/electric hybrid cars do now. And the tradeoffs are the same: less energy consumed but at the price of increased cost and complexity. After all, whether it’s a four-cylinder diesel or a 32-bit RISC, that second engine isn’t free. You’re paying for the hardware but saving on fuel.

Once again, the underlying assumption is that the “fuel” is more precious than the hardware consuming it. Hybrid cars are more expensive than their conventional counterparts, but they never, ever pay off in reduced fuel costs. But with silicon chips the price/efficiency equation actually does work. Adding gates to a chip costs very little, whereas reducing its power consumption may pay handsome dividends. That’s especially true at the very high and low ends of the power spectrum. Rack-mounted Web servers consume ungodly amounts of electricity, to the point where power and air-conditioning bills start to rival the cost of the computers themselves. At the other extreme, handheld devices need to eke out as much battery life as they can, because consumers don’t like recharging. At both extremes, throwing gates at the problem—even to the point of building in duplicate or triplicate processors—is a fair tradeoff.

That’s a far cry from where we were a decade ago. It used to be that hardware was expensive and power consumption was irrelevant. Heat was almost never an issue, because relatively few chips gave off enough heat to be a concern. And for those that did, we glued on a heat sink and called it good. Now the heat sinks are bigger than the processors and almost as expensive. Waste heat, like exhaust pipe emissions, is becoming the tail that wags the design dog. Maybe we’ll be designing gas/electric hybrid chips soon.