
Green Gates, Graphics & Google

Last week’s reveal of the ARM Cortex-A15 processor got me thinking: since when did adding gates reduce power? Doesn’t that violate some fundamental law of physics?

Then I started looking deeper, and it turns out that a lot of designers are adding logic to reduce power. It’s a counterintuitive approach that’s clearly gaining traction. And it illuminates the interesting tradeoffs we make in engineering today versus those we made just a few years ago.

In the case of ARM’s latest processor design, one of the many little tweaks it includes is a special “loop cache.” It’s not a real cache, first of all. More like a simple FIFO buffer. It’s just big enough to hold about 32 instructions, or about 128 bytes all told. No big deal, in other words.

Its purpose is to store a copy of your most recently encountered code loop. Specifically, it looks for a sequence of maybe 5–20 instructions that ends with a conditional backward branch. Your basic small loop, in other words. When the processor gets to the bottom of this loop and prepares to jump back to the top again, it bypasses the CPU’s normal instruction cache and instead grabs the instructions out of this little FIFO.
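The fetch behavior described above can be sketched as a toy model. This is only an illustration, not ARM's actual design: the `Instr` record, the `run` function, and the `trip_counts` stand-in for real branch conditions are all hypothetical, and the model assumes the FIFO already holds the loop body by the time the backward branch is first taken.

```python
from collections import namedtuple

# Hypothetical instruction record; `target` is set only for conditional branches.
Instr = namedtuple("Instr", ["op", "target"], defaults=[None])

LOOP_BUFFER_SIZE = 32  # capacity in instructions, per the figure quoted above


def run(program, trip_counts):
    """Count fetches served by the I-cache vs. the loop buffer.

    trip_counts maps a branch address to how many times that branch is taken
    (a stand-in for evaluating real branch conditions).
    """
    remaining = dict(trip_counts)
    loop_range = None  # (start, end) addresses currently held in the buffer
    fetches = {"icache": 0, "loop_buffer": 0}
    pc = 0
    while pc < len(program):
        in_buffer = loop_range is not None and loop_range[0] <= pc <= loop_range[1]
        fetches["loop_buffer" if in_buffer else "icache"] += 1
        instr = program[pc]
        if instr.op == "bcond" and remaining.get(pc, 0) > 0:
            remaining[pc] -= 1
            length = pc - instr.target + 1
            # A taken backward branch over a small enough body marks a loop.
            if instr.target <= pc and length <= LOOP_BUFFER_SIZE:
                loop_range = (instr.target, pc)
            pc = instr.target
        else:
            if in_buffer and instr.op == "bcond":
                loop_range = None  # loop exited; go back to the I-cache
            pc += 1
    return fetches


# A four-instruction loop (addresses 1-4) that runs ten times in total.
program = [
    Instr("mov"),
    Instr("ldr"), Instr("add"), Instr("str"), Instr("bcond", target=1),
    Instr("mov"),
]
result = run(program, {4: 9})  # the branch at address 4 is taken 9 times
print(result)  # {'icache': 6, 'loop_buffer': 36}
```

In this toy run only the first pass through the loop touches the instruction cache. The other nine passes are served from the small buffer, which is where the power saving would come from.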

The result isn’t any faster than using the cache (which is already pretty darned quick), but it is more power-efficient. You see, FIFOs are dead-simple circuits whereas caches are comparatively complex. Powering up the FIFO takes a whole lot less energy than powering the cache. If you already know the code you want is in both places, why not fetch it out of the simpler one? You get the same code and the same performance but save power. Not a bad little trick.

The weird part is that you’ve added more circuitry but saved power. And it clearly works, as evidenced by the number of other chip companies working the same seam. The underlying assumption here is that you won’t power up both circuits at once, which would defeat the purpose. Instead, you build two more-or-less functionally identical circuits but use the simpler one when you can and the more complex one when you have to.

The other underlying assumption is that you’re saving enough dynamic current to make up for the added leakage current. All circuits leak when they’re turned off, but the amount depends largely on how your silicon is fabricated. In a high-speed, low-leakage semiconductor process you can get away with this. In low-cost bulk processes you might shoot yourself in the foot. Plenty of chips leak as much current in standby mode as they burn when they’re active. It’s all a matter of how you optimize.
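That tradeoff comes down to simple arithmetic: the dynamic energy saved per fetch, multiplied by the number of fetches, has to exceed the energy the extra circuit leaks over the same period. A rough sketch follows. The function name and every number in it are invented for illustration and are not measurements of any real chip.

```python
def net_energy_saved_pj(buffer_fetches, cache_fetch_pj, buffer_fetch_pj,
                        leakage_uw, elapsed_us):
    """Net saving in picojoules from adding a small loop buffer.

    1 microwatt sustained for 1 microsecond is 1 picojoule, so the leakage
    cost in pJ is simply leakage_uw * elapsed_us.
    """
    dynamic_saved = buffer_fetches * (cache_fetch_pj - buffer_fetch_pj)
    leakage_cost = leakage_uw * elapsed_us
    return dynamic_saved - leakage_cost


# Hypothetical workload: one million buffered fetches over ten seconds.
FETCHES = 1_000_000
ELAPSED_US = 10_000_000

# Assumed low-leakage process: the extra buffer leaks 1 uW.
low_leak = net_energy_saved_pj(FETCHES, 20, 2, leakage_uw=1, elapsed_us=ELAPSED_US)

# Assumed leakier process: the same buffer leaks 5 uW.
high_leak = net_energy_saved_pj(FETCHES, 20, 2, leakage_uw=5, elapsed_us=ELAPSED_US)

print(low_leak)   # 8000000   (a net win)
print(high_leak)  # -32000000 (the extra gates cost more than they save)
```

With these made-up figures the same circuit is a win in one process and a loss in the other. That is the foot-shooting scenario in concrete form.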

Anyway, the ultimate example of this is a multicore processor. Most high-end graphics chips, DSPs, and microprocessors today have multiple CPU, GPU, or DSP cores, and they can usually shut these cores off on demand. Sure, you get great performance when all the cores are humming along together, but you get better power efficiency if you shut them down from time to time. We’re even starting to see chips with duplicate or redundant CPU or GPU cores precisely to get the “loop cache effect.” They’ll have one fully featured CPU along with one dumb-stepbrother version that takes over when the software isn’t too complex. The redundant CPU uses less power because it’s less complicated, while still being able to perform, oh, about 75% of its partner’s tasks.
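The big-core/small-core idea can be sketched as a simple dispatch rule: run each task on the cheapest core that can handle it. The `Core` fields, the `pick_core` helper, and all the capacity and energy numbers below are hypothetical, chosen only to illustrate the principle. Real chips make this decision with far more elaborate schedulers and power controllers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    capacity: int         # highest task demand it can handle (made-up units)
    energy_per_task: int  # energy spent per task (made-up units)


def pick_core(demand, cores):
    """Return the lowest-energy core able to handle a task of this demand."""
    capable = [c for c in cores if c.capacity >= demand]
    if not capable:
        raise ValueError(f"no core can handle demand {demand}")
    return min(capable, key=lambda c: c.energy_per_task)


cores = [
    Core("big", capacity=100, energy_per_task=10),
    Core("little", capacity=75, energy_per_task=3),
]

demands = [10, 50, 75, 90]  # hypothetical task mix
chosen = [pick_core(d, cores).name for d in demands]
hybrid_energy = sum(pick_core(d, cores).energy_per_task for d in demands)
big_only_energy = len(demands) * cores[0].energy_per_task

print(chosen)                          # ['little', 'little', 'little', 'big']
print(hybrid_energy, big_only_energy)  # 19 40
```

Only the most demanding task wakes the big core. Everything else runs on the simpler one, so the total energy drops even though the chip carries more hardware.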

Imagine sticking an entire 32-bit CPU on a chip just to save power. That’s like carrying a spare engine in the trunk of your car for short trips. On second thought, that’s exactly what gas/electric hybrid cars do now. And the tradeoffs are the same: less energy consumed but at the price of increased cost and complexity. After all, whether it’s a four-cylinder diesel or a 32-bit RISC, that second engine isn’t free. You’re paying for the hardware but saving on fuel.

Once again, the underlying assumption is that the “fuel” is more precious than the hardware consuming it. Hybrid cars are more expensive than their conventional counterparts, and they never, ever pay off in reduced fuel costs. But with silicon chips the price/efficiency equation actually does work. Adding gates to a chip costs very little, whereas reducing its power consumption may pay handsome dividends. That’s especially true at the very high and low ends of the power spectrum. Rack-mounted Web servers consume ungodly amounts of electricity, to the point where power and air-conditioning bills start to rival the cost of the computers themselves. At the other extreme, handheld devices need to eke out as much battery life as they can, because consumers don’t like recharging. At both extremes, throwing gates at the problem, even to the point of building in duplicate or triplicate processors, is a fair tradeoff.

That’s a far cry from where we were a decade ago. It used to be that hardware was expensive and power consumption was irrelevant. Heat was almost never an issue, because relatively few chips gave off enough heat to be a concern. And for those that did, we glued on a heat sink and called it good. Now the heat sinks are bigger than the processors and almost as expensive. Waste heat, like exhaust pipe emissions, is becoming the tail that wags the design dog. Maybe we’ll be designing gas/electric hybrid chips soon. 
