Green Gates, Graphics & Google

Last week’s reveal of the ARM Cortex-A15 processor got me thinking: since when did adding gates reduce power? Doesn’t that violate some fundamental law of physics?

Then I started looking deeper, and it turns out that a lot of designers are adding logic to reduce power. It’s a counterintuitive approach that’s clearly gaining traction. And it illuminates the interesting tradeoffs we make in engineering today versus those we made just a few years ago.

In the case of ARM’s latest processor design, one of the many little tweaks it includes is a special “loop cache.” It’s not a real cache, first of all. More like a simple FIFO buffer. It’s just big enough to hold about 32 instructions, or about 128 bytes all told. No big deal, in other words.

Its purpose is to store a copy of your most recently encountered code loop. Specifically, it looks for a sequence of maybe 5–20 instructions that ends with a conditional backward branch: your basic small loop. When the processor gets to the bottom of this loop and prepares to jump back to the top again, it bypasses the CPU’s normal instruction cache and instead grabs the instructions out of this little FIFO.
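A rough way to picture the mechanism is as a tiny buffer sitting beside the normal fetch path. The sketch below is a minimal behavioral model, not ARM's actual design; the buffer size, the loop-detection rule, and all the names are assumptions for illustration.

```python
# Minimal behavioral sketch of a loop buffer (not ARM's actual design).
# Assumption: a loop is "captured" when we see a short taken backward
# branch, and subsequent fetches inside that range come from the buffer
# instead of the instruction cache.

LOOP_BUFFER_SIZE = 32  # instructions (~128 bytes at 4 bytes each)

class LoopBuffer:
    def __init__(self):
        self.instructions = {}  # address -> instruction word

    def capture(self, branch_addr, target_addr, fetch_from_cache):
        """On a taken backward branch, copy the loop body out of the cache."""
        length = (branch_addr - target_addr) // 4 + 1
        if 0 < length <= LOOP_BUFFER_SIZE:
            self.instructions = {
                addr: fetch_from_cache(addr)
                for addr in range(target_addr, branch_addr + 4, 4)
            }

    def fetch(self, addr, fetch_from_cache):
        """Serve the fetch from the cheap buffer when we can."""
        if addr in self.instructions:
            return self.instructions[addr]  # low-energy path
        return fetch_from_cache(addr)       # fall back to the real cache
```

In the real hardware, the capture happens as a side effect of the normal fetch stream, and the energy difference comes from the buffer being a handful of latches rather than a tagged, multi-way RAM.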

The result isn’t any faster than using the cache (which is already pretty darned quick), but it is more power-efficient. You see, FIFOs are dead-simple circuits whereas caches are comparatively complex. Powering up the FIFO takes a whole lot less energy than powering the cache. If you already know the code you want is in both places, why not fetch it out of the simpler one? You get the same code and the same performance but save power. Not a bad little trick.

The weird part is that you’ve added more circuitry but saved power. And it clearly works, as evidenced by the number of other chip companies working the same seam. The underlying assumption here is that you won’t power up both circuits at once, which would defeat the purpose. Instead, you build two more-or-less functionally identical circuits but use the simpler one when you can and the more complex one when you have to.

The other underlying assumption is that you’re saving enough dynamic current to make up for the added leakage current. All circuits leak when they’re turned off, but the amount depends largely on how your silicon is fabricated. In a high-speed, low-leakage semiconductor process you can get away with this. In low-cost bulk processes you might shoot yourself in the foot. Plenty of chips leak as much current in standby mode as they burn when they’re active. It’s all a matter of how you optimize.
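To make that tradeoff concrete, here's a back-of-envelope version of the break-even test. Every number is invented for illustration: the added circuit pays off only if the dynamic energy it saves while active outweighs the leakage it adds all the time.

```python
# Back-of-envelope break-even test for "add a circuit to save power".
# All figures below are illustrative assumptions, not measured data.

cache_fetch_power  = 10.0  # mW while fetching from the cache (assumed)
buffer_fetch_power = 1.0   # mW while fetching from the loop buffer (assumed)
buffer_leakage     = 0.2   # mW leaked by the extra circuit, always on (assumed)

loop_duty_cycle = 0.30     # fraction of time spent in captured loops (assumed)

# Average power saved: dynamic savings while in loops, minus leakage always.
dynamic_savings = (cache_fetch_power - buffer_fetch_power) * loop_duty_cycle
net_savings = dynamic_savings - buffer_leakage

print(f"dynamic savings: {dynamic_savings:.2f} mW")  # 2.70 mW
print(f"net savings:     {net_savings:.2f} mW")      # 2.50 mW, so it pays off

# In a leakier process, the same trick can backfire:
leaky_buffer_leakage = 3.0  # mW (assumed bulk-process figure)
print(f"leaky process:   {dynamic_savings - leaky_buffer_leakage:.2f} mW")  # negative
```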

Anyway, the ultimate example of this is a multicore processor. Most high-end graphics chips, DSPs, and microprocessors today have multiple CPU, GPU, or DSP cores, and they can usually shut these cores off on demand. Sure, you get great performance when all the cores are humming along together, but you get better power efficiency if you shut them down from time to time. We’re even starting to see chips with duplicate or redundant CPU or GPU cores precisely to get the “loop cache effect.” They’ll have one fully featured CPU along with one dumb-stepbrother version that takes over when the software isn’t too complex. The redundant CPU uses less power because it’s less complicated, while still being able to perform, oh, about 75% of its partner’s tasks.
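The software side of that duplicate-core arrangement amounts to a migration policy: run on the simple core until the workload outgrows it. The sketch below is a made-up policy with invented thresholds, just to show the shape of the decision.

```python
# Toy migration policy for a big core / little core pair.
# Thresholds and the feature test are invented for illustration.

BIG, LITTLE = "big core", "little core"

def pick_core(utilization, needs_fancy_features):
    """Choose the cheaper core whenever it can keep up.

    utilization: recent load as a fraction of the little core's capacity.
    needs_fancy_features: True if the task uses features only the big core
    has (e.g., wide SIMD) -- an assumption standing in for 'too complex'.
    """
    if needs_fancy_features or utilization > 0.75:
        return BIG     # power up the complex core, gate the simple one
    return LITTLE      # stay on the low-power core

assert pick_core(0.40, False) == LITTLE  # light load: simple core suffices
assert pick_core(0.90, False) == BIG     # heavy load: migrate to big core
assert pick_core(0.10, True)  == BIG     # feature mismatch forces big core
```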

Imagine sticking an entire 32-bit CPU on a chip just to save power. That’s like carrying a spare engine in the trunk of your car for short trips. On second thought, that’s exactly what gas/electric hybrid cars do now. And the tradeoffs are the same: less energy consumed but at the price of increased cost and complexity. After all, whether it’s a four-cylinder diesel or a 32-bit RISC, that second engine isn’t free. You’re paying for the hardware but saving on fuel.

Once again, the underlying assumption is that the “fuel” is more precious than the hardware consuming it. Hybrid cars are more expensive than their conventional counterparts, and they rarely, if ever, pay off in reduced fuel costs. But with silicon chips the price/efficiency equation actually does work. Adding gates to a chip costs very little, whereas reducing its power consumption may pay handsome dividends. That’s especially true at the very high and low ends of the power spectrum. Rack-mounted Web servers consume ungodly amounts of electricity, to the point where power and air-conditioning bills start to rival the cost of the computers themselves. At the other extreme, handheld devices need to eke out as much battery life as they can, because consumers don’t like recharging. At both extremes, throwing gates at the problem—even to the point of building in duplicate or triplicate processors—is a fair tradeoff.
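For the server case, the arithmetic is easy to sanity-check with invented but plausible numbers:

```python
# Back-of-envelope: electricity vs. hardware cost for a rack server.
# All figures are illustrative assumptions.

server_power_w   = 400   # average draw in watts (assumed)
cooling_overhead = 1.8   # multiplier for air conditioning (assumed)
electricity_rate = 0.10  # dollars per kWh (assumed)
years            = 4     # service life (assumed)

kwh = server_power_w / 1000 * 24 * 365 * years * cooling_overhead
energy_cost = kwh * electricity_rate
print(f"lifetime energy cost: ${energy_cost:,.0f}")  # about $2,523

# ...which is in the same ballpark as the server hardware itself,
# so gates spent on power savings buy their way onto the die.
```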

That’s a far cry from where we were a decade ago. It used to be that hardware was expensive and power consumption was irrelevant. Heat was almost never an issue, because relatively few chips gave off enough heat to be a concern. And for those that did, we glued on a heat sink and called it good. Now the heat sinks are bigger than the processors and almost as expensive. Waste heat, like exhaust pipe emissions, is becoming the tail that wags the design dog. Maybe we’ll be designing gas/electric hybrid chips soon. 
