
Small Change, Big Difference

How Manufacturing Variations Affect Your Power Consumption

They say football is a game of inches, and League of Legends is a game of millimeters. Well, making microprocessor chips is a game of atoms.

Most of us aren’t very good at statistics, but we do have an intuitive kind of understanding about the law of averages and the law of large numbers. If you have a big truckload of oranges, losing one or two oranges doesn’t make much difference. But if you have only three oranges in your hand, losing one (or gaining one) is a big deal. Obvious, right?

Semiconductor fabrication works in a similar vein. If your chip was built in the 1980s using 1.0-micron technology, a few extra atoms here or there didn’t make a lot of difference to its size, weight, performance, reliability, or power consumption. A bit of stray copper or polysilicon wouldn’t have affected the device’s operation. Indeed, back then it probably wouldn’t have been measurable or even detectable. Ignorance was silicon bliss.

Now our chips are made to astonishing tolerances. Atoms count. And because there are fewer of them – because the tolerances are so small – a few extra atoms one way or the other can make a significant difference to a chip’s behavior. We already know that tiny contaminants or imperfections can render a chip useless. But even tinier changes can alter your system’s power consumption by double-digit percentages. And there’s nothing you can do about it.

Quite a number of people are starting to notice, and measure, the difference. A study done at the University of California at San Diego (Go, Tritons!) took a set of six presumably identical Intel Core i5-540M processors, hooked them up to the same power supply and peripheral components, ran the same benchmark tests, measured them with the same instruments, and found… not the same power consumption. In fact, power usage varied by 12% to 17%; even the smallest spread they saw was still 12%. They weren’t able to get six apparently identical chips to produce anything like identical power numbers.

To factor out the possibility that some other hardware might somehow be contributing to the observed variations, the group used two different motherboards, but reported no difference in their readings. The power variations followed the processor, not the support logic.

Furthermore, chips that ran “hot” on one test did so across all the tests, clearly fingering the silicon and not the software as the culprit. Using the 19 separate benchmark tests within SPEC2006 as their test suite, the team saw that processor #2 (for instance) consistently used more power than processor #3.

Not surprisingly, slowing down the processors reduced the absolute variation in power consumption, though not the chips’ relative rankings. Speeding the chips up increased power differences, sometimes dramatically. One processor on one test sucked 20% more power than its peers. All from nominally identical chips.

Lest we think that this effect is limited to Intel processors, a team at UCLA (Go, Bruins!) found that ARM-based microcontrollers do essentially the same thing. Their study focused on the opposite end of the spectrum, where the test subjects were in sleep mode. They started out by testing ten (theoretically) identical Atmel SAM3U MCUs while running in their active state and found power variations of less than 10%. This seemed unremarkable and is mentioned almost in passing, because the purpose of the study was to monitor sleep-mode power. In reality, a 5–10% difference in power consumption among identical devices running identical code would freak out most developers and have them questioning their instruments. But let’s move on.

In sleep mode, each chip behaved very differently indeed. Some consumed 3x to more than 5x as much power as their peers under the same conditions. The team then varied the temperature of each device and, as expected, power consumption scaled with temperature, but not in the same way for each chip. Oddly, the individual SAM3U devices didn’t all maintain their rankings as high- or low-power leaders. While they all consumed more sleep-mode power as the temperature rose, some increased more rapidly than others, sometimes overtaking their siblings. A low-power leader when cool (22°C/72°F, in the testing) might finish up mid-pack at 60°C. One outlier was nearly off the charts at room temperature (145 µW versus about 40 µW for all of its rivals), but then it changed hardly at all as the mercury rose. Some chips exhibited hockey-stick curves while others followed a gentler slope. Go figure.

The UCLA researchers make the not-unreasonable assumption that an MCU-based system will likely spend most of its life in sleep mode, possibly waking periodically to sample a sensor and process the results before hibernating again. In such cases, the chip’s sleep-mode power consumption, and the way it varies with temperature, is vitally important. But based on just the ten samples they examined, the duty cycle of such a system might have to change by as much as 50% to get the battery life you want. Conversely, your battery life might change drastically based on the individual chip you happen to get in the mail. And imagine trying to engineer a reliable way to accommodate those random variations.
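To see why the sleep-mode spread matters, here is a minimal back-of-the-envelope sketch in Python. The sleep-power figures echo the roughly 40 µW versus 145 µW spread described above; the active-mode power, duty cycle, and battery capacity are assumptions chosen purely for illustration.

```python
# Back-of-the-envelope battery-life estimate for a duty-cycled MCU node.
# Sleep-power figures echo the ~40 uW vs ~145 uW spread described above;
# active power, duty cycle, and battery size are illustrative assumptions.

BATTERY_MWH = 2400 * 3.0      # assumed 2400 mAh cell at a nominal 3.0 V -> mWh

def battery_life_days(active_mw, sleep_mw, duty_cycle):
    """Days of life for a node that is awake `duty_cycle` of the time."""
    avg_mw = duty_cycle * active_mw + (1.0 - duty_cycle) * sleep_mw
    return BATTERY_MWH / avg_mw / 24.0

ACTIVE_MW = 30.0              # assumed active-mode draw
DUTY = 0.001                  # awake 0.1% of the time

for label, sleep_uw in [("low-power sample", 40.0), ("outlier sample", 145.0)]:
    days = battery_life_days(ACTIVE_MW, sleep_uw / 1000.0, DUTY)
    print(f"{label}: ~{days:,.0f} days")
```

With these made-up numbers, the 40 µW part lasts roughly two and a half times as long as the 145 µW part on the same battery, running the same code, which is exactly the kind of spread that forces you to rethink the duty cycle.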

It gets weirder. A completely unrelated group working in the Czech Republic found that benchmark performance can vary considerably because of… magic, as far as they can tell. They ran the same FFT benchmark over and over on an x86 machine running Fedora, and they got the same score every time, as you would expect. Then they rebooted the machine and ran the test a few thousand more times – and got different numbers. The results in the second test run were all the same – just different from the first test run. Rebooting a third time produced a third set of numbers, again all self-consistent. Another few thousand test runs, and another cluster of similar scores. Strange. Within each test run, the scores were nearly identical, but between reboots, they varied by as much as 4% from previous runs.

Granted, four percent isn’t a lot, but what’s remarkable is that the scores changed at all. Two thousand consecutive test runs all produce identical scores, then a simple reboot changes the next two thousand scores, all by exactly the same amount and in the same direction, for no apparent reason. Even when you think you’re running a controlled experiment, you’re not really in control.
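If you want to catch this effect yourself, the bookkeeping is straightforward: group scores by boot session and compare the within-boot spread to the between-boot spread. A minimal sketch, with invented numbers rather than the Czech team’s data:

```python
# Compare within-boot and between-boot spread of benchmark scores.
# The scores below are invented placeholders, grouped by boot session.
from statistics import mean, pstdev

runs_by_boot = {
    "boot 1": [1000.2, 1000.1, 1000.3],   # thousands of runs in practice
    "boot 2": [962.0, 961.8, 962.1],
    "boot 3": [981.5, 981.4, 981.6],
}

for boot, scores in runs_by_boot.items():
    print(f"{boot}: mean={mean(scores):.1f}, spread={pstdev(scores):.2f}")

boot_means = [mean(scores) for scores in runs_by_boot.values()]
swing = (max(boot_means) - min(boot_means)) / max(boot_means) * 100.0
print(f"between-boot swing: {swing:.1f}%")
```

The practical moral is to reboot between measurement sessions and report the spread across boots, not just the deceptively tight spread within one.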

Imagine the consternation when you say, “Hey, boss, I ran that benchmark you asked for and got the same result a thousand separate times. Those numbers are rock-solid. Go ahead and publish them.” Then some n00b runs the same binaries on the same hardware on the same day and gets a significantly different number. Thousands of times. Also completely repeatable. Who’s correct?

If you needed any more bad news, the Czech team also found that compiling the same source code with the same gcc compiler and the same compile-time switches – in short, under precisely the same circumstances – produced different binaries that varied in performance by as much as 13%. That’s with no changes whatsoever from the programmer, the tools, or the development system. They pegged the variation to nondeterminism in the GNU tools. Good luck with that.
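At the very least, it is easy to check whether your own toolchain is handing you bit-identical binaries from build to build. A hypothetical sketch follows; the source file name, flags, and output name are placeholders, and a matching hash says nothing about performance, but a mismatch is an early warning.

```python
# Rebuild the same source several times with identical flags and hash the
# output. benchmark.c, the flags, and the output name are placeholders.
import hashlib
import subprocess

GCC_CMD = ["gcc", "-O2", "benchmark.c", "-o", "bench.bin"]

digests = set()
for _ in range(10):
    subprocess.run(GCC_CMD, check=True)
    with open("bench.bin", "rb") as f:
        digests.add(hashlib.sha256(f.read()).hexdigest())

print(f"{len(digests)} distinct binaries out of 10 builds")
```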
