Energy: It’s Not Your Average Power

Energy. You read about it in the newspapers, hear about it from political pundits, and pay for it every month in gas, electric, and fuel bills. Airlines and automakers blame their financial woes on energy costs, and developing nations try to become energy-independent. There’s a lot of energy spent on, well, energy.

The same is true of electronic design. The energy consumption of a chip, system, or assembly is a big deal to many engineers. Handheld systems need to balance performance and features against current drain and battery life. At the opposite extreme, the designers of high-performance systems grapple with energy consumption because they need to reduce heat, minimize space, or meet Energy Star specifications. Yet, for all this concern, there’s no standard way to quantify the actual energy consumption of embedded devices.

Microprocessor vendors will generally provide power-consumption specifications on their data sheets, but these specs are difficult to compare with those from other vendors. Typically that’s because vendors use “typical” power numbers to characterize their processors, and only rarely do they indicate what typical work the chip was doing when they made their measurements. If your actual measurements don’t match theirs, well, then you must be atypical. Aren’t we all?

To settle this issue, EEMBC (the Embedded Microprocessor Benchmark Consortium) created a software utility that provides practical data on the amount of energy a processor consumes when it’s running a well-defined application workload. Called EnergyBench, this tool can be used in conjunction with EEMBC’s existing performance benchmarks to determine how power-efficiently various processors carry out a series of standardized, application-focused tasks. EnergyBench provides a standard yardstick for measuring energy consumption that’s directly tied to a standard set of performance tests. Now designers can compare the performance-versus-power ratio of processors from different vendors and select the one that best fits their performance needs and energy budget.

One of the most important things that EnergyBench teaches is that there’s no such thing as “typical power.” The average energy consumed while running EEMBC benchmarks varies widely depending on whether the chip is running the test kernels for digital entertainment, networking, or automotive applications. Instead of trying to arrive at an illusory across-the-board “typical power” number for a given device, EnergyBench measures typical energy consumption for a specific algorithm at a specific performance level.

How EnergyBench works

EEMBC’s programmers designed EnergyBench using the LabVIEW software and data-acquisition (DAQ) hardware, both from National Instruments. The DAQ hardware has multiple differential measurement channels, so we can measure multiple power rails simultaneously (both voltage and current), plus a trigger channel. EnergyBench uses the DAQ card to sample the voltage levels and provide a trigger to synchronize the performance benchmark with the power measurements. That way, we ensure that we’re measuring power consumption only during the timed portion of the benchmark code and not during the setup, preamble, initialization, or record-keeping phases.

Before a processor company can publish its EnergyBench scores, the scores must be certified by the EEMBC Technology Center. This prevents cheating, frankly, but it also ensures that results are reliable, repeatable, and consistent. EEMBC also specifies a range of test conditions that must be met as a condition of certification. For example, to achieve statistically accurate results, we require that samples be taken at 2x the Nyquist frequency or higher, or they can be taken at random points. EnergyBench accepts the sampling frequency as an input; the sampling code must then be called several times with different sampling frequencies. By sampling multiple times during the benchmark run using un-aliased frequencies, we avoid any unwanted “resonance” with the benchmark’s execution. This method is simple to implement and guarantees statistically accurate results.
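To see why more than one sampling frequency matters, consider a small illustrative sketch. It is not EnergyBench code: the power waveform, the 1-ms loop length, the function names, and the two sampling periods are all assumptions chosen to make the effect visible. When the sampling period lines up with the benchmark loop, every sample lands on the same point in the loop and the average is badly skewed. A period that does not line up walks through the whole loop.

```python
# Illustrative only: a made-up power waveform, not EnergyBench code.
LOOP_PERIOD_US = 1000  # hypothetical benchmark loop length, in microseconds

def power_at(t_us):
    """Hypothetical draw: 2 W for the first quarter of each loop, 1 W after."""
    return 2.0 if (t_us % LOOP_PERIOD_US) < 250 else 1.0

def sampled_mean(sample_period_us, n_samples):
    """Average power seen when sampling every sample_period_us microseconds."""
    total = sum(power_at(k * sample_period_us) for k in range(n_samples))
    return total / n_samples

true_mean = 0.25 * 2.0 + 0.75 * 1.0      # 1.25 W averaged over one full loop
locked_mean = sampled_mean(1000, 1000)   # always hits the same point in the loop
unlocked_mean = sampled_mean(370, 1000)  # walks through every part of the loop

print(f"true {true_mean:.2f} W, locked {locked_mean:.2f} W, "
      f"unlocked {unlocked_mean:.2f} W")
# prints: true 1.25 W, locked 2.00 W, unlocked 1.25 W
```

Integer microseconds are used for time so that the modulo arithmetic is exact. Comparing the averages from two unrelated sampling periods exposes the "locked" case immediately.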

Since we can easily repeat the process and/or increase the sampling frequency, EnergyBench collects as many samples as needed until the average energy consumption can be determined with statistical accuracy. For certification purposes, and to help device and tool designers, the process is repeated several times, and the standard deviation of the final result is calculated. This way, any outlying deviation is easily detectable since each run of each benchmark produces one number for the average energy per iteration of the benchmark.
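The repeat-and-compare step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the certified procedure: the run values are invented, the function names are hypothetical, and the 3% tolerance around the median is an arbitrary choice made for the example.

```python
import statistics

def summarize_runs(energies_mj):
    """Mean and sample standard deviation of energy per iteration across runs."""
    return statistics.fmean(energies_mj), statistics.stdev(energies_mj)

def flag_outliers(energies_mj, rel_tol=0.03):
    """Runs that differ from the median by more than rel_tol (assumed 3%)."""
    median = statistics.median(energies_mj)
    return [e for e in energies_mj if abs(e - median) > rel_tol * median]

# Hypothetical results: one number (millijoules per iteration) per benchmark run.
runs_mj = [10.1, 10.0, 9.9, 10.2, 10.0, 13.0]

mean_mj, stdev_mj = summarize_runs(runs_mj)
print(f"mean {mean_mj:.2f} mJ, standard deviation {stdev_mj:.2f} mJ")
print("suspect runs:", flag_outliers(runs_mj))  # suspect runs: [13.0]
```

The median is used as the reference point because a single bad run drags the mean (and the standard deviation) toward itself, which can hide the very run you are looking for.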

Of course, the validity of the EnergyBench results assumes that the chip being tested is actually representative of the vendor’s product yield. EEMBC has always had strict rules against cherry-picking the devices submitted for certification.

On the other hand, semiconductor manufacturers are always trying to manage unwanted variations in their manufacturing processes, and one of the many potential applications for EnergyBench is to help them understand in more detail the specific components and effects of process variation as they relate to energy consumption.

Before testing, chips are allowed to warm up for at least 30 minutes to an ambient temperature of 70°F +/- 5°F. This helps to ensure consistent results. More to the point, energy consumption increases dramatically as devices get warmer, so we wanted to prohibit testing artificially chilled chips. Since the point of EnergyBench is to quantify typical energy usage, it seemed reasonable to mandate a “room temperature” environment. Incidentally, this rule also avoids the need for expensive equipment to control the temperature of the processor.

Integrating EnergyBench with performance benchmarks

To understand EnergyBench and why it’s so effective, it helps to understand a bit about the other EEMBC benchmarks. EEMBC benchmarks are written in ANSI C and are managed through a software “harness.” During benchmark execution, system calls aren’t made directly; they’re instead made through an abstraction layer in the harness. The harness is also responsible for initializing the system, preparing data for the benchmark if needed, and setting the number of iterations the benchmark will run. When the benchmarks are ported to a new processor or a new operating system, only the abstraction layer needs to change. Once that’s done, all the EEMBC benchmarks can be run without modification (assuming the processor has an ANSI C compiler).

To accommodate EnergyBench, the test harness invokes a new trigger mechanism in the abstraction layer to indicate when it’s entering or leaving the timed portion of the benchmark. As an example, some systems implement the trigger by sending a signal to a UART through an OS driver. One problem with the trigger is the delay between the moment the software issues the trigger indication and the moment the DAQ hardware receives the signal. To accommodate this, the abstraction layer accepts a user-defined delay after the trigger is signaled but before the benchmark starts to execute. To allow for a lag when the benchmark ends, the analysis module can limit itself to a specific number of iterations.
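The trigger sequence can be pictured as follows. The real harness is written in ANSI C. This sketch uses Python purely for illustration, and every name in it (TriggerLayer, enter_timed_region, run_benchmark, the set_line callback) is hypothetical rather than part of the EEMBC harness.

```python
import time

class TriggerLayer:
    """Hypothetical stand-in for the abstraction layer's trigger hooks."""

    def __init__(self, set_line, start_delay_s=0.0):
        self._set_line = set_line            # callable that drives the trigger signal
        self._start_delay_s = start_delay_s  # user-defined delay after the trigger

    def enter_timed_region(self):
        self._set_line(True)
        # Give the DAQ hardware time to see the trigger before work begins.
        time.sleep(self._start_delay_s)

    def leave_timed_region(self):
        self._set_line(False)

def run_benchmark(layer, kernel, iterations):
    """Run kernel() the given number of times, bracketed by the two triggers."""
    layer.enter_timed_region()
    for _ in range(iterations):
        kernel()
    layer.leave_timed_region()

# Usage with a fake trigger line that simply records what it was told.
events = []
calls = []
layer = TriggerLayer(events.append, start_delay_s=0.001)
run_benchmark(layer, lambda: calls.append(1), iterations=5)
print(events, len(calls))  # [True, False] 5
```

Setup and record-keeping stay outside run_benchmark, so they fall outside the triggered window.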

How EnergyBench samples, analyzes, and collects data

EnergyBench samples voltage and current at intervals while the benchmark is running. After it detects the start trigger, it stores samples on disk until it receives the end trigger. The benchmark code’s user interface, shown in Figure 1, and the inexpensive DAQ hardware reduce the potential for error and provide an affordable way for almost anyone to acquire the necessary data.
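As a rough picture of the capture step, the sketch below keeps only the samples taken while a trigger channel is high. It is an assumption-laden illustration: the 1.5-V threshold, the tuple layout, and the function name are invented, and the real sampling module works through the DAQ drivers rather than on a Python list.

```python
def gate_by_trigger(samples, threshold_v=1.5):
    """Return the values captured between the start and end triggers.

    samples is an iterable of (trigger_volts, value) pairs in time order.
    The threshold is an assumed logic level, not an EnergyBench setting.
    """
    captured = []
    active = False
    for trigger_v, value in samples:
        if trigger_v >= threshold_v:
            active = True
            captured.append(value)
        elif active:
            break  # trigger dropped: the timed portion is over
    return captured

# Hypothetical capture: the trigger is high for the three middle samples.
stream = [(0.0, 9), (0.0, 9), (3.3, 1), (3.3, 2), (3.3, 3), (0.0, 9), (3.3, 7)]
print(gate_by_trigger(stream))  # [1, 2, 3]
```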

Figure 1. The EnergyBench sampling module can be configured via a friendly GUI or from a configuration file. All relevant parameters such as voltage levels, resistor values and sampling frequency can be configured. An optional scope-like graphical display of captured signals shows current, voltage, and trigger channels.

Any data-acquisition hardware that’s compatible with the DAQmx drivers from National Instruments can be used. The default DAQ card allows (and the EnergyBench specification requires) all of the processor’s power rails to be measured simultaneously. The EnergyBench software suite includes executables for simultaneously measuring one, two, or three rails.

For processors that need more than one supply voltage (e.g., separate core and I/O voltages), there are two ways to calculate the energy per benchmark iteration. First, the different supply voltages can all be measured simultaneously. This implies that all channels are sampled at the same rate, so the overall sampling rate of the DAQ card may need to be decreased to match the host machine’s ability to keep up with the flow of data. The second method relies on the test’s repeatability: the power inputs are measured separately, and the average energies of the individual voltage rails are summed to give the total energy consumption.
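A short worked example may help here. It assumes that current on each rail is measured as the voltage drop across a sense resistor, and it uses a plain arithmetic mean for the averaging step. Both are assumptions made for illustration, as are the sample values, the 0.1-ohm resistors, and the function names.

```python
import statistics

def rail_power_samples(rail_volts, shunt_volts, shunt_ohms):
    """Per-sample power on one rail: P = V_rail * (V_shunt / R_shunt)."""
    return [v * (vs / shunt_ohms) for v, vs in zip(rail_volts, shunt_volts)]

def rail_energy(power_samples_w, duration_s, average=statistics.fmean):
    """Energy over the timed window: average power times duration."""
    return average(power_samples_w) * duration_s

# Hypothetical captures for a core rail and an I/O rail.
core_w = rail_power_samples([1.2] * 4, [0.010, 0.012, 0.011, 0.011], 0.1)
io_w = rail_power_samples([3.3] * 4, [0.005, 0.005, 0.006, 0.004], 0.1)
duration_s = 0.004

# Second method: measure each rail on its own, then add the energies.
separate_j = rail_energy(core_w, duration_s) + rail_energy(io_w, duration_s)

# First method: sample the rails together and sum power sample by sample.
together_j = rail_energy([c + i for c, i in zip(core_w, io_w)], duration_s)

print(f"separate {separate_j * 1e3:.3f} mJ, together {together_j * 1e3:.3f} mJ")
```

With an arithmetic mean the two totals agree when computed from the same samples. In practice the second method takes its measurements on different runs, which is why it depends on the benchmark being repeatable.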

The last piece of EnergyBench is the analysis of the captured data. The Power Analysis Module calculates the following values:

Minimum, maximum, average power, and standard deviation for each voltage rail
Geometric mean of total power
Minimum, maximum, average, and standard deviation of energy consumed per iteration of the benchmark

After the benchmark has run multiple iterations and all the measurement samples have been captured, the EEMBC Power Analysis Module analyzes the captured samples, determines the average energy used per iteration of the benchmark, and looks for the minimum and maximum power samples. To calculate the energy, the geometric mean of the power samples is multiplied by the duration of the benchmark iteration. In some cases, it’s possible for a benchmark to iterate faster than the DAQ hardware can sample the power pins. In such cases, a minimum of 100 samples are analyzed before calculating the average energy for all iterations within that duration.
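The calculation described above can be sketched as follows. This is a simplified illustration, not the Power Analysis Module itself: the function name and the sample values are hypothetical, and the sketch applies the 100-sample floor unconditionally rather than only when the benchmark outruns the DAQ hardware.

```python
import statistics

MIN_SAMPLES = 100  # the floor described in the text, applied here in all cases

def energy_per_iteration(power_samples_w, iteration_time_s):
    """Minimum and maximum power, plus energy as geometric mean times duration."""
    if len(power_samples_w) < MIN_SAMPLES:
        raise ValueError("need at least 100 power samples")
    mean_power_w = statistics.geometric_mean(power_samples_w)
    return {
        "min_power_w": min(power_samples_w),
        "max_power_w": max(power_samples_w),
        "energy_j": mean_power_w * iteration_time_s,
    }

# Hypothetical capture: 100 samples alternating between 1 W and 4 W,
# with each benchmark iteration taking 5 ms.
samples_w = [1.0, 4.0] * 50
result = energy_per_iteration(samples_w, iteration_time_s=0.005)
print(result)
```

For these made-up samples the geometric mean is about 2 W, so the energy comes out at roughly 0.01 joules per iteration.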

The results are displayed graphically by the Power Analysis Module in the energy/iteration chart, as shown in Figure 2. Users can also watch the display to examine minimum and maximum power spikes while the benchmark is running and see the variance in the captured samples.

Figure 2. Once all the samples have been captured, the analysis module calculates the energy per iteration of the benchmark. All of the parameters are fed in automatically using the EEMBC test harness.

Ultimately, the net result of EnergyBench is the average energy consumed per iteration of the workload. This is the EEMBC-certified Energymark™ score and is an optional metric that a processor manufacturer may (or may not) choose to disclose alongside their EEMBC-certified scores as a way to quantify a processor’s energy efficiency.

As a “sanity check” to ensure the reliability of Energymark scores, EEMBC’s Technology Center lab testers verify that the following conditions are true:

Variation within the specific sampling frequency falls within a confidence interval of 95%;
Reported energy consumption at the two sampling frequencies is consistent, indicating no unwanted aliasing;
Reported energy consumption from repeated invocations of the benchmark is stable and repeatable and doesn’t vary unusually widely.

If the variation within a specific sampling frequency is too large, the tester can increase the sampling frequency and/or the number of benchmark iterations until there are enough samples that the confidence interval of the mean value is within the specified 95% tolerance. If the variation between the two sampling frequencies is too large, the sampling frequencies can be changed.
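One way to picture these checks is shown below. The interpretation is an assumption: the sketch treats the requirement as "the 95% confidence interval of the mean must be narrow relative to the mean" and uses a 1% limit, a normal approximation (z = 1.96), invented sample values, and hypothetical function names. The actual certification thresholds are not given here.

```python
import math
import statistics

Z_95 = 1.96  # normal approximation for a 95% confidence interval

def ci_halfwidth_95(samples):
    """Half-width of the 95% confidence interval of the mean."""
    return Z_95 * statistics.stdev(samples) / math.sqrt(len(samples))

def within_tolerance(samples, rel_tol):
    """True if the interval is within rel_tol of the mean (assumed criterion)."""
    return ci_halfwidth_95(samples) <= rel_tol * statistics.fmean(samples)

def frequencies_agree(mean_a, mean_b, rel_tol):
    """True if the means from two sampling frequencies differ by at most rel_tol."""
    return abs(mean_a - mean_b) <= rel_tol * max(mean_a, mean_b)

# Hypothetical power samples (watts) with the same spread but different counts.
pattern = [2.0, 2.2, 1.8, 2.1, 1.9]
few = pattern * 20    # 100 samples
many = pattern * 80   # 400 samples

print(within_tolerance(few, 0.01), within_tolerance(many, 0.01))  # False True
print(frequencies_agree(2.00, 2.01, 0.01))                        # True
```

The spread of the data is the same in both captures. Only the number of samples changes, which is why collecting more samples is enough to tighten the interval.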

If the variation between different invocations is more than a few percent, it’s probably because there was too much noise on the lines. Another possibility is that the processor is executing too many other tasks besides the benchmark code, in which case the benchmark execution needs to be better isolated. A schematic of this process is shown in Figure 3.

Figure 3. This process ties typical energy to a specific benchmark and, beyond that, to the specific workload of that benchmark.

The choices behind EnergyBench

For some insight into the methodology, consider that EEMBC weighed many alternatives, among them:

Specifying junction temperature for energy measurements
Using high-frequency oscilloscopes and a highly controlled environment
Specifying probes and calibration techniques

Since the goal was not to characterize chips but to define a standard way to derive typical energy consumption, we felt it was more important to define a method that was readily available and affordable, rather than one that was esoteric, exotic, and expensive. Instead of using expensive equipment and factory-like procedures, EnergyBench validation is done through statistical analysis and rules. Rather than specify junction or case temperature, we stipulate regulated room temperature. Rather than expensive analysis hardware, a simple data-acquisition system is used, and multiple runs at different frequencies guarantee reliability and repeatability.

Finally, we wanted a process that scaled from 5-MHz microcontrollers to the fastest processors on the market today, and tomorrow. The ability to replicate the process at multiple sites was also a concern, so that anyone can independently certify the results.

Sample results

Figure 4 shows a sample of the information that is publicly disclosed with certification. The sample shows that, depending on the processor platform and the benchmark being executed, even average power consumption can vary by as much as 8%. It also shows that the efficiency, as measured by energy consumed to accomplish a specific task, can be significantly different even when the average power consumption is similar (2.86e-2 joules for RGB to YIQ on the AMD platform vs. 1.06e-2 joules on the IBM platform, even though both show average power of around 2.3 W).

Figure 4. Sample certification results for two benchmarks on two platforms.

It’s also interesting to look at energy consumption of platforms running at different clock frequencies and with different features enabled or disabled. Figure 5 shows energy results for a specific benchmark (basic floating-point manipulation) on the NXP 3180 running at several performance levels, with the cache and floating-point hardware both enabled and disabled. The chart clearly shows that running at 208 MHz and with floating-point hardware enabled is actually more energy-efficient than operating the device at 13 MHz.

Figure 5. Results on NXP 3180 with multiple configurations.

When you measure the energy instead of average power, and thus take into account the time required to execute the task, lower average power doesn’t necessarily translate to better energy efficiency.
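The RGB to YIQ figures quoted earlier make the point with simple arithmetic. Energy is average power multiplied by time, so dividing each energy figure by the roughly 2.3 W average power gives the time each platform spent on the task. Because "around 2.3 W" is approximate, the times below are rough estimates, and the variable names are only for illustration.

```python
# Figures quoted in the text for RGB to YIQ; the 2.3 W value is approximate.
avg_power_w = 2.3
energy_j = {"AMD platform": 2.86e-2, "IBM platform": 1.06e-2}

# time = energy / power
implied_time_s = {name: e / avg_power_w for name, e in energy_j.items()}

for name, t in implied_time_s.items():
    print(f"{name}: about {t * 1e3:.1f} ms per task")

ratio = energy_j["AMD platform"] / energy_j["IBM platform"]
print(f"energy ratio: about {ratio:.1f}x at similar average power")
```

At nearly the same average power, the difference in energy comes almost entirely from how long each platform takes to finish the work.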

Conclusion

EnergyBench provides several tools that can be used with readily available and affordable hardware to measure typical energy consumption. EnergyBench is the first such industry standard, although other organizations such as SPEC are also formulating strategies to address this need. Certified EnergyBench results are freely available at the EEMBC website (www.eembc.org).