
EEMBC Benchmarks Correlate Power with Performance

“Just because your voice reaches halfway around the world doesn’t mean you are wiser than when it reached only to the end of the bar.” — Edward R. Murrow

To twist an old cliché, there are three kinds of lies: lies, damned lies, and benchmarks. EEMBC aims to improve all three.

For more than 20 years, EEMBC has been in the unenviable business of creating, testing, and distributing benchmarks for embedded devices. (The name once stood for EDN Embedded Benchmark Consortium, but now it’s just an unattributed acronym, like MIPS.) They’ve got benchmarks to measure performance, benchmarks for power consumption, benchmarks for security – you name it. EEMBC’s various benchmarks have become the de facto standard yardstick for devices that aren’t computers. That’s largely by default, since there aren’t many other benchmarks around that aren’t tuned for narrow applications or specific runtime environments. Java benchmarks are plentiful; benchmarks for measuring an MCU’s power consumption while sending Wi-Fi packets or doing real-time motor control, not so much.

But with success comes responsibility, as well as certain challenges. EEMBC’s ULPmark (the name is a mashup of ultra-low-power and benchmark) has been around for years, and it’s widely used to “prove” that one maker’s MCUs are more power efficient than their competitors’ chips. We’ll pause here while the alarm bells going off in your head have a chance to subside.

Still with us? Grand. Measuring CPU performance is hard enough, even among processors with similar architectures. Just look at any Intel-versus-AMD showdown. But that’s a cakewalk compared to measuring a chip’s power consumption. Measured doing what? Running at full speed with all peripherals active? Napping in low-power mode? Loping along at twenty furlongs per fortnight? Embedded systems and MCUs have lots of peripherals combined with lots of power-saving modes that can switch on and off on short notice. How do you create a level playing field, even among chips from the same maker or the same architectural family?

The good news is, EEMBC anticipated most of those complications long ago, when ULPmark was created. Lab technicians striving for the ultimate ULPmark score were free to use whatever power-saving measures a chip might provide. And why not? That’s why they’re there. Use ’em if you’ve got ’em. But you had to document the conditions and the configuration, and if you wanted EEMBC to officially certify your results, they had to be reproducible. No miraculous one-off scores permitted; no exaggerated claims.

The bad news is, there were still some loopholes. ULPmark measures power consumption, while EEMBC’s classic CoreMark benchmark gauges performance – but there is no correlation between the two. On the surface, that seems natural enough. They’re two different tests, after all. But, in practice, it led to customer confusion. Everyone kinda, sorta understood that published ULPmark scores represented one extreme corner case (low power, low performance) while CoreMark scores represented the diagonally opposite corner. Both are useful, but both are just floating data points.

Tempting as it is, you can’t simply draw a line between those two points and naively assume that your chip’s power/performance characteristics lie anywhere along that line. Performance is rarely linear, and power consumption never is. Without laboriously testing and plotting every step between the two extremes, there’s no way to know what the curve/line/wiggle/Brownian motion for your chip really looks like.
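That nonlinearity is easy to see with a toy model. The sketch below is purely illustrative — the voltage curve, leakage term, and frequencies are invented numbers, not data for any real MCU — but it captures the first-order physics: dynamic energy scales roughly with V², voltage must rise with clock frequency, and leakage is amortized over fewer operations at low speed. Interpolating between two corner points misses the actual midpoint badly.

```python
# Hypothetical illustration of why a straight line between two benchmark
# corners misleads. All constants are invented for the sketch.

def energy_per_op_nj(freq_mhz: float) -> float:
    """Toy model of energy per operation (nJ) for a hypothetical MCU.

    Assumes supply voltage rises linearly with clock frequency (a
    first-order DVFS approximation, not real silicon data).
    """
    v = 1.0 + 0.02 * freq_mhz      # toy voltage curve: 1.0 V at 0 MHz
    static_nj = 50.0 / freq_mhz    # leakage amortized over ops done per second
    dynamic_nj = 0.5 * v * v       # ~C * V^2 energy per switching operation
    return static_nj + dynamic_nj

low, high = 8.0, 80.0              # two "published" corner frequencies (MHz)
mid = (low + high) / 2

# Straight-line guess between the corners vs. what the model actually says.
linear_guess = (energy_per_op_nj(low) + energy_per_op_nj(high)) / 2
actual = energy_per_op_nj(mid)
print(f"linear interpolation at {mid:.0f} MHz: {linear_guess:.2f} nJ/op")
print(f"toy model at {mid:.0f} MHz:           {actual:.2f} nJ/op")
```

In this made-up model the straight-line estimate overshoots the midpoint by well over 50% — which is exactly why a single pair of corner scores can't stand in for the whole curve.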

EEMBC’s newest version of ULPmark, called ULPmark-CM, fixes some of that. It doesn’t change the code of either benchmark by very much, but it does alter the reporting rules. Henceforth, if you report power numbers using ULPmark, you must also report the CoreMark performance numbers that go with it, measured with the exact same hardware and software configuration. At last, we have two benchmarks that are correlated! A report of x iterations per millijoule (the way ULPmark has always worked) comes bundled with a score of y CoreMarks per second under the same conditions.

Crucially, this provides an x to go with the y on a power/performance graph. Before, an MCU’s power and performance numbers could be (and often were) plotted on the same graph, even though they were largely unrelated. Now, it’s possible to build a proper x/y plot of a chip’s performance versus power needs.
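As a sketch of what that buys you: with paired reporting, each tested configuration becomes one (x, y) point, and several configurations trace a curve. The record layout and every number below are hypothetical, invented for illustration — real certified scores come from EEMBC's published results.

```python
# Sketch: paired ULPmark-CM reporting turns each configuration into a
# proper (x, y) point on a power/performance plot. All values hypothetical.
from dataclasses import dataclass

@dataclass
class UlpmarkCmScore:
    config: str             # clock / voltage / power-mode description
    iter_per_mj: float      # ULPmark-style energy efficiency (x-axis)
    coremarks_per_s: float  # CoreMark performance, same setup (y-axis)

# Three configurations of one hypothetical MCU:
scores = [
    UlpmarkCmScore("48 MHz, 3.0 V, run mode",     90.0, 160.0),
    UlpmarkCmScore("16 MHz, 1.8 V, low-power",   220.0,  55.0),
    UlpmarkCmScore("4 MHz, 1.8 V, sleep-heavy",  310.0,  14.0),
]

# Each configuration is now a correlated point, not two floating numbers.
points = [(s.iter_per_mj, s.coremarks_per_s) for s in scores]
for (x, y), s in zip(points, scores):
    print(f"{s.config:28s} -> ({x:.0f} iter/mJ, {y:.0f} CoreMarks/s)")
```

Plot enough of those points and the shape of the chip's real efficiency curve — not a naive straight line — starts to emerge.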

That still requires multiple data points, of course, and EEMBC has always encouraged testers to benchmark their devices at multiple points along that continuum. Marketing departments, however, tend to resist any attempt to fully characterize their devices, preferring to cherry-pick the one or two that make their MCU look good. That’s still the case; EEMBC can’t force anyone to benchmark their chips. It can only tighten up the rules for reporting the results.

This change is EEMBC’s first major public pronouncement since appointing its new president and CTO, Peter Torelli. Peter follows Markus Levy, who founded EEMBC almost single-handedly back in 1997 and has headed the nonprofit group ever since. Coincidentally, Peter joined Intel at about the same time that Markus left Intel to work as an editor at EDN and, later, at EEMBC.

Peter says there’d been a lot of pressure from both vendors and developers to “rationalize” benchmark scores. The trouble was, everyone had a different idea of what was rational. Oddly, the definition seemed to coincide with whatever would prove most flattering to their own product line. Some wanted to mandate maximum-speed tests; some wanted different low-power settings; some felt the midpoint would be most fair (midpoint of what?). He says the technical subcommittee at EEMBC seriously considered a three-point spread: one at maximum clock speed, one at minimum practical clock speed, and one at 3.0V supply voltage – this last one because 3.0V is the baseline for some of EEMBC’s other tests. The group felt it was a good engineering solution, but the marketing people hated it.

Ideally, of course, everyone wanted a single number, the One True Measure of Goodness, but that’s not how benchmarks work. In the end, EEMBC decided to let everyone test and report as many configurations as they like, just so long as they report the performance numbers that go with their power numbers. (EEMBC does not require the reverse; CoreMark performance scores don’t require corresponding power numbers.)

You can almost hear Peter Torelli roll his eyes as he talks about pressure from various groups petitioning him to create new benchmarks for this, or to tweak the benchmarks for that. (Markus Levy dealt with similar appeals for more than 20 years. It seems to go with the territory.) Right now, one subcommittee is working out the parameters for “typical” Wi-Fi activity for edge-node IoT devices. Good luck with that.

He also talks about the gradual shift in EEMBC’s customer base and usage. Early on, chip vendors gobbled up EEMBC benchmarks and published (selected) scores as proof of their technical superiority. They still do, but Peter says that, nowadays, internal engineering groups are their primary customers. Everyone understands that benchmarks are just the opening line of a novel; they never tell the whole story. Development teams will draw upon EEMBC’s broad suite of tests to do their own internal testing and decide what and how they want to punish different chips. “It’s more of an in-lab analysis tool and less of a pushbutton ‘give me a number’ tool,” he says. The engineer in him seems pleased at the change.
