
A Mark on the Bench

EEMBC Benchmark Scores Improve, But What Does It Mean?

Writing benchmarks is a lonely endeavor. It’s kind of like being a referee or an umpire. Everybody wants a good and fair benchmark, but “good” and “fair” are both open to interpretation, and whoever comes out on the short end of the evaluation is sure to howl and squeal.

The patient souls at EEMBC (Embedded Microprocessor Benchmark Consortium) have been dealing with this problem for well over a decade. They’ve produced a range of benchmarks that measure all sorts of vital system parameters, all with the goal of helping programmers and engineers choose the best chip for their next project. Will that new Atmel chip be fast enough to do what you want? Does the latest AMD processor have enough oomph to get the job done? Can that little $2 part handle decryption in under a microsecond? There’s an EEMBC benchmark for that.

Perhaps the purest and simplest of these is the CoreMark benchmark. CoreMark is intended to test only the processor’s core: the internal CPU architecture, independent of on-chip memory, peripherals, or pin-out. As such, it’s not the most practical of benchmarks, but that’s the point. CoreMark is meant to be a CPU designer’s litmus test. It measures how quickly code sluices through the internal plumbing, not how fast the chip can wiggle its I/O lines. Anyone can download and run CoreMark, and anyone can post their scores. So far, there are 388 CoreMark scores posted at www.coremark.org.
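What does a CoreMark score actually count? Iterations: the benchmark runs a fixed workload (list processing, matrix operations, a state machine, and CRC arithmetic) over and over, times it, and reports iterations per second. The sketch below shows only that timed-loop shape; it is not the real CoreMark source (download that from the site), and its little CRC routine is just a stand-in for the real workload:

    /* build: gcc -O2 sketch.c   (POSIX clock_gettime) */
    #include <stdio.h>
    #include <time.h>

    static volatile unsigned short sink;      /* volatile: keep the work honest */

    static void workload(unsigned char seed)  /* stand-in kernel: a CRC-16 loop */
    {
        unsigned short crc = 0xFFFF ^ seed;
        for (int i = 0; i < 256; i++) {
            crc ^= (unsigned char)i;
            for (int b = 0; b < 8; b++)
                crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1;
        }
        sink = crc;
    }

    int main(void)
    {
        const long iterations = 100000;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < iterations; i++)
            workload((unsigned char)i);       /* vary the input each pass */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("score: %.1f iterations/sec\n", iterations / secs);
        return 0;
    }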

Each score is listed both “straight up” (CoreMarks) and normalized for clock speed (CoreMarks/MHz). A chip that scores 300 CoreMarks at a 100 MHz clock, for example, rates 3.0 CoreMarks/MHz. The former tells you how fast the chip is. The latter tells you how efficient it is. Engineering managers like the first number; nerds like the second number.

Which means we’ll be focusing here on the second number.

Before we do, it’s worth pointing out that the range of CoreMark scores is impressive. From an all-time low score of 0.37 to a high of over 336,000, the scores span almost six orders of magnitude. Naturally, the low scorers are little 8-bit microcontrollers that sell for next to nothing, while the top scorers are fire-breathing, massively multicore supercomputers-on-a-chip, such as IBM’s Power7. Between those two extremes, there’s probably a processor for every need and every budget, don’t you think?

When we look at CoreMark/MHz scores, however, the ranking is a bit different. Normalizing for clock frequency eliminates the “brute force” aspect of the test and drops some of the faster processors down the rankings a bit. Now, one could argue that that’s pointless: we don’t buy normalized processors, we buy real chips, and if that IBM chip runs faster than that Intel chip, so be it. We don’t normalize a car’s MPG rating by its weight, or a golfer’s handicap by his age. But CoreMark/MHz is more like a baseball player’s batting average: hits (CoreMarks) divided by at-bats (clock cycles). In engineering terms, it’s a measure of the processor’s internal efficiency: how much work can it get done in a given clock cycle? The ratio is interesting in its own right, but it’s also a good window into the processor’s potential power consumption. The higher the CoreMark/MHz ratio, the more efficient the device and the less juice it should consume.

Here, too, we see a big spread of scores, from a low of 0.03 to a high of over 167. That’s almost four orders of magnitude (167 ÷ 0.03 ≈ 5,500), which seems surprising now that we’ve taken clock frequency out of the equation. It represents a difference of roughly 5,000-to-1 in CPU efficiency, which hardly seems possible. We don’t see cars with 5,000 times better gas mileage than their peers, or baseball players who are 5,000 times more likely to hit the ball (except maybe on the playground). Yet here we’ve got some pretty clear evidence that some processors are thousands of times more productive than others. What’s going on?

Quite a number of things, as it happens, and as you probably expected. For starters, the faster chips have multiple CPU cores, so they’re cheating, in the sense that they’re running multiple copies of the benchmark and combining their scores. But all’s fair in love and benchmarking, and double-teaming the benchmark code is actually a pretty fair representation of how the chip would behave in real life. It would be worse if the benchmark scores didn’t improve; what would be the point of multicore in that case?
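To make that concrete, here’s a standalone sketch of the “run N copies and add them up” idea, using ordinary POSIX threads rather than CoreMark’s own multi-instance build machinery (the real benchmark has its own provisions for parallel runs; check its documentation):

    /* build: gcc -O2 -pthread multicopy.c */
    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    #define NUM_THREADS 4            /* assume one benchmark copy per core */
    #define ITERATIONS  5000000L

    static void *run_copy(void *arg) /* one complete copy of the workload */
    {
        volatile long acc = 0;       /* volatile: don't optimize the loop away */
        for (long i = 0; i < ITERATIONS; i++)
            acc += i & 0xFF;
        *(long *)arg = ITERATIONS;   /* iterations this copy completed */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NUM_THREADS];
        long done[NUM_THREADS];
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_create(&tid[i], NULL, run_copy, &done[i]);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(tid[i], NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        long total = 0;
        for (int i = 0; i < NUM_THREADS; i++)
            total += done[i];
        printf("combined: %.1f iterations/sec\n", total / secs);
        return 0;
    }

On a four-core device, the combined figure lands near four times the single-thread score, which is exactly the kind of multiplier that shows up in the published multicore numbers.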

Then there are compiler differences. CoreMark is delivered as C source code, so it has to be compiled. Any first-year computer programmer knows that the choice of compiler makes a big difference to the quality (and speed) of your code. Benchmark tests are no different. In fact, some benchmarks (like NullStone) are actually tests of the compiler, not of the hardware.
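To see how much daylight a compiler can find in identical source, take the sort of inner loop a benchmark lives in and build it two ways. The gcc flags in the comment are purely illustrative; vendor toolchains have their own equivalents, and the actual speedup varies by target:

    /* bench.c: same source, two builds.
     *
     *   gcc -O0 bench.c -o bench_slow            (no optimization)
     *   gcc -O3 -funroll-loops bench.c -o bench_fast
     *
     * The -O3 build typically runs several times faster; the unrolling,
     * strength reduction, and register allocation all come from the
     * compiler, not the silicon. */
    #include <stdio.h>

    unsigned short crc16(const unsigned char *p, int len)
    {
        unsigned short crc = 0xFFFF;
        while (len--) {
            crc ^= *p++;
            for (int b = 0; b < 8; b++)
                crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1;
        }
        return crc;
    }

    int main(void)
    {
        static unsigned char buf[4096];
        unsigned short crc = 0;
        for (int i = 0; i < 10000; i++) {
            buf[0] = (unsigned char)i;   /* defeat loop-invariant hoisting */
            crc ^= crc16(buf, sizeof buf);
        }
        printf("crc = 0x%04X\n", crc);   /* time the two builds and compare */
        return 0;
    }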

So, for instance, we’ve got one example of an NXP LPC2939 chip that delivers 0.54 CoreMark/MHz, and another example of the same chip that gets 1.18. That’s better than a 2:1 difference for the exact same device. The only difference between them? One executes its code out of flash memory; the other, out of RAM.
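If you’re wondering how anyone arranges a RAM-resident run in the first place: on many microcontrollers you ask the linker to place the hot code in SRAM, since flash often needs wait states at full clock speed while SRAM is single-cycle. A GCC-flavored sketch follows; the “.ramfunc” section name is an assumption (the actual name, and the startup code that copies the section from flash to RAM at boot, depend on your particular linker script and vendor libraries):

    /* Place the benchmark kernel in SRAM instead of flash. This is
     * where a 2:1 score difference on the same chip can come from. */
    __attribute__((section(".ramfunc"), noinline))
    void hot_kernel(volatile unsigned *out)
    {
        unsigned acc = 0;
        for (int i = 0; i < 1000; i++)
            acc += i;
        *out = acc;   /* runs from RAM once startup has copied the section */
    }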

In other head-to-head examples, the cause of different benchmark scores is the compiler. Is that fair? Probably, since the compilers themselves are commercial products that you or I could use on our own projects. If Compiler A really does produce code that runs twice as fast as Compiler B’s, I know which one moves to the top of my shopping list.

Memory can make a difference, but only sometimes. Most low-end microcontrollers have enough on-chip memory to hold the little CoreMark test entirely within their confines. Faster 32-bit and 64-bit processors typically don’t have on-chip RAM, but they do have big caches. So even though the processor has to fetch the benchmark out of slow external memory the first time through, the program will run from the cache from then on, negating any difference in memory speed or latency. That might not be a good reflection of how your real code works (in fact, it’s probably not), but hey—a benchmark can only do so much.
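Structurally, it’s as if every benchmark came with a built-in warm-up pass: only the first pass pays the external-memory price. A sketch of the effect, reusing the timed-loop shape from earlier (the kernel is again a stand-in):

    #include <stdio.h>
    #include <time.h>

    static volatile long sink;

    static void kernel(void)              /* stand-in benchmark kernel */
    {
        long acc = 1;
        for (int i = 0; i < 100000; i++)
            acc += (acc >> 3) ^ i;        /* dependent chain: resists folding */
        sink = acc;
    }

    int main(void)
    {
        const int passes = 1000;
        struct timespec t0, t1;

        kernel();   /* pass #1: code and data are fetched from slow memory
                       into the cache, and they stay there */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < passes; i++)
            kernel();                     /* timed passes all hit the cache */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.1f passes/sec (memory latency mostly invisible)\n",
               passes / secs);
        return 0;
    }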

Then there’s the human element. Some benchmark testers have more… enthusiasm for their job than others. Pretty much any benchmark is subject to unethical optimization, although CoreMark is pretty hack-proof. As far as the people running EEMBC can tell, none of the CoreMark scores published on the website are bogus. They even certify some of the scores (if the tester is a member of EEMBC) by duplicating the test in their own labs. Currently, only about 10% of CoreMark scores are certified; the rest are “self-certified.” Peer pressure keeps people from making egregious claims. A wildly out-of-whack score would quickly be challenged by competitors, as well as current customers. “Hey, how come the chip you sold me doesn’t go that fast?”

Finally, there’s good old-fashioned progress. CoreMark scores have been improving over time, some by as much as 30–40% in a single year. It’s not entirely clear where that gain is coming from. I spoke with a number of EEMBC members who’d recently posted upgraded CoreMark scores, and their general response was, “we’re always striving to improve our compiler tools to better serve our valued customers,” blah, blah, blah. No improvements to the chips, in other words. Just improvements in the compiler or—and this is a real possibility—a better understanding of which compiler switches produce the most flattering results.

And this brings us to the final paradox of benchmark testing. In real-world development, the compiler setting that gives you the best benchmark score isn’t always the one you really want. Benchmarks like CoreMark test speed, and nothing else. Specifically, they don’t grade code density or code size, nor do they measure power consumption or debug accessibility. The hardware and software setup that produces the fastest code may also produce a gigantic binary image that hogs RAM, ROM, and power. (Compare a gcc -O3 -funroll-loops build of the same source against a size-optimized -Os build: the first usually wins the stopwatch; the second fits the flash.) Sports cars that post astonishing 0–60 times rarely have trunks big enough for hauling the groceries.

So we’re back to where we started, rating and ranking chips based on our own personal mix of different criteria, most of them subjective. As nice as it seems to have a single, objective number to measure “goodness,” no number is going to tell you what you want. That’s what engineers are for. 
