EEMBC Polishes Ye Olde Whetstone

New Embedded Benchmark Quantifies Floating-point Performance

To paraphrase Mark Twain, there are lies, damn lies, and benchmarks. People have been fudging their benchmark results for as long as there have been benchmarks. It’s easy enough to do. Indeed, it’s surprisingly hard not to distort benchmark results, even for the most scrupulously honest engineers. Measuring CPU performance ain’t like drag racing cars, and any comparison of benchmark scores inevitably boils down to an argument over what, specifically, you’re measuring.

Wading into this morass is EEMBC, the nonprofit organization that threw itself on the benchmarking grenade almost 20 years ago. EEMBC (which stands for Embedded Microprocessor Benchmark Consortium, but with a generous extra E) has produced a number of specialty benchmarks over the years. They have tests that measure real-time performance, multicore ability, automotive workloads, and much more. What the group hasn’t had until now is a straight-up floating-point benchmark.

Why now? ’Cause more and more embedded processors are including FPUs, and even low-end, sub-$5 chips now have floating-point capability. It’s not that these little MCUs need to perform scientific calculations or anything; the FP comes in handy mostly for motor control. As anyone who’s done robotics, kinematics, or motion control can tell you, accurate math is absolutely mandatory when you’re keeping track of the positions of things. The inevitable rounding errors that come with integer arithmetic (even when you’re using 32-bit numbers) add up surprisingly quickly, and within a few minutes you don’t really know where your robot arm is anymore. Floating-point math virtually eliminates those rounding errors and the scary imprecision.
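To make that rounding-error point concrete, here's a minimal C sketch (not from FPMark; the step value and loop count are invented for illustration) that accumulates the same small per-tick position increment a million times, once as a truncated 16.16 fixed-point integer and once in double precision:

    /* Illustrative only: accumulate a small per-tick position step,
     * once as a 16.16 fixed-point integer and once as a double,
     * to show how integer rounding error grows in a control loop. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        const double  step    = 0.00123;                      /* invented per-tick increment */
        const int32_t step_fx = (int32_t)(step * 65536.0);    /* truncated 16.16 fixed-point */

        double  pos_fp = 0.0;
        int64_t pos_fx = 0;

        for (long i = 0; i < 1000000; i++) {   /* roughly a few minutes of control ticks */
            pos_fp += step;
            pos_fx += step_fx;
        }

        printf("floating-point position: %f\n", pos_fp);
        printf("fixed-point position:    %f\n", pos_fx / 65536.0);
        printf("drift:                   %f\n", pos_fp - pos_fx / 65536.0);
        return 0;
    }

After a million ticks, the fixed-point version has drifted by several whole units – exactly the kind of creeping error that makes a robot arm's reported position untrustworthy.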

Combine inexpensive processors, complex math, and real-time performance needs and you have a recipe for benchmark confusion. Until now, developers typically compared chips running either their own in-house code (which meant laying hands on the chip and porting the code) or one of a handful of freely available FP benchmarks such as Linpack, Whetstone, or the Livermore loops. Either technique might (or might not) provide a rough guide to which chips deliver better FP performance, but neither likely measured what you really wanted to know. Today’s FP benchmarks are really just inner loops: kernels of a larger algorithm that have been passed around from generation to generation as quick-and-dirty code samples. They’re not real benchmarks in the sense of being controlled, repeatable tests.

Moreover, nobody “owns” or controls Linpack, et al. That means you’re free to adjust the code as you see fit, as many have done. That’s fine for in-house tuning, but it does nothing to make these freebie FP nuggets useful as comparison tests. What’s needed is a fixed reference point: a benchmark that can be used to compare different chips running on different days using different compilers, and so on.

What EEMBC has done is to take the Whetstones of the world and fold them all into one bona fide benchmark suite that it oversees, called FPMark. The inner loops are pretty much the same Livermore loops that are already (ahem) floating around, but codified, sanitized, and “harnessed” in such a way that they’re impervious to ham-fisted or malicious tuning. In all, FPMark exercises 10 different floating-point algorithms (including some written from scratch just for FPMark) for a total of 53 different workloads.

Why so many different variations? Each kernel comes in both single-precision and double-precision versions, because some chips support only one or the other. Each kernel is also run through a small, medium, and large data set – again, because some chips can handle only smallish address ranges or data blocks. Most kernels get run through all six permutations, giving a good indication of how the chip performs on that task under all conditions.
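The permutation scheme is easy to picture in code. Here's a hypothetical sketch (the kernel names, types, and function are invented for illustration, not FPMark's actual identifiers) of how each kernel fans out into its precision and data-set variants:

    /* Hypothetical sketch: each kernel fans out into precision x data-set
     * variants, giving up to 2 x 3 = 6 workloads per kernel. */
    enum precision { PREC_SINGLE, PREC_DOUBLE };
    enum data_size { SIZE_SMALL, SIZE_MEDIUM, SIZE_LARGE };

    struct workload {
        const char     *kernel;   /* e.g. "fft", "linear_algebra" (invented names) */
        enum precision  prec;
        enum data_size  size;
    };

    /* Expand one kernel into all six permutations. */
    static int expand(const char *kernel, struct workload *out)
    {
        int n = 0;
        for (int p = PREC_SINGLE; p <= PREC_DOUBLE; p++)
            for (int s = SIZE_SMALL; s <= SIZE_LARGE; s++)
                out[n++] = (struct workload){ kernel, (enum precision)p, (enum data_size)s };
        return n;   /* 6 */
    }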

What if you’re interested only in single-precision, small-dataset results? No problem. Programmers using $5 MCUs with lightweight FPUs can run just the Lilliputian modes, while their colleagues in the next building can exercise the full fury of a Xeon 5500 by running the entire test suite. The beauty is, it’s the same code either way, so the results are directly comparable.

Got a multicore and/or multi-threaded processor? FPMark has that covered, too. Assuming your compiler supports it, FPMark will run multiple instances on multiple virtual cores. Note that it’s not parallelized; the component tasks are not split up and vectorized. Instead, a full instance of FPMark is run concurrently on each core. This more closely represents how designers are likely to run their own FP code.
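In other words, throughput scaling comes from replication, not parallelization. A minimal POSIX-threads sketch of that model might look like the following, where run_one_instance() is a hypothetical stand-in for one complete benchmark pass and the core count is assumed:

    /* Minimal sketch of the "one full instance per core" model.
     * run_one_instance() is a hypothetical stand-in for a complete
     * benchmark pass; nothing is split up or vectorized across threads. */
    #include <pthread.h>
    #include <stdio.h>

    #define NUM_CORES 4   /* assumed core count, for illustration */

    static void run_one_instance(int core)
    {
        /* ...one complete, independent pass of all workloads... */
        printf("instance on core %d finished\n", core);
    }

    static void *worker(void *arg)
    {
        run_one_instance(*(int *)arg);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NUM_CORES];
        int       id[NUM_CORES];

        for (int i = 0; i < NUM_CORES; i++) {
            id[i] = i;
            pthread_create(&tid[i], NULL, worker, &id[i]);
        }
        for (int i = 0; i < NUM_CORES; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }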

When it’s all over, FPMark takes all 53 results (or fewer, depending) and weights them equally. The geometric mean of the results becomes your FPMark score. If you’ve opted for the lightweight data sets, you get a MicroFPMark score. Individual scores are available, too, if you’re interested only in certain tasks or variations.
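For the curious, the scoring math is just an equal-weight geometric mean: the nth root of the product of the n individual workload scores. A minimal C sketch of that calculation (the score values are made up):

    /* Equal-weight geometric mean: exp of the average log score.
     * Using logs avoids overflowing the product of 53 results. */
    #include <math.h>
    #include <stdio.h>

    static double geometric_mean(const double *scores, int n)
    {
        double log_sum = 0.0;
        for (int i = 0; i < n; i++)
            log_sum += log(scores[i]);
        return exp(log_sum / n);
    }

    int main(void)
    {
        double scores[] = { 12.5, 9.8, 40.2, 3.7 };   /* made-up workload scores */
        printf("composite score: %f\n", geometric_mean(scores, 4));
        return 0;
    }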

As with most EEMBC benchmarks, you’re free to publish your results to the world, with or without EEMBC’s approval. If you want the gold seal of respectability, EEMBC will verify and certify your results for free; you just have to provide the hardware/software setup you used and wait. Interestingly, EEMBC’s engineers have found that certifying vendors’ results doesn’t uncover cheating as often as you might think. In fact, EEMBC gets better scores than the vendor did about half the time. That’s usually due to a change in the compiler between the time the vendor ran the tests and when EEMBC made its own run. Or it’s simple incompetence: vendors sometimes hand the benchmarking job to a junior engineer who lacks experience in compiler tweaking or who doesn’t understand the finer points of memory allocation. Either way, an EEMBC-certified result is everyone’s guarantee that yes, this chip can do these tasks that fast.

As with most benchmarks, FPMark says as much about the compiler and software tools as it does about the chip. Software vendors are often just as interested in benchmarks as chip makers, and a few have already discovered shortcomings in their floating-point code. Even if you’re not interested in drag racing processors, it might be worth running FPMark just to see how well your compiler stacks up. 
