
EEMBC Polishes Ye Olde Whetstone

New Embedded Benchmark Quantifies Floating-point Performance

To paraphrase Mark Twain, there are lies, damn lies, and benchmarks. People have been fudging their benchmark results for as long as there have been benchmarks. It’s easy enough to do. Indeed, it’s surprisingly hard not to distort benchmark results, even for the most scrupulously honest engineers. Measuring CPU performance ain’t like drag racing cars, and any comparison of benchmark scores inevitably boils down to an argument over what, specifically, you’re measuring.

Wading into this morass is EEMBC, the nonprofit organization that threw itself on the benchmarking grenade almost 20 years ago. EEMBC (which stands for Embedded Microprocessor Benchmark Consortium, but with a generous extra E) has produced a number of specialty benchmarks over the years. They have tests that measure real-time performance, multicore ability, automotive workloads, and much more. What the group hasn’t had until now is a straight-up floating-point benchmark.

Why now? ’Cause more and more embedded processors are including FPUs, and even low-end, sub-$5 chips now have floating-point capability. It’s not that these little MCUs need to perform scientific calculations or anything; the FP comes in handy mostly for motor control. As anyone who’s done robotics, kinematics, or motion control can tell you, accurate math is absolutely mandatory when you’re keeping track of the positions of things. The inevitable rounding errors that come with integer arithmetic (even when you’re using 32-bit numbers) add up surprisingly quickly, and within a few minutes you don’t really know where your robot arm is anymore. Floating-point math virtually eliminates those rounding errors and the scary imprecision.
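
To see how fast that drift piles up, here's a minimal sketch (not FPMark code, and the numbers are illustrative) that accumulates a 0.1-unit step a million times in Q16.16 fixed point versus double-precision floating point. Since 0.1 can't be represented exactly in Q16.16 (0.1 × 65536 = 6553.6), the truncation error grows with every update:

```c
/* Minimal sketch: position drift from fixed-point truncation error.
 * 0.1 is not exactly representable in Q16.16, so each step is short
 * by ~0.6/65536 units and the error grows linearly. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const int32_t step_q16 = (int32_t)(0.1 * 65536.0);  /* 6553, not 6553.6 */
    int64_t pos_q16 = 0;    /* fixed-point position accumulator */
    double  pos_fp  = 0.0;  /* floating-point position accumulator */

    for (long i = 0; i < 1000000; i++) {
        pos_q16 += step_q16;
        pos_fp  += 0.1;
    }

    /* Expected: ~99990.8 for fixed point (off by roughly 9 units after
     * a million steps) versus ~100000.000001 for double precision. */
    printf("fixed point:    %f\n", pos_q16 / 65536.0);
    printf("floating point: %f\n", pos_fp);
    return 0;
}
```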

Combine inexpensive processors, complex math, and real-time performance needs and you have a recipe for benchmark confusion. Until now, developers typically compared chips by running either their own in-house code (which meant laying hands on the chip and porting the code) or one of a handful of freely available FP benchmarks such as Linpack, Whetstone, or the Livermore loops. Either technique might (or might not) provide a rough guide to which chips deliver better FP performance, but neither method likely measures what you really want to know. Today’s FP benchmarks are really just inner loops: kernels of a larger algorithm that have been passed around from generation to generation as quick-and-dirty code samples. They’re not real benchmarks in the sense of being controlled, repeatable tests.
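
To get a feel for just how small these hand-me-down kernels are, here's a C sketch of Livermore Loop 1, the classic "hydro fragment." The loop itself is the whole benchmark; the controlled harness around it is exactly what's been missing:

```c
/* Livermore Loop 1, the "hydro fragment": typical of the tiny FP
 * kernels that circulate as quick-and-dirty benchmarks. The caller
 * must supply z with at least n + 11 elements. */
void hydro_fragment(int n, double q, double r, double t,
                    double *x, const double *y, const double *z)
{
    for (int k = 0; k < n; k++)
        x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11]);
}
```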

Moreover, nobody “owns” or controls Linpack, et al. That means you’re free to adjust the code as you see fit, as many have done. That’s fine for in-house tuning, but it does nothing to make these freebie FP nuggets useful as comparison tests. What’s needed is a fixed reference point: a benchmark that can be used to compare different chips running on different days using different compilers, and so on.

What EEMBC has done is to take the Whetstones of the world and fold them all into one bona fide benchmark suite that it oversees, called FPMark. The inner loops are pretty much the same Livermore loops that are already (ahem) floating around, but codified, sanitized, and “harnessed” in such a way that they’re impervious to ham-fisted or malicious tuning. In all, FPMark exercises 10 different floating-point algorithms (including some written from scratch just for FPMark) for a total of 53 different workloads.

Why so many different variations? Each kernel comes in both single-precision and double-precision versions, because some chips support only one or the other. Each kernel is also run through a small, medium, and large data set – again, because some chips can handle only smallish address ranges or data blocks. Most kernels get run through all six permutations, giving a good indication of how the chip performs on that task under all conditions.
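
In other words, each kernel fans out into a matrix of up to six workloads. A trivial sketch of that enumeration (the labels are illustrative, not FPMark's own naming):

```c
/* Illustrative enumeration of the per-kernel workload matrix:
 * {single, double} precision x {small, medium, large} data sets. */
#include <stdio.h>

int main(void)
{
    const char *precision[] = { "single", "double" };
    const char *dataset[]   = { "small", "medium", "large" };

    for (int p = 0; p < 2; p++)
        for (int d = 0; d < 3; d++)
            printf("kernel variant: %s precision, %s data set\n",
                   precision[p], dataset[d]);
    return 0;
}
```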

What if you’re interested only in single-precision, small-dataset results? No problem. Programmers using $5 MCUs with lightweight FPUs can run just the Lilliputian modes, while their colleagues in the next building can exercise the full fury of a Xeon 5500 by running the entire test suite. The beauty is, it’s the same code either way, so the results are directly comparable.

Got a multicore and/or multi-threaded processor? FPMark has that covered, too. Assuming your compiler supports it, FPMark will run multiple instances on multiple virtual cores. Note that it’s not parallelized; the component tasks are not split up and vectorized. Instead, a full instance of FPMark is run concurrently on each core. This more closely represents how designers are likely to run their own FP code.
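
Conceptually, the multicore mode is a throughput measurement: launch one complete, independent benchmark instance per core and time the whole batch. A rough sketch of the idea (this is not EEMBC's harness code, and the workload function is a stand-in):

```c
/* Rough sketch of the multi-instance approach: each thread runs a
 * complete, independent benchmark instance, so the combined result
 * measures throughput rather than single-task speedup. */
#include <pthread.h>
#include <stdio.h>

#define NUM_INSTANCES 4  /* assumption: set to the core count under test */

/* Stand-in for a full benchmark instance: any self-contained FP work. */
static void *run_instance(void *arg)
{
    (void)arg;
    volatile double acc = 0.0;
    for (long i = 1; i <= 10000000; i++)
        acc += 1.0 / (double)i;  /* dummy floating-point workload */
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_INSTANCES];

    /* Launch all instances, then wait; the wall-clock time for the
     * whole batch is what a throughput-style score is based on. */
    for (int i = 0; i < NUM_INSTANCES; i++)
        pthread_create(&threads[i], NULL, run_instance, NULL);
    for (int i = 0; i < NUM_INSTANCES; i++)
        pthread_join(threads[i], NULL);

    puts("all instances finished");
    return 0;
}
```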

When it’s all over, FPMark takes all 53 results (or fewer, depending) and weights them equally. The geometric mean of the results becomes your FPMark score. If you’ve opted for the lightweight data sets, you get a MicroFPMark score. Individual scores are available, too, if you’re interested only in certain tasks or variations.
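
For the record, an equally weighted geometric mean is just the nth root of the product of the n scores; in practice you compute it with logarithms so a running product of 53 scores can't overflow. A minimal sketch (the score values are made up):

```c
/* Equally weighted geometric mean: exp of the average of the logs.
 * Summing logs avoids overflowing a running product of many scores. */
#include <math.h>
#include <stdio.h>

static double geometric_mean(const double *scores, int n)
{
    double log_sum = 0.0;
    for (int i = 0; i < n; i++)
        log_sum += log(scores[i]);
    return exp(log_sum / n);
}

int main(void)
{
    double scores[] = { 12.5, 9.8, 30.1, 22.4 };  /* made-up workload scores */
    printf("composite score: %.2f\n", geometric_mean(scores, 4));
    return 0;
}
```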

As with most EEMBC benchmarks, you’re free to publish your results to the world, with or without EEMBC’s approval. If you want the gold seal of respectability, EEMBC will verify and certify your results for free; you just have to provide the hardware/software setup you used and wait. Interestingly, EEMBC’s engineers have found that certifying vendors’ results doesn’t uncover cheating as often as you might think. In fact, EEMBC gets better scores than the vendor did about half the time. That’s usually due to a change in the compiler between the time the vendor ran the tests and when EEMBC made its own run. Or it’s simple incompetence: vendors sometimes hand the benchmarking job to a junior engineer who lacks experience in compiler tweaking or doesn’t understand the finer points of memory allocation. Either way, an EEMBC-certified result is everyone’s guarantee that yes, this chip can do these tasks that fast.

As with most benchmarks, FPMark says as much about the compiler and software tools as it does about the chip. Software vendors are often just as interested in benchmarks as chip makers, and a few have already discovered shortcomings in their floating-point code. Even if you’re not interested in drag racing processors, it might be worth running FPMark just to see how well your compiler stacks up. 
