feature article
Subscribe Now

EEMBC Polishes Ye Olde Whetstone

New Embedded Benchmark Quantifies Floating-point Performance

To paraphrase Mark Twain, there are lies, damn lies, and benchmarks. People have been fudging their benchmark results for as long as there have been benchmarks. It’s easy enough to do. Indeed, it’s surprisingly hard not to distort benchmark results, even for the most scrupulously honest engineers. Measuring CPU performance ain’t like drag racing cars, and any comparison of benchmark scores inevitably boils down to an argument over what, specifically, you’re measuring.

Wading into this morass is EEMBC, the nonprofit organization that threw itself on the benchmarking grenade almost 20 years ago. EEMBC (which stands for Embedded Microprocessor Benchmark Consortium, but with a generous extra E) has produced a number of specialty benchmarks over the years. They have tests that measure real-time performance, multicore ability, automotive workloads, and much more. What the group hasn’t had until now is a straight-up floating-point benchmark.

Why now? ’Cause more and more embedded processors are including FPUs, and even low-end, sub-$5 chips now have floating-point capability. It’s not that these little MCUs need to perform scientific calculations or anything; the FP comes in handy mostly for motor control. As anyone who’s done robotics, kinematics, or motion control can tell you, accurate math is absolutely mandatory when you’re keeping track of the positions of things. The inevitable rounding errors that come with integer arithmetic (even when you’re using 32-bit numbers) add up surprisingly quickly, and within a few minutes you don’t really know where your robot arm is anymore. Floating-point math virtually eliminates those rounding errors and the scary imprecision.

Combine inexpensive processors, complex math, and real-time performance needs and you have a recipe for benchmark confusion. Until now, developers typically compared chips running either their own in-house code (which meant laying hands on the chip and porting the code), or running one of a handful of freely available FP benchmarks such as Linpak, Whetstone, or Livermore loops. Either technique might (or might not) provide a rough guide to which chips provide better FP performance than others, but neither method likely measured what you really wanted to know. Today’s FP benchmarks are really just inner loops: kernels of a larger algorithm that have been passed around from generation to generation as quick-and-dirty code samples. They’re not real benchmarks in the sense of being controlled, repeatable tests.

Moreover, nobody “owns” or controls Linpack, et al. That means you’re free to adjust the code as you see fit, as many have done. That’s fine for in-house tuning, but it does nothing to make these freebie FP nuggets useful as comparison tests. What’s needed is a fixed reference point: a benchmark that can be used to compare different chips running on different days using different compilers, and so on.

What EEMBC has done is to take the Whetstones of the world and fold them all into one bona fide benchmark suite that it oversees, called FPMark. The inner loops are pretty much the same Livermore loops that are already (ahem) floating around, but codified, sanitized, and “harnessed” in such a way that they’re impervious to ham-fisted or malicious tuning. In all, FPMark exercises 10 different floating-point algorithms (including some written from scratch just for FPMark) for a total of 53 different workloads.

Why so many different variations? Each kernel comes in both single-precision and double-precision versions, because some chips support only one or the other. Each kernel is also run through a small, medium, and large data set – again, because some chips can handle only smallish address ranges or data blocks. Most kernels get run through all six permutations, giving a good indication of how the chip performs on that task under all conditions.

What if you’re interested only in single-precision, small-dataset results? No problem. Programmers using $5 MCUs with lightweight FPUs can run just the Lilliputian modes, while their colleagues in the next building can exercise the full fury of a Xeon 5500 by running the entire test suite. The beauty is, it’s the same code either way, so the results are directly comparable.

Got a multicore and/or multi-threaded processor? FPMark has that covered, too. Assuming your compiler supports it, FPMark will run multiple instances on multiple virtual cores. Note that it’s not parallelized; the component tasks are not split up and vectorized. Instead, a full instance of FPMark is run concurrently on each core. This more closely represents how designers are likely to run their own FP code.

When it’s all over, FPMark takes all 53 results (or fewer, depending) and weights them equally. The geometric mean of the results becomes your FPMark score. If you’ve opted for the lightweight data sets, you get a MicroFPMark score. Individual scores are available, too, if you’re interested only in certain tasks or variations.

As with most EEMBC benchmarks, you’re free to publish your results to the world, with or without EEMBC’s approval. If you want the gold seal of respectability, EEMBC will verify and certify your results for free; you just have to provide the hardware/software setup you used and wait. Interestingly, EEMBC’s engineers have found that certifying vendors’ results doesn’t uncover cheating as often as you might think. In fact, EEMBC gets better scores than the vendor did about half the time. That’s usually due to a change in the compiler between the time the vendor ran the tests and when EEMBC made its own run. Or it’s simple incompetence. Vendors may give the job of benchmarking to some inexperienced junior intern who lacks experience in compiler tweaking or who doesn’t understand the finer points of memory allocation. Either way, an EEMBC-certified result is everyone’s guarantee that yes, this chip can do these tasks that fast.

As with most benchmarks, FPMark says as much about the compiler and software tools as it does about the chip. Software vendors are often just as interested in benchmarks as chip makers, and a few have already discovered shortcomings in their floating-point code. Even if you’re not interested in drag racing processors, it might be worth running FPMark just to see how well your compiler stacks up. 

Leave a Reply

featured blogs
Oct 5, 2022
The newest version of Fine Marine - Cadence's CFD software specifically designed for Marine Engineers and Naval Architects - is out now. Discover re-conceptualized wave generation, drastically expanding the range of waves and the accuracy of the modeling and advanced pos...
Oct 4, 2022
We share 6 key advantages of cloud-based IC hardware design tools, including enhanced scalability, security, and access to AI-enabled EDA tools. The post 6 Reasons to Leverage IC Hardware Development in the Cloud appeared first on From Silicon To Software....
Sep 30, 2022
When I wrote my book 'Bebop to the Boolean Boogie,' it was certainly not my intention to lead 6-year-old boys astray....

featured video

PCIe Gen5 x16 Running on the Achronix VectorPath Accelerator Card

Sponsored by Achronix

In this demo, Achronix engineers show the VectorPath Accelerator Card successfully linking up to a PCIe Gen5 x16 host and write data to and read data from GDDR6 memory. The VectorPath accelerator card featuring the Speedster7t FPGA is one of the first FPGAs that can natively support this interface within its PCIe subsystem. Speedster7t FPGAs offer a revolutionary new architecture that Achronix developed to address the highest performance data acceleration challenges.

Click here for more information about the VectorPath Accelerator Card

featured paper

Algorithm Verification with FPGAs and ASICs

Sponsored by MathWorks

Developing new FPGA and ASIC designs involves implementing new algorithms, which presents challenges for verification for algorithm developers, hardware designers, and verification engineers. This eBook explores different aspects of hardware design verification and how you can use MATLAB and Simulink to reduce development effort and improve the quality of end products.

Click here to read more

featured chalk talk

Hot-Swap and Power Protection -- Mouser Electronics and Analog Devices

Sponsored by Mouser Electronics and Analog Devices

When it comes to our always-on, critical systems we need to carefully consider power protection and maintainability. In this episode of Chalk Talk, Amelia Dalton and Dwight Larson investigate the issues that surround hot-plugging into an energized power supply, the best solutions to consider, what the different hot-swap circuit topologies look like for a variety of applications and why the MAX15090B/C with its innovative current foldback startup may be the best solution for your next design.

Click here for more information about Maxim Integrated MAX15090B/MAX15090C Hot Swap ICs