
Faster Floating-Point

Altera Smooths Path to Floating-Point FPGA

We’ve done dozens of articles about how awesome FPGAs are for signal processing applications – with a measure of salt.  We’ve pointed and laughed as FPGA vendors boasted of their gaggles of GMACs that nobody would ever realize with a practical DSP design.  We raised an eyebrow when they told us how easy their DSP design flow for FPGAs was – heck, even a software guy could do it. Not. We even scrutinized (with suspicion) their high-level synthesis methodologies, and were typically less than flabbergasted at the complexity that sat right beneath the surface. 

Over time, however, DSP-on-FPGA has become a pretty well-worn and successful path.  Even DSP processor stalwarts like BDTi gave high praise to the capabilities of FPGAs as DSP machines – capable of much higher throughputs on dramatically lower power budgets, and codable with comparable effort – when compared to the more complex software-programmed DSPs.  It seemed that FPGAs were earning their stripes as go-to devices for tough signal processing applications.

Except for one thing.

Hidden in the fine print was the double-asterisk footnote that said that you really needed to convert your algorithm to fixed-point to be able to behold the beauty of FPGA fantasticness.  That’s no problem, right?  It just requires you to find the appropriate… huh, dang, maybe it’s not so simple after all.  In fact, there have even been start-ups whose whole charter was to develop software tools to assist in converting algorithms from software-esque floating-point to hardware-friendly fixed-point implementations – and doing the complicated analysis to see what fidelity you lost in the process. 
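The "complicated analysis" comes down to two choices: a word length and a binary-point position. You then measure what the quantization costs. The Python sketch below shows that round trip. It assumes signed two's-complement storage, round-to-nearest (Python's `round`, which breaks ties to even) and saturation on overflow. Real conversion tools may make different choices, and the function names are hypothetical.

```python
import math

def to_fixed(x: float, frac_bits: int, total_bits: int) -> int:
    """Quantize x to a signed integer with frac_bits fractional bits.

    Rounds to nearest (ties to even, via round) and saturates to the
    two's-complement range of total_bits.
    """
    scaled = round(x * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))

def from_fixed(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a real value."""
    return q / (1 << frac_bits)

def sqnr_db(signal, frac_bits: int, total_bits: int) -> float:
    """Signal-to-quantization-noise ratio (dB) after a fixed-point round trip."""
    signal_power = sum(x * x for x in signal)
    noise_power = sum(
        (x - from_fixed(to_fixed(x, frac_bits, total_bits), frac_bits)) ** 2
        for x in signal
    )
    return 10 * math.log10(signal_power / noise_power)

# A test tone at 90% of full scale, stored as Q1.15 (16-bit) and Q1.7 (8-bit).
tone = [0.9 * math.sin(2 * math.pi * 0.0137 * n) for n in range(4096)]
print(f"16-bit SQNR: {sqnr_db(tone, 15, 16):.1f} dB")
print(f" 8-bit SQNR: {sqnr_db(tone, 7, 8):.1f} dB")
```

For a signal like this tone, each bit removed costs roughly 6 dB of SQNR. The hard part in real designs is repeating this analysis for every intermediate node of an algorithm, not just its input.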

For many designs, the complexity of adapting them to fixed-point was rewarded with blazing fast speeds and incomparable power efficiency.  The big rewards on the other side of the quantization chasm were enough to lure designers into taking the plunge, doing the big math, and figuring out where to put the binary point.  Some designs, however, don’t lend themselves to floating-to-fixed-point conversion – no matter how badly we want to use FPGAs.  Algorithms like those in linear algebra – that require high dynamic range and are extra-sensitive to the types of errors introduced by quantization – really need floating-point implementations to work properly.
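Dynamic range is the sticking point. To see why, compare a 32-bit fixed-point format with 32-bit floating point on values that span many orders of magnitude. This sketch assumes a saturating Q16.16 format (16 integer bits, 16 fractional bits) purely for illustration. It uses Python's `struct` module to round values to IEEE 754 single precision.

```python
import struct

def to_q16_16(x: float) -> float:
    """Round-trip x through a saturating 32-bit signed Q16.16 fixed-point format."""
    q = round(x * 65536)
    q = max(-(1 << 31), min((1 << 31) - 1, q))
    return q / 65536

def to_float32(x: float) -> float:
    """Round-trip x through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def rel_error(approx: float, exact: float) -> float:
    return abs(approx - exact) / abs(exact)

for v in (1e-7, 3e-3, 42.0, 1e6):
    print(f"{v:>8g}  Q16.16 rel err = {rel_error(to_q16_16(v), v):.2e}  "
          f"float32 rel err = {rel_error(to_float32(v), v):.2e}")
```

With the same 32 bits, the fixed-point format flushes the smallest value to zero and saturates on the largest. Single precision keeps the relative error below about 6e-8 across the whole span.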

“Never fear,” said the FPGA companies.  “You can do floating point with our FPGAs, too!  Just, uh, take the hard-wired multipliers and add some exponent manipulation to the side using the FPGA fabric and work out the tricky bits with the design flow, and uh, uh-oh,… Actually, the details are left as an exercise for the design team.  Good luck!”
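The recipe the vendors sketched can be made concrete. A floating-point multiply is an integer multiply of the mantissas, which a hard-wired multiplier can handle. Around it sit the jobs left to the fabric: adding the exponents, shifting to renormalize, and XORing the signs. The Python sketch below makes several simplifying assumptions:

- a 24-bit mantissa, as in IEEE single precision;
- truncation instead of rounding;
- no handling of zero, infinities, NaNs or denormals.

Those omissions are exactly the "tricky bits" a real implementation must handle. The `SimpleFloat` type and helper names are hypothetical.

```python
import math
from dataclasses import dataclass

MANT_BITS = 24  # assumption: IEEE-single-like mantissa width, hidden bit made explicit

@dataclass
class SimpleFloat:
    sign: int  # 0 or 1
    exp: int   # value = (-1)**sign * mant * 2**exp
    mant: int  # normalized: 2**23 <= mant < 2**24 (zero is not handled)

    def to_float(self) -> float:
        return (-1) ** self.sign * self.mant * 2.0 ** self.exp

def from_float(x: float) -> SimpleFloat:
    """Split a nonzero float into a normalized 24-bit mantissa and exponent (truncating)."""
    m, e = math.frexp(abs(x))          # abs(x) = m * 2**e with 0.5 <= m < 1
    mant = int(m * (1 << MANT_BITS))   # 2**23 <= mant < 2**24
    return SimpleFloat(1 if x < 0 else 0, e - MANT_BITS, mant)

def fp_mul(a: SimpleFloat, b: SimpleFloat) -> SimpleFloat:
    # 1. Integer mantissa multiply: the job a hard-wired multiplier can do.
    prod = a.mant * b.mant             # 2**46 <= prod < 2**48
    exp = a.exp + b.exp                # 2. Exponent add: fabric logic.
    # 3. Normalize: shift the 48-bit product back to 24 bits, dropping low bits.
    shift = MANT_BITS if prod >= 1 << (2 * MANT_BITS - 1) else MANT_BITS - 1
    mant = prod >> shift
    exp += shift
    # 4. The sign is a single XOR.
    return SimpleFloat(a.sign ^ b.sign, exp, mant)

print(fp_mul(from_float(3.0), from_float(-2.5)).to_float())  # → -7.5
```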

“Good luck” is what you would need, too – because getting floating-point working on most FPGAs is a Rube-Goldbergian exercise in non-linear engineering.  The FPGAs and the IP were not really designed with floating point in mind, and that fact becomes painfully clear in both the implementation effort and the performance.  

Altera has changed all that, however.  We wrote just over a year ago that they had done some serious remodeling on their DSP blocks – featuring a change from the venerable 18×18 multiplier to a more versatile variable-precision DSP block.  That enhancement, it turns out, was just the first shoe to drop.  Most DSP-savvy folk probably guessed that the next thing to come down the pike would be full-fledged floating-point support.  Good guess, DSP-savvy folk! 

Now, Altera has done the heavy lifting for us and, as a bonus, BDTi has even evaluated the resulting design flow and hardware implementations – with a pretty big thumbs-up.  Floating-point math using FPGAs is now a practical reality.  Altera added floating-point blocks to their DSP Builder Advanced Blockset and built a comprehensive design and verification flow around them.  The approach is similar to other model-based DSP design flows, allowing blocks to be stitched together in tools like Simulink from MathWorks.  However, Altera goes one (important) step further by optimizing the datapath across multiple blocks, eliminating much of the overhead associated with block-based algorithm design.  As the datapath is assembled, Altera’s tool automatically chooses the level of normalization required to match up the exponents from one stage to the next.  The result is a significant reduction in the extra hardware required for normalization and de-normalization. 
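Skipping normalization between blocks can be illustrated in software. In the first function below, each addition in a chain of Python floats aligns, adds, normalizes and rounds, like a cascade of standalone floating-point blocks. The second keeps the running sum exact and normalizes once at the end. Here `fractions.Fraction` stands in for the wider, unnormalized intermediate a hardware datapath would carry in a bounded number of extra bits.

This is an analogy under simplifying assumptions, not Altera's algorithm. Their tool chooses how much normalization each stage needs, while this sketch simply skips it between stages. The function names are hypothetical.

```python
from fractions import Fraction

def sum_normalize_every_stage(values):
    """Each adder rounds its result back to a normalized double.

    Returns (result, number of normalize/round steps).
    """
    acc = values[0]
    for v in values[1:]:
        acc = acc + v  # float add: align exponents, add, normalize, round
    return acc, len(values) - 1

def sum_normalize_once(values):
    """Carry an exact intermediate sum and normalize/round a single time.

    Returns (result, number of normalize/round steps).
    """
    acc = Fraction(0)
    for v in values:
        acc += Fraction(v)  # exact: no rounding between stages
    return float(acc), 1    # float() performs the one final rounding

values = [1e16, 1.0, -1e16]
print(sum_normalize_every_stage(values))  # → (0.0, 2)
print(sum_normalize_once(values))         # → (1.0, 1)
```

In hardware, each rounding point corresponds to normalization logic, so fewer of them means less area. This contrived input also shows that intermediate roundings can cost accuracy.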

BDTi’s evaluation of the Altera flow was done with production tools and hardware, and it consisted of the implementation and evaluation of a Cholesky solver, which “finds the inverse of a Hermitian positive definite matrix to solve for the vector x in a simultaneous set of linear equations of the form Ax = B.”  If you’re not current on your Hermitian positive definite matrices, don’t worry – it’s a decent example for evaluating a floating-point design flow and the resulting hardware performance.  The target FPGA was an Altera Stratix IV (not even the latest-generation Stratix V), and BDTi was able to get outstanding results (our words, not theirs), both from the design flow and from the hardware.  The full BDTi report is available as a white paper.
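Here is a textbook Cholesky solver in plain Python, for readers who want to see what the benchmark computes. It factors the Hermitian positive definite matrix as A = L·L^H, with L lower-triangular. It then solves two triangular systems, L·y = b and L^H·x = y, instead of forming the inverse explicitly. This is a sketch for illustration only, not BDTi's or Altera's implementation, and the function names are hypothetical. The inner loops include square roots and divisions, operations whose intermediate ranges are hard to predict in advance.

```python
import math
from typing import List

Matrix = List[List[complex]]
Vector = List[complex]

def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with A = L @ L^H for a Hermitian positive definite A."""
    n = len(a)
    l = [[0j] * n for _ in range(n)]
    for j in range(n):
        # Diagonal: a_jj minus the squared magnitudes already in row j; must be > 0.
        d = a[j][j] - sum(abs(l[j][k]) ** 2 for k in range(j))
        if d.real <= 0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = complex(math.sqrt(d.real))
        # Below the diagonal: l_ij = (a_ij - sum_k l_ik * conj(l_jk)) / l_jj
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k].conjugate() for k in range(j))
            l[i][j] = s / l[j][j]
    return l

def solve_hpd(a: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Cholesky factorization plus forward and back substitution."""
    n = len(a)
    l = cholesky(a)
    # Forward substitution: L y = b
    y = [0j] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    # Back substitution: L^H x = y, where (L^H)[i][k] = conj(L[k][i])
    x = [0j] * n
    for i in reversed(range(n)):
        s = sum(l[k][i].conjugate() * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / l[i][i].conjugate()
    return x

# Usage: a small Hermitian positive definite system with a known solution.
A = [[4, 1 + 1j, 0],
     [1 - 1j, 3, 1j],
     [0, -1j, 2]]
x_true = [1, 2 - 1j, 0.5j]
b = [sum(A[i][k] * x_true[k] for k in range(3)) for i in range(3)]
x = solve_hpd(A, b)
print([complex(round(v.real, 9), round(v.imag, 9)) for v in x])
```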

There has always been a gap between the design effort required to implement a complex, performance-demanding algorithm on a conventional processor like a DSP and the effort required to implement that same algorithm in FPGA hardware.  In the old days, we characterized the difference as “10x the design effort for 10x the performance.”  However, that gap has closed significantly in recent years.  For one thing, getting the most performance out of a modern DSP processor requires coding skill and knowledge of the underlying hardware that far exceed “normal” software engineering skill.  As DSP processors have gotten more complex, getting the most performance out of them has become more complex as well.  FPGAs, on the other hand, have gotten continuously easier to use for DSP design.  With the big signal-processing market out there just beckoning, the FPGA companies (as well as 3rd-party tool suppliers) have worked hard to take the pain out of the hardware-intensive design process for DSP-on-FPGA implementations.  The latest step in that evolution could prove to be one of the biggest, as it could remove that last nagging “fixed-point only” footnote from the FPGA DSP brag sheet. 

Altera says their floating-point DSP flow is available now in their standard tools and current FPGA products.  Further enhancements will follow with future versions of the tool flow and with the just-hitting-the-market 28nm FPGAs.
