feature article

Re-inventing the DSP Block

Altera Changes the Game

Everything in technology changes – evolves – improves.

First, we had 8-bit processors, then 16, then 32… now many of us are tapping the keys on 64-bit devices.

Nothing stays still for very long.

Why, then, have we lived for around a decade with very little change to the garden-variety 18×18 multipliers in our hardened FPGA DSP blocks?  Except for a few minor improvements, those haven’t really progressed in years.

To paraphrase something Bill Gates apparently never really said: “Why would anybody ever need more than 18×18 bit multiplication?”

OK, wait.  There has been some evolution in DSP blocks.  We’ve gone from 18×18 multipliers to multiplier-accumulator-ALU-ish blocks with all kinds of fancy carry logic.  We’ve even got asymmetric multipliers that have a wider side to accommodate a few tougher problems.  Both of the major vendors have continued to improve their blocks in ways that allow more complex operations to be done without jumping out to the LUT fabric.

Altera, however, has just raised the stakes a lot – with their complete re-design of the DSP block for their upcoming 28nm Stratix V line.

For the tried-and-true sweet spot of FPGAs, 18×18 multipliers were just fine, but with FPGA markets expanding into areas like medical imaging, wireless, mil/aero, and test and measurement, wider fixed- and floating-point multiplication is required to solve real-world problems.  If your FPGA can support those operations in hard-wired logic, you can skip the LUT fabric altogether, improve your throughput and power consumption, and save the programmable fabric for other work – or, better yet, save the money by buying a smaller FPGA.

The key new element in Altera’s DSP is variable precision.  Instead of a fixed-width hardware multiplier, the company has introduced a fracturable/cascadable multiplier that can deliver a variety of bit widths very efficiently.  To avoid glazing our eyes over with the exhaustive list of every possible combination, we’ll just say that you can choose precision from 9×9 up to 54×54, including asymmetric settings, with very little wasted hardware.  Floating-point mantissa multiplication is easily accomplished as well, so the enthusiasts of the relatively narrow area of FPGA-accelerated high-performance computing (or “reconfigurable computing”) will be very excited.  (You see, OpenFPGA.org?  Somebody IS listening.)

Back in the days of Stratix II, an Altera DSP block had four independent 18×18 multipliers (four 36-bit inputs).  For Stratix III, the company doubled the block and made it splittable (four 72-bit inputs), so that we could use “Half Blocks.”  The DSP block then could do eight 18×18 multiplications summing, or four 18×18 multiplications independently.  Now, the DSP block has four of the new variable-precision blocks (four 72-bit inputs), so the unit can do eight 18×18 multiplications summing, eight 18×18 multiplications independently, and high precision operations.  

The new block has two native modes – “18-bit” and “high-precision.”  In “18-bit” mode, two 18×18 products can be summed into a 64-bit accumulator (with 37-bit precision out of the adder), or two 18×18 products can be independently output with 32-bit product precision.  In “high-precision” mode, you can do 27×27 or 18×36 multiplication, each with a 64-bit accumulator.  Since a single-precision mantissa is 24 bits wide (including the implicit leading one), it fits inside a 27×27 multiply, which means you can do single-precision floating-point mantissa multiplication in one variable-precision DSP block.  The 64-bit accumulators allow for cascading without loss of precision.
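For readers who want to check the bit-width arithmetic behind those numbers, here is a minimal Python sketch.  It is not a model of Altera's hardware: it assumes signed two's-complement operands (the article doesn't specify signedness), and the helper names `signed_range`, `product_extremes`, `bits_needed`, and `safe_accumulations` are made up for illustration.  It simply works out how wide a product or a sum of two products can get, and how many worst-case terms a 64-bit accumulator could absorb before overflow becomes possible.

```python
def signed_range(bits):
    """Inclusive (min, max) of a two's-complement integer that is `bits` wide."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def product_extremes(a_bits, b_bits):
    """Smallest and largest result of a signed a_bits x b_bits multiply."""
    a_lo, a_hi = signed_range(a_bits)
    b_lo, b_hi = signed_range(b_bits)
    corners = [a_lo * b_lo, a_lo * b_hi, a_hi * b_lo, a_hi * b_hi]
    return min(corners), max(corners)


def bits_needed(lo, hi):
    """Narrowest two's-complement width that holds every value in [lo, hi]."""
    bits = 1
    while True:
        mn, mx = signed_range(bits)
        if mn <= lo and hi <= mx:
            return bits
        bits += 1


def safe_accumulations(term_lo, term_hi, acc_bits=64):
    """Number of worst-case terms that are guaranteed to fit in the accumulator."""
    acc_lo, acc_hi = signed_range(acc_bits)
    limits = []
    if term_hi > 0:
        limits.append(acc_hi // term_hi)
    if term_lo < 0:
        limits.append((-acc_lo) // (-term_lo))
    return min(limits)


# "18-bit" mode: one 18x18 product, then the sum of two such products.
p_lo, p_hi = product_extremes(18, 18)
pair_lo, pair_hi = 2 * p_lo, 2 * p_hi
print("18x18 product bits:      ", bits_needed(p_lo, p_hi))
print("sum of two products bits:", bits_needed(pair_lo, pair_hi))
print("worst-case sums in 64b:  ", safe_accumulations(pair_lo, pair_hi))

# "High-precision" mode: a single 27x27 product.
h_lo, h_hi = product_extremes(27, 27)
print("27x27 product bits:      ", bits_needed(h_lo, h_hi))
print("worst-case terms in 64b: ", safe_accumulations(h_lo, h_hi))
```

Under these assumptions the sum of two 18×18 products comes out 37 bits wide, which lines up with the adder precision quoted above, and the 64-bit accumulator leaves a lot of headroom in either mode.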

Altera lists many common applications where this all comes in handy.  For example, FFTs require high-precision complex multiplication.  The data width increases with each stage, while the coefficient remains the same, so we can go from 18×18 to 18×25 to 18×36.  With the new architecture, each of these can be done in a single block.  With previous-generation blocks, the number of DSP blocks required could double.
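To see why the FFT case maps so naturally onto a block that sums two products, it helps to write out the complex multiplication.  The sketch below is plain Python integer arithmetic, not vendor code; `sum_of_two_products` and `complex_multiply` are hypothetical names, and the `subtract` option assumes the block's adder can also subtract, which the article does not spell out.

```python
def sum_of_two_products(a, b, c, d, subtract=False):
    """Model one 'two products into one adder' operation: a*b + c*d or a*b - c*d.

    Assumption: the adder can subtract as well as add.
    """
    return a * b - c * d if subtract else a * b + c * d


def complex_multiply(xr, xi, wr, wi):
    """(xr + j*xi) * (wr + j*wi) using four real multiplies and two adds."""
    real = sum_of_two_products(xr, wr, xi, wi, subtract=True)  # xr*wr - xi*wi
    imag = sum_of_two_products(xr, wi, xi, wr)                 # xr*wi + xi*wr
    return real, imag


# Data sample (3 + 4j) times twiddle-like coefficient (5 - 2j).
real, imag = complex_multiply(3, 4, 5, -2)
print(real, imag)
```

Each half of the result is one sum (or difference) of two real products.  As the data operands `xr` and `xi` widen from stage to stage while the coefficients `wr` and `wi` stay at 18 bits, the same structure holds; only the multiplier width has to grow.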

For floating-point precision, using the 64-bit cascade, a single-precision mantissa multiplication can be done with one block at 27×27, or a double-precision 54×54 can be implemented with four blocks cascaded.  Four blocks cascaded could do a single-precision floating-point FFT’s complex multiplication.
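The arithmetic behind "four blocks for 54×54" is ordinary schoolbook multiplication on 27-bit halves.  Here is a rough Python sketch of that decomposition, assuming unsigned mantissas and ignoring how the real block routes its cascade; `split27`, `mul54_from_27`, and `float32_mantissa` are illustrative names, not anything from Altera's tools.

```python
import struct

MASK27 = (1 << 27) - 1


def split27(x):
    """Split an unsigned value of up to 54 bits into (high, low) 27-bit halves."""
    return x >> 27, x & MASK27


def mul54_from_27(a, b):
    """Build a 54x54 unsigned product from four 27x27 partial products."""
    a_hi, a_lo = split27(a)
    b_hi, b_lo = split27(b)
    ll = a_lo * b_lo  # each of these four is one 27x27 multiply
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi
    # Shift each partial product to its weight and add them up.
    return (hh << 54) + ((lh + hl) << 27) + ll


def float32_mantissa(x):
    """24-bit mantissa (implicit leading 1 included) of a normal float32 value."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits & 0x7FFFFF) | 0x800000


# Double precision: 53-bit mantissas fit in the 54-bit operands.
a = (1 << 53) - 1
b = (1 << 52) + 12345
print(mul54_from_27(a, b) == a * b)

# Single precision: 24-bit mantissas fit in a single 27x27 multiply.
m = float32_mantissa(1.5)
print(hex(m), (m * m).bit_length())
```

The sketch covers only the mantissa product; exponent addition, normalization, and rounding in a real floating-point multiply are outside what it shows.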

The combinations and permutations go on and on, of course.  Altera looked at a number of critical popular applications in designing the new blocks, and the net effect is that you’ll use far fewer of the new blocks to accomplish the same math, and far less often be required to take your critical timing path out of the hardened world of your DSP blocks and into the LUT fabric.

Will this benefit you?  The answer is – sometimes. 

If you’re doing arithmetic operations that require more than the fixed-point precision choices available currently, you’ll certainly be able to do them with fewer DSP blocks, and with fewer excursions into the LUT fabric.  That means you’ll have more options.  If the number of DSP blocks was the reason you had to buy that bigger FPGA, you can now buy a smaller one.  (Don’t tell Altera they just engineered themselves into a smaller sale.)

If you were pushing it on Fmax because some of your arithmetic logic was bleeding over into the LUTs, you may now be able to operate the multiplier-accumulator part of your datapath closer to the datasheet frequencies.  Or, you may have a lot less work to do on timing closure when you’re finishing up your design.

If you were resource-sharing DSP blocks because you were limited in the number available, now you can go with more parallelism and potentially improve your throughput and/or latency.  This would, of course, also mean fewer memory and register resources spent on making the sharing work.

Another group certain to benefit from this architecture is designers using high-level synthesis to get from algorithmic representations in C/C++ or other untimed high-level languages into FPGA hardware.  If your high-level synthesis tool has this more flexible block in its tool chest (and if it has the wherewithal to use it properly), you’ll magically get better results without even worrying about it.

As the marginal returns from each new process node continue to diminish, FPGA companies need to step up their architectural innovation to keep pace with our insatiable appetite for more performance and efficiency.  Smart advances like this new DSP block are exactly what FPGA users need and exactly what the FPGA industry needs to keep attracting new customers and new design wins in an increasingly competitive environment.
