feature article
Subscribe Now

Dialing-in DSP on FPGA

Catapult Customized for Altera

We’ve discussed the amazing potential FPGAs bring to DSP acceleration for years now.  We’re not alone, either.  FPGA vendors have pumped out trumped up performance specifications with dizzying claims as to the number of GMACs (Giga-Multiply-Accumulates per second) their hardware could execute.  So dizzying, in fact, that most of the potential customers got vertigo and fell to the floor without buying any FPGAs.

This was a problem for FPGA vendors – who quickly hooked up probes to the unconscious DSP dudes, downloaded their issues through virtual JTAG ports, and found out (among other things) that whipping out a few lines of algorithmic software for a DSP processor was a whole different ballgame from going back to school to learn enough about datapath microarchitectures to design one of the highly-parallel, heavily-pipelined, carefully-timed creations in VHDL or Verilog that would actually bring any reasonable percentage of those GMACs to life.

If you watch the whole thing in slow motion (using our high-frame-rate HD resolution digital camera with both Stratix III AND Virtex 5 devices processing the video in real time using all of their embedded DSP blocks simultaneously… Oh wait, that’s the marketing pitch), you’d see that the FPGA vendors got those outrageous GMACs numbers by simply multiplying the number of multipliers on their device by the maximum frequency at which they could be clocked.  Nothing in the real world will ever, ever, ever even come close to that performance with those devices. 

This small marketing miscue, however, has nothing to do with the problem.  It turns out that many DSP designers would be perfectly content with only 10-50X the performance they got with a traditional DSP (not the 1000X or so some of the GMACs numbers might lead one to believe).  The real issue was the designer expertise required to do the FPGA design and the fear factor faced by project teams in picking up that gauntlet – even in hopes of enormous performance gains.

Over in the ASIC world, however, it turns out that the EDA industry had been busy working on the same problem.  Instead of licensing a DSP core for your next system-on-chip, you could get much better performance (usually at lower cost and power) by designing a chunk of custom hardware for your specific algorithm.  Even in the lofty world of SoC ASIC design, they don’t like hand-crafting complex software algorithms into even more complex parallel hardware architectures, so some impressive tools were developed that could analyze an algorithm specified in a sequential, procedural language (like C, for example) and create a highly-optimized, parallelized, pipelined, ready-to-synthesize RTL microarchitecture that would put that algorithm into practice.

The head of the class in those tools is Mentor Graphics’s Catapult C Synthesis.  High-end ASIC design teams snapped up Catapult in droves, despite its hefty price tag, because they gained enormous productivity benefits from taking algorithms directly from C or C++ to hardware with performance that matched or even bettered what they got from months of hand-crafting RTL architectures.  In the ASIC world, however, projects are large, long and expensive.  Saving a few engineering-months of converting algorithm to architecture was still peanuts compared with the massive cost and schedule impact of creating and verifying a huge, costly, small-geometry SoC ASIC.  ASIC design teams prized Catapult as much for the flexibility it offered as for the productivity gains.  Need a smaller area solution with a longer latency?  Just press a few buttons.  Suddenly found out that you need to crunch data twice as fast?  Press a few more and your architecture is completely re-jiggered.  Need a different interface at the borders?  Catapult can handle it for you – retiming all the register-to-register logic as it goes along.

In the FPGA space, the benefits of a tool like Catapult C align perfectly with the value proposition of the FPGA itself – getting a complex design to market in absolute minimal time and retaining the flexibility inherent in the programmable logic platform.  Unfortunately, all the work to make Catapult give spectacular results in ASIC land fall apart when the tool starts stitching multipliers together out of LUTs while optimized hard-core Multiply Accumulate (MAC) units sit idle on the same chip.

This week, Altera and Mentor Graphics announced a collaboration that brings the benefits of high-powered algorithm-to-architecture technology like Catapult C to the FPGA community – in a way that takes advantage of the unique capabilities of both technologies.  The two companies have worked together to produce optimized Catapult libraries specifically for Altera’s FPGAs that allow the tool to understand and infer the high-performance resources already built into the FPGA.  Without these libraries, some of the most powerful design tools in the world (Catapult) and the some of the most powerful compute acceleration hardware in existence (Altera’s Stratix III FPGAs) just didn’t play nicely together.  You’d get results, but those results would fall far short of what was possible.

The two companies claim an average of 50-80% “DSP Fmax performance improvement” with the advent of the new libraries.  We’ll take exception to the use of Fmax as a DSP throughput metric, but the sentiment (and the hardware behind it) is what counts here.  Clearly, if you have a DSP algorithm that takes 50 multipliers, and you move those multipliers from LUT fabric to hard-wired, optimized multiply-accumulate devices sitting right on the chip, you gain significant performance, save a bundle of power, and free up all those expensive LUTs for other tasks.  For people targeting Altera devices with DSP algorithms, this is an enormous step forward in performance, cost, and power.

Using the Catapult C tool will not turn a software engineer into a hardware expert.  Catapult is not a fully-automatic algorithm-to-hardware converter.  It is, instead, a power tool that can assist someone with some hardware design knowledge in very quickly finding an optimal hardware implementation of an algorithm that meets any arbitrary tradeoff between area, power, and performance.  It also affords incredible flexibility in changing that architecture almost on-the-fly as design demands shift during (or after) the project. 

If you’re looking for a tool that’s typical of the “near-free” FPGA norm, be prepared for some serious sticker shock.  Catapult C is as expensive as a ground-breaking productivity tool should be.  Mentor claims that Catapult gives a 4-20X productivity advantage over hand-coding RTL for complex algorithms.  Unless your engineering time is very inexpensive, that kind of productivity boost can pay for even a very expensive tool on the first project.

The benefits of high-level design abstraction on the design process go far beyond productivity, however.  The combination of Catapult’s ability to fine-tune the hardware architecture to exactly balance area, performance, and power demands combined with the FPGAs time-to-market and flexibility advantages make something that, if you squint your eyes a little…  starts to approach the implementation flexibility of traditional DSP processors with orders of magnitude better performance and power consumption.

Leave a Reply

featured blogs
Apr 25, 2024
Structures in Allegro X layout editors let you create reusable building blocks for your PCBs, saving you time and ensuring consistency. What are Structures? Structures are pre-defined groups of design objects, such as vias, connecting lines (clines), and shapes. You can combi...
Apr 25, 2024
See how the UCIe protocol creates multi-die chips by connecting chiplets from different vendors and nodes, and learn about the role of IP and specifications.The post Want to Mix and Match Dies in a Single Package? UCIe Can Get You There appeared first on Chip Design....
Apr 18, 2024
Are you ready for a revolution in robotic technology (as opposed to a robotic revolution, of course)?...

featured video

How MediaTek Optimizes SI Design with Cadence Optimality Explorer and Clarity 3D Solver

Sponsored by Cadence Design Systems

In the era of 5G/6G communication, signal integrity (SI) design considerations are important in high-speed interface design. MediaTek’s design process usually relies on human intuition, but with Cadence’s Optimality Intelligent System Explorer and Clarity 3D Solver, they’ve increased design productivity by 75X. The Optimality Explorer’s AI technology not only improves productivity, but also provides helpful insights and answers.

Learn how MediaTek uses Cadence tools in SI design

featured paper

Designing Robust 5G Power Amplifiers for the Real World

Sponsored by Keysight

Simulating 5G power amplifier (PA) designs at the component and system levels with authentic modulation and high-fidelity behavioral models increases predictability, lowers risk, and shrinks schedules. Simulation software enables multi-technology layout and multi-domain analysis, evaluating the impacts of 5G PA design choices while delivering accurate results in a single virtual workspace. This application note delves into how authentic modulation enhances predictability and performance in 5G millimeter-wave systems.

Download now to revolutionize your design process.

featured chalk talk

BMP585: Robust Barometric Pressure Sensor
In this episode of Chalk Talk, Amelia Dalton and Dr. Thomas Block from Bosch Sensortec investigate the benefits of barometric pressure sensors for a variety of electronic designs. They examine how the ultra-low power consumption, excellent accuracy and suitability for use in harsh environments can make Bosch’s BMP585 barometric pressure sensors a great fit for your next design.
Oct 2, 2023
26,222 views