
The New DSP

Are FPGAs Really It?

In the fading footsteps of the fury of the Supercomputing conference, our minds naturally whirl around the world of accelerated computation. We picture powerful systems based on elegant devices that crunch through complex calculations at almost inconceivable speed. When we visualize that, of course, we don’t always remember that the “sea level” of compute power is rising on an ever-surging tide. As the common desktop computer climbs ever higher in compute performance, many problems and applications that were once the purview of supercomputers have been conquered by the commonplace. Even some applications previously considered infeasible are now handled comfortably by a consumer-grade laptop.

All this commoditization of computing power has narrowed the field of applications requiring acceleration and pushed it in interesting directions. Not that many years ago, digital video processing was almost a pipe dream. Now, for standard definition at least, it’s a solved and simple problem – even for the most pedestrian computing equipment. The list goes on and on, but the killer applications of yesteryear – including the mass-market dynamic dualism of video and audio – have mostly fallen by the wayside. Now, however, those applications have found a second wind as some of the biggest compute hogs of this decade. Thanks to our desire for more channels of audio and more pixels of video in our entertainment systems, and thanks to our newfound desire to do real-time analysis of video streams from various sources, we have enough number crunching to keep ourselves busy creating cool computing hardware for at least the next few years.

For the most part, these applications have been aligned under the digital signal processing (DSP) banner. As a group, DSP algorithms are typified by a large appetite for computation and by data-dominated rather than control-dominated behavior. Many years ago, the term “DSP” was hijacked by specialized processors – “digital signal processors” (also DSPs) – that were optimized for better performance in the DSP realm, forgoing the robust interrupt-handling features of conventional processors in favor of faster number crunching for data-centric applications.
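
To make “data-dominated” concrete, consider the canonical DSP kernel – a finite impulse response (FIR) filter. The sketch below is illustrative C (the function name and types are ours, not taken from any particular vendor library): virtually all of the work is multiply-accumulate arithmetic on streaming data, with essentially no data-dependent control flow.

```c
#include <stddef.h>

/* Illustrative FIR kernel: the archetypal data-dominated DSP loop.
 * Nearly every cycle is spent on multiply-accumulate arithmetic;
 * there is no data-dependent branching for a control-oriented
 * processor to exploit. */
float fir_sample(const float *coeff, const float *history, size_t taps)
{
    float acc = 0.0f;
    for (size_t i = 0; i < taps; i++) {
        acc += coeff[i] * history[i];   /* one MAC per tap */
    }
    return acc;
}
```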

As processing has evolved, however, a degree of ambiguity has arisen between the DSP and the conventional processor.  Conventional processors and DSPs have each grown many of the features of the other in an attempt to capture a broader share of the computing picture.  With each generation, the gap between the performance and capabilities of processor types seems to narrow, leaving DSPs in a shrinking window between applications that are within the capabilities of conventional processors and applications that require additional hardware acceleration or multiple DSPs to get the job done.

At the same time, FPGA companies have been spreading their wings and flying into new application domains. As we’ve discussed many times in the past few years, DSP was seen as a perfect target for the hardware architecture of FPGAs.  Instead of shoe-horning a complex, parallel algorithm into a sequential machine like a traditional DSP, FPGAs could take the full breadth of the algorithm and deal with it directly.  Where the fanciest DSPs could apply maybe a couple of multipliers in parallel to a complex, parallel algorithm, an FPGA could push hundreds of multiplications in a single clock cycle.  Designers’ eyes lit up with gigaflops signs (if there were such things) as they visualized thousands of percent acceleration of their toughest DSP problems.
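
A back-of-the-envelope comparison shows why those eyes lit up. The figures below are purely hypothetical assumptions (a dual-MAC DSP at 300 MHz versus an FPGA fabric clocking 256 dedicated multipliers at 150 MHz), not measured data, but the arithmetic illustrates the order-of-magnitude gap a parallel fabric promises.

```c
#include <stdio.h>

/* Back-of-the-envelope throughput comparison with hypothetical numbers:
 * a DSP issuing two MACs per cycle versus an FPGA keeping one dedicated
 * multiplier per filter tap busy on every clock cycle. */
int main(void)
{
    const double dsp_clock_hz   = 300e6;  /* assumed DSP clock rate    */
    const int    dsp_mac_units  = 2;      /* MACs issued per cycle     */
    const double fpga_clock_hz  = 150e6;  /* assumed FPGA fabric clock */
    const int    fpga_mac_units = 256;    /* one multiplier per tap    */

    printf("DSP : %.2f GMAC/s\n", dsp_clock_hz  * dsp_mac_units  / 1e9);
    printf("FPGA: %.2f GMAC/s\n", fpga_clock_hz * fpga_mac_units / 1e9);
    return 0;
}
```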

The FPGA design community was quick to respond with a round of first-generation tools that attempted to bring the power of FPGA-based acceleration to the DSP-designing masses.  While most of these efforts were primitive at best, there was significant progress in the FPGAs-for-DSP domain.  Seasoned FPGA designers needed only a little code from a DSP expert to manually craft something that could yield significant performance benefits, and some of the automated tools were found useful even by the more adventurous DSP types themselves.

The problem was, nobody got exactly what they wanted. Despite Herculean efforts to streamline the design flow, implementing a DSP algorithm in an FPGA is still considerably harder than running the same algorithm on a DSP processor. We described the tradeoff as 10X the performance for 10X the design effort. The numbers may have been debatable, but the general conclusion wasn’t. FPGAs certainly hadn’t proven themselves a suitable substitute for the common DSP.

As we mentioned before, however, the common DSP was playing to smaller and smaller audiences because of the narrowing gap of suitable applications.  At the same time, FPGAs lowered the bar for people with applications requiring some sort of hardware acceleration.  If the problem was DSP acceleration alone, FPGAs all but supplanted ASICs as the technology of last resort, and they brought the pain and cost of that last resort down to the point where it was a feasible option for many more design teams.

Today, the question of what technology to use is rather simple. If your algorithm can be handled by a general-purpose processor such as an ARM- or MIPS-powered embedded system, chances are that’s the most cost-effective (and power-efficient) way to go. If you require more processing power than that, you want to offload the DSP-like portion of your problem to an appropriately sized DSP processor. If you hit the top of the DSP range and start considering multiple DSPs to solve your problem, you are most likely better off going with an FPGA. In fact, the software-defined radio (SDR) kit we described just a few weeks ago [link] included all three elements – a general-purpose processor, a DSP, and an FPGA – in order to handle the full spectrum of compute problems that comprise the SDR challenge.

The picture is beginning to change, however, and it isn’t clear what the new formula may be.  FPGAs have blossomed as system integration platforms.  You can now cost-effectively plop a general-purpose processor, an interesting amount of memory, and a copious quantity of DSP accelerators on a single FPGA.  Your compute platform of the past is therefore turned inside-out.  The full gamut of hardware elements you need for almost any complex problem can be found in a single, reprogrammable piece of hardware.

Unfortunately, this piece of progress only complicates the programming picture.  Nobody is even close to offering a well-oiled tool chain that can properly partition your problem into elements that can be executed on the various hardware resources available on today’s complex FPGA.  Beginning with a straightforward description of a complex algorithm, the transformations and convolutions required to get it into a balanced, working hardware implementation in an FPGA platform are almost comical.  While a complicated design may be just at the edge of what we can comprehend in its MATLAB form, it soon becomes completely incomprehensible in the face of partitioning for optimization of processing elements, translation into various representations such as C, HDL, and structural netlists, and mapping of those pieces via synthesis, compilation, and other techniques into the raw form required for realization in hardware.  The distance from the clean mathematics of your algorithm to the convoluted reality of lookup tables is enormous.
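
As a small illustration of that distance, here is the same trivial operation – scaling a sample by a coefficient – first as it appears at the algorithm level, then as it tends to look once it has been quantized for hardware mapping. The Q15 format, word lengths, and rounding choice below are assumptions made for this example, not a prescription from any particular tool flow.

```c
#include <stdint.h>

/* Algorithm-level form: y = g * x, exactly as written in the math. */
double scale_clean(double g, double x)
{
    return g * x;
}

/* Hardware-oriented form: coefficient and sample quantized to 16-bit
 * Q15 fixed point, product held in 32 bits, rounded, and shifted back.
 * Word length, rounding mode, and (omitted here) saturation are all
 * decisions the designer or the tool flow must make explicitly. */
int16_t scale_q15(int16_t g_q15, int16_t x_q15)
{
    int32_t prod = (int32_t)g_q15 * (int32_t)x_q15;  /* Q30 product      */
    prod += 1 << 14;                                 /* round to nearest */
    return (int16_t)(prod >> 15);                    /* back to Q15      */
}
```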

While the FPGA companies and their partners in the tool business struggle to simplify the task of DSP-on-FPGA design, a number of new contenders are rushing into the game. Domain-specific hardware that maps more closely and directly to the DSP algorithm itself is hitting the market. Devices such as those from MathStar and Ambric seek to simplify the process of pushing algorithms into parallel hardware while retaining the performance benefits of FPGAs. These, however, still suffer from the curse of unproven design methodologies. Despite their simplified requirements, their tool chains are starting years behind the FPGA-based competition, and it will take a lot of real-world design success (and failure) for them to reach parity.

So – will FPGAs be the new DSPs?  When the programming problem is solved and the balance of ease-of-use to power and performance shifts, will DSPs disappear altogether?  Will the new challengers come riding in and steal victory away from FPGAs just as they’re on the brink of winning?  Only the next few years will tell the tale.  Don’t worry, though.  As long as we can think of computing problems that current commodity technology can’t solve, there will be fun challenges for us to tackle – applications where creative elegance trumps traditional wisdom.  For us in the design community, that’s the fun part.
