As FPGAs have gained acceptance as the platform of choice for high-performance digital signal processing (DSP), the design methodology gap between software DSP implementation in DSP processors and hardware DSP implementation in FPGA or ASIC technology has grown increasingly apparent. FPGAs (particularly those with hardware DSP features) offer compelling advantages in cost, performance, and power consumption for heavy-duty number crunching. Getting your ideas into an FPGA, however, is still many times more difficult than programming a DSP processor to do the same thing.
Implementing your favorite algorithm in a DSP processor usually requires a few lines of C code, a compiler, and a few days of a competent software specialist’s time. Creating a hardware implementation, from the perspective of the DSP designer at least, is a mysterious black art in the domain of the double Es down the hall. It requires weeks to months of time, a knowledge of hardware architectures and hardware description languages (HDLs), and a confusing suite of design tools, and it often involves a great deal of lost sleep.
Luckily, the design tool community has noticed this problem and continues to work furiously to upgrade and enhance the software and design processes that enable this high-value implementation path. As with any relatively new design science, there is no established “best” way to get from DSP algorithms into hardware. There is a diverse set of approaches, each with its own strengths and weaknesses, and a general trend toward higher productivity, shorter learning curves, and more efficient results.
Today’s design approaches generally emanate from either an IP-based source or an algorithmic source. A couple of the more sophisticated solutions leverage the benefits of both. The number and variety of companies and products vying for supremacy in this hotly contested and potentially lucrative segment grow almost monthly. Just about everyone involved in programmable logic has acknowledged that, along with embedded processing applications, DSP represents one of the largest potential growth areas for FPGAs over the next several years.
On the IP-based side, most systems start with the MathWorks Simulink product. Simulink allows graphical, schematic-like capture of a design based on IP building-blocks such as filters. Simulink also provides an environment for simulation and “what-if” analysis as well as evaluation of conversion from floating-point to fixed-point representations of your design. FPGA vendors Xilinx, Altera, and Lattice, as well as EDA vendor Synplicity, all provide tools and IP blocks that create an IP-based design flow with Simulink. Actel has joined the fray as well by recently introducing their first specialized DSP core for their FPGA families. Other vendors like AccelChip and Celoxica also offer IP-based design methodologies, but mixed with algorithmic synthesis, as we will discuss later.
The key to these IP-based flows is starting with the library that matches your chosen design methodology and target technology. Each vendor provides a library of common DSP functions such as finite and infinite impulse response (FIR and IIR) filters, fast Fourier transforms (FFTs), frequency shifters/mixers, and math functions such as multiply and multiply-accumulate. Most of these blocks are configurable with parameters like bit-width that allow you to customize the block for your particular application. Behind each block is a generator that creates synthesizable RTL code targeting your FPGA technology. If you design with the wrong library, you won’t have a path to implementation, so it pays to choose wisely. A vendor-neutral solution like Synplicity’s DSP Synthesis offers some advantage here, because you can target various technologies downstream without having to redo your design at the front end in Simulink.
The downside of IP-based flows is that you can’t always easily or efficiently build exactly what you want from standardized blocks. Many algorithms are new, in flux, or have subtleties that make them ill-suited to a block-based design methodology. This is where algorithmic design shines. The basis of the algorithmic approach is a software-like algorithm written in a high-level language such as C, C++, or the MathWorks “M” language used in Matlab. Matlab is, in fact, the de facto standard tool for developing and evaluating DSP algorithms. Algorithmic synthesis tools take an algorithm and attempt to create a hardware architecture that will implement that algorithm according to the designer’s goals. These goals can vary from maximum parallelism/performance to minimum logic utilization or maximum power efficiency.
If you design algorithmically using Matlab, C or C++, or another high-level language, your algorithm (and your thinking) is usually sequential in nature. In the sequential universe, constructs like loops are well-behaved and the sequence of execution of individual instructions is clear and predictable. Readying that algorithm for hardware implementation, however, involves breaking down that sequence into just the essential dependencies so that the maximally parallel version of the algorithm is known. In most cases, the hardware resources available don’t permit a fully parallel implementation, nor would one be desirable given the cost/performance tradeoff. Some hardware resources, particularly expensive ones like multipliers, must be shared between various steps of the algorithm. This sharing generally requires some control circuitry and storage elements to hold intermediate values. The science of behavioral synthesis strives to solve this problem with some degree of optimality given a particular set of constraints such as hardware resource availability, latency, and throughput.
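The resource-sharing tradeoff described above can be illustrated with a small sketch. It is not how any particular synthesis tool works; it simply models one FIR output whose multiplies are spread over a limited number of multipliers, so that fewer multipliers cost more clock cycles. The function name `fir_output_shared` and its parameters are hypothetical, chosen only for this illustration.

```python
from math import ceil


def fir_output_shared(window, coeffs, num_multipliers):
    """Model one FIR output computed with a limited number of multipliers.

    Illustrative assumption: each multiplier performs one multiply per
    clock cycle, and a single accumulator register holds the running
    sum between cycles (the "storage element" for intermediate values).
    Returns the result and the number of cycles the schedule needs.
    """
    if num_multipliers < 1:
        raise ValueError("need at least one multiplier")
    if len(window) != len(coeffs):
        raise ValueError("window and coeffs must have the same length")

    num_taps = len(coeffs)
    cycles = ceil(num_taps / num_multipliers)
    acc = 0.0  # intermediate value carried from cycle to cycle

    for cycle in range(cycles):
        # The control logic selects which taps use the multipliers now.
        first = cycle * num_multipliers
        last = min(first + num_multipliers, num_taps)
        for tap in range(first, last):
            acc += coeffs[tap] * window[tap]

    return acc, cycles


if __name__ == "__main__":
    coeffs = [1.0, 2.0, 3.0, 4.0]
    window = [1.0, 1.0, 1.0, 1.0]
    for multipliers in (1, 2, 4):
        result, cycles = fir_output_shared(window, coeffs, multipliers)
        print(multipliers, "multiplier(s):", result, "in", cycles, "cycle(s)")
```

The result is the same in every configuration; only the cycle count (latency) changes with the number of multipliers, which is the kind of constraint-driven tradeoff a behavioral synthesis tool has to resolve.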
Behavioral synthesis is not a well-solved problem, though, and the assumptions made in attempting solutions have most often led to unacceptable compromises when the technology has moved from academia to application. There are several commercial tools, however, that successfully apply behavioral synthesis techniques to solve key portions of the DSP problem. The extreme end of this approach is employed by products like Celoxica’s Agility and DK tools, Forte’s Cynthesizer, and Mentor’s Catapult C. Even within this group there are major differences of approach. Mentor’s Catapult C methodology, for example, is based around preserving the C-based algorithm in its original, architecture-agnostic state and adding design constraints and guidelines externally through a GUI and control files during an “architectural exploration” process. Celoxica favors an approach where architectural decisions and directives are embedded in the structure of the C source code.
Between the purely behavioral approaches and the IP-based flows are hybrid systems that work to take advantage of the strengths of both. There is obviously little sense in re-designing the 1000th version of a Viterbi decoder or FIR filter, so it makes sense to use pre-configured and pre-optimized IP for sections of your design that map well. It also pays to synthesize the parts of the design that don’t map cleanly into pre-configured library elements and to be able to optimize the resulting design, even across those IP block boundaries. This is where solutions like those from Synplicity and AccelChip shine.
Synplicity’s DSP synthesis starts with a Simulink design created from their own library elements, and uses their own RTL synthesis technology (crossing over somewhat into the behavioral synthesis space with techniques such as register retiming) to optimize the resulting design for the target technology. Although they are relatively new to the DSP synthesis space (having launched their product within the past year), Synplicity is able to leverage their considerable experience and expertise in FPGA RTL synthesis to put a powerful back-end on their DSP flow. Synplicity has seen rapid acceptance of their new product based on its close integration with Simulink, its vendor-neutral IP library, and its back-end optimization.
Moving quickly from startup to the role of veteran in the DSP synthesis business is AccelChip. AccelChip is unique in their approach of using native Matlab as a front-end and leveraging behavioral synthesis technology combined with a library of optimized, configurable IP blocks. AccelChip has been doing DSP long enough now to have moved past the basics with a full and rich offering of capabilities crafted especially for DSP designers targeting FPGA and ASIC technologies. Beginning with the first, critical steps, AccelChip provides facilities for moving from floating- to fixed-point representations of the design and verifying the results, configuring IP to custom specifications, and synthesizing the algorithmic representation down to hardware-ready RTL.
Newcomer Catalytic has focused specifically on the problem of converting wisely from floating-point to fixed-point data types. When DSP algorithms are modeled in high-level languages, the original version usually relies on full-precision, software-style floating-point data types. When moving into hardware, however, every bit counts, and fixed-point math is the norm. Deciding on fixed-point widths is a critical step in trading off performance and cost against fidelity of the hardware implementation. Catalytic’s solution is aimed at solving that problem by helping the designer make those conversions and evaluate the results.
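To make the width-versus-fidelity tradeoff concrete, here is a minimal sketch of quantizing values to signed fixed point and measuring the worst-case error for a given word size. It is a generic illustration, not a description of Catalytic’s or any other vendor’s tool; the helper names (`to_fixed`, `from_fixed`, `max_quantization_error`) and the sample coefficient values are assumptions made for this example.

```python
def to_fixed(value, frac_bits, total_bits):
    """Quantize a real value to a signed fixed-point integer.

    frac_bits is the number of fractional bits and total_bits the full
    word width including sign. Values outside the representable range
    saturate instead of wrapping (an assumed design choice).
    """
    raw = round(value * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, raw))


def from_fixed(raw, frac_bits):
    """Convert a fixed-point integer back to a real value."""
    return raw / (1 << frac_bits)


def max_quantization_error(values, frac_bits, total_bits):
    """Worst-case absolute error introduced by quantizing the values."""
    return max(
        abs(v - from_fixed(to_fixed(v, frac_bits, total_bits), frac_bits))
        for v in values
    )


if __name__ == "__main__":
    coeffs = [0.1, -0.25, 0.3333, 0.7]  # hypothetical filter coefficients
    for total_bits in (10, 12, 16):
        frac_bits = total_bits - 1  # one sign bit, the rest fractional
        err = max_quantization_error(coeffs, frac_bits, total_bits)
        print(total_bits, "bits: worst-case error", err)
```

Sweeping the word width in this way shows how each extra bit shrinks the error, which is the information a designer needs when trading hardware cost against fidelity.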
When looking at the various solutions available, it also pays to carefully analyze the verification aspect of the design flow. Behavioral- or algorithmic-based design, for example, is problematic for simulation and verification. The behavioral level design simulates hundreds or even thousands of times faster than the final RTL version, but since a purely behavioral model is generally not clock-cycle accurate, it can’t share a testbench with the RTL version without modification. Also, fixed-point versions of a design will not, of course, give the same numerical answers as the floating-point counterpart.
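One common way to handle the numerical mismatch is to compare the fixed-point model against the floating-point reference within a tolerance rather than bit for bit. The sketch below assumes a simple direct-form FIR and a tolerance chosen by the designer; the function names and test data are hypothetical, and the sketch does not address the separate problem of cycle accuracy.

```python
def fir_float(samples, coeffs):
    """Reference model: direct-form FIR in floating point."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def fir_fixed(samples, coeffs, frac_bits):
    """Fixed-point model: same FIR using integer arithmetic.

    Inputs and coefficients are rounded to frac_bits fractional bits.
    The integer accumulator therefore carries 2 * frac_bits fractional
    bits and is scaled back to a real value only at the output.
    """
    scale = 1 << frac_bits
    q_samples = [round(s * scale) for s in samples]
    q_coeffs = [round(c * scale) for c in coeffs]
    out = []
    for n in range(len(q_samples)):
        acc = 0
        for k, c in enumerate(q_coeffs):
            if n - k >= 0:
                acc += c * q_samples[n - k]
        out.append(acc / (scale * scale))
    return out


def outputs_match(reference, candidate, tolerance):
    """True if every output pair differs by no more than the tolerance."""
    return len(reference) == len(candidate) and all(
        abs(r - c) <= tolerance for r, c in zip(reference, candidate)
    )


if __name__ == "__main__":
    coeffs = [0.25, 0.5, 0.25]
    samples = [0.1, -0.3, 0.7, 0.2]
    reference = fir_float(samples, coeffs)
    for frac_bits in (4, 12):
        candidate = fir_fixed(samples, coeffs, frac_bits)
        print(frac_bits, "fractional bits:",
              outputs_match(reference, candidate, 2 ** -12))
```

The tolerance itself becomes part of the specification: it states how much numerical deviation from the floating-point algorithm the hardware is allowed to introduce.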
Celoxica’s solution features hardware-in-the-loop (HIL) verification based on their own FPGA development board. This high-performance verification methodology also provides a rapid iteration environment for trading off parts of the whole system design between hardware and software. If a DSP algorithm is only part of your embedded system design, for example, you can experiment with moving the DSP algorithm from software to hardware to get a real understanding of the performance/area tradeoff in an actual FPGA.
With the pot of gold waiting at the end of the DSP-on-FPGA rainbow, you can expect rapid progress and innovation in these tools in the near future. Since many of the mainstream FPGA tools have become something of a commodity, a segment such as DSP that allows innovative teams to earn a return on their development investment by eliminating a difficult design bottleneck is sure to attract a lot of attention.