
Need to accelerate the creation of technology-independent DSP hardware?

The massive increase in processing required for next-generation compute-intensive applications, such as wireless communication and image processing, has created a gap between off-the-shelf DSP performance and market needs. In many cases, discrete DSPs are simply running out of steam to serve the new communications, multimedia, and consumer applications. In recent years, users have increasingly looked toward alternative solutions ranging from ultra-high-performance full-custom ASICs to highly flexible general-purpose CPUs. Somewhere in the middle are FPGAs, providing a cost-effective balance (Figure 1) between programmability and high performance. With processing flexibility that ranges from serial to parallel computing, and with highly specialized DSP macros and memories now on chip, FPGAs have the potential to become an attractive option for implementing DSP algorithms.

Figure 1: When it comes to DSPs, many designers are forced to compromise between the programmability of discrete devices and the performance of custom FPGA or ASIC implementations.

Each platform has certain benefits and limitations. At one extreme, the pure software approach implemented in discrete DSPs is mature, flexible, and relatively easy to use but offers limited instruction-level parallelism. At the other extreme, ASIC implementations offer custom performance and high-volume pricing benefits but traditionally demand a much greater design effort and incur soaring NRE costs. Combining some of the value of both extremes, FPGA hardware supports reprogrammability and architectural flexibility in terms of spatial and temporal parallelism (via repetition and pipelining) but lacks ease of programming, since design entry is in a register-transfer level (RTL) hardware description language rather than the ANSI C/C++ of the DSP programming domain.

The dilemma is that designers want both the programming flexibility of discrete DSPs and the performance flexibility available in FPGAs. How can they combine the best of both worlds? And, more importantly, what are their options if the application calls for an ASIC implementation? Optimal implementation of DSP algorithms therefore requires seriously rethinking the overall design flow for transforming algorithms into hardware, via either the ASIC or the FPGA route. In the end, choosing the path of technology independence could mean the difference between success and failure.

Algorithmic Synthesis Bridges the Design Gap

To use RTL to create hardware implementations for complex DSP algorithms, design teams must iterate through several steps, including micro-architecture definition, hand-written RTL, and area/speed optimization through iterative RTL synthesis. This manual process is slow, and misinterpretation of the original specification along the way accounts for up to 60 percent of the bugs found in RTL. In the final result, both the micro-architecture and the technology characteristics become hard-coded into the RTL description. This severely limits RTL reuse or retargeting in real applications, and leads to overbuilt designs and wasted silicon.

New DSP-specific flows enable algorithmic design at a higher level of abstraction than RTL. Although high-level synthesis tools have been available for some time, none has delivered the necessary ease of use and quality of results until now. A new breed of “algorithmic synthesis” tools offers a faster path to custom DSP hardware. The best algorithmic synthesis tools take industry-standard pure ANSI C++ as input and automatically produce RTL based on user-defined design goals. This approach closes the conceptual gap between algorithm designers modeling in pure ANSI C or C++ and hardware designers working at the RTL abstraction level (Figure 2).
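As a rough illustration of the kind of untimed source such a flow starts from, the sketch below writes a small FIR filter in plain C++. The function name `fir_filter`, the constant `NUM_TAPS`, the tap count, and the 16-bit sample width are assumptions chosen for illustration only; the article does not name a specific tool or coding style.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical tap count, chosen only for illustration.
constexpr std::size_t NUM_TAPS = 4;

// Untimed FIR filter: one input sample in, one output sample out.
// Nothing here fixes a clock, a pipeline depth, or a multiplier count;
// those decisions would be left to the synthesis step.
std::int32_t fir_filter(std::int16_t sample,
                        const std::array<std::int16_t, NUM_TAPS>& coeffs,
                        std::array<std::int16_t, NUM_TAPS>& delay_line) {
    // Shift the delay line by one position; the oldest sample falls off.
    for (std::size_t i = NUM_TAPS - 1; i > 0; --i) {
        delay_line[i] = delay_line[i - 1];
    }
    delay_line[0] = sample;

    // Multiply-accumulate across all taps in a wider accumulator.
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < NUM_TAPS; ++i) {
        acc += static_cast<std::int32_t>(coeffs[i]) * delay_line[i];
    }
    return acc;
}
```

Feeding a single unit impulse followed by zeros through this function returns the coefficients one per call, which is a quick sanity check that can run as ordinary software before any hardware exists.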

By using a technology-independent ANSI C++ source, these tools enable designers to choose between ASIC and FPGA implementations, and provide a means to incrementally explore and optimize the implementation architecture. The end result is a design architecture and RTL implementation tuned to the device and system requirements, all delivered up to 20X faster and with 60 percent fewer bugs than hand-coded RTL.

Figure 2: Algorithmic synthesis methodologies based on pure ANSI C++ offer a faster path to custom DSP hardware, enabling high performance implementations in less time.

More importantly, the ability to select fundamentally superior platform-independent micro-architectural alternatives enables designers to create hardware designs of better quality than traditional RTL methods. Using this methodology, hardware designers can easily perform “what if” tradeoffs evaluating area, latency, throughput, and clock frequency for each micro-architecture, all the while leaving the original pure ANSI C/C++ source unchanged.
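The flavor of such a “what if” comparison can be sketched with a deliberately simplified cost model. Everything below is an assumption made for illustration: the `ArchEstimate` struct, the `estimate_mac_loop` function, and the rule that unrolling a multiply-accumulate loop by a factor U costs U multipliers and takes ceil(taps / U) cycles. A real tool would also weigh clock frequency, memory ports, and routing.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, highly simplified estimate for one micro-architecture.
struct ArchEstimate {
    std::size_t multipliers;        // parallel multiplier instances (area proxy)
    std::size_t cycles_per_output;  // cycles to produce one result (latency proxy)
};

// Assumed model: a loop of `taps` multiply-accumulates, unrolled by
// `unroll`, needs `unroll` multipliers and ceil(taps / unroll) cycles.
ArchEstimate estimate_mac_loop(std::size_t taps, std::size_t unroll) {
    assert(taps > 0 && unroll > 0 && unroll <= taps);
    return ArchEstimate{unroll, (taps + unroll - 1) / unroll};
}
```

Under this model a 16-tap loop ranges from 1 multiplier and 16 cycles (fully serial) to 16 multipliers and 1 cycle (fully parallel), while the algorithm source itself stays unchanged; only the `unroll` setting moves.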

Larger, faster designs are increasingly common in the DSP realm, which implies prolonged simulation and synthesis cycles. It has become imperative to fix as many code errors as possible prior to simulation and synthesis, using the design checking capabilities in interactive HDL visualization tools. Moreover, verification takes significantly longer than design development because of the limited speed of RTL simulators and the time needed to manually create an RTL test bench. Advanced design verification flows, with support for industry-standard simulation tools, are now addressing rapid algorithm validation and verification by mixing the high-speed characteristics of pure ANSI C/C++ with the HDL-like modeling benefits found in SystemC and SystemVerilog.
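One way to picture the C/C++ side of such a flow is a small software test bench that drives a reference model and a candidate model with the same stimuli and counts disagreements. The helper name `count_mismatches` and its integer-in, integer-out interface are hypothetical, and the sketch says nothing about how a particular tool connects to an RTL simulator.

```cpp
#include <cstddef>
#include <vector>

// Runs the same stimuli through a reference model and a candidate model
// and counts the samples where their outputs differ.
template <typename Ref, typename Dut>
std::size_t count_mismatches(const std::vector<int>& stimuli,
                             Ref reference, Dut candidate) {
    std::size_t mismatches = 0;
    for (int x : stimuli) {
        if (reference(x) != candidate(x)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Because both models execute as compiled C++, a comparison like this can cover far more stimuli per second than an RTL simulation of the same block.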

Choosing the Right Implementation Technology

Algorithmic synthesis must also take into consideration technology-specific characteristics of RTL synthesis to be fully effective. For example, algorithmic synthesis must be aware of high-performance operations available in some FPGAs, such as dedicated block multipliers, multiply/accumulate macros, pipelined operations, and special memory architectures. For ASICs, algorithmic synthesis must leverage the wide range of operator architectures available in RTL synthesis, ranging from high-performance Booth-encoded parallel multipliers to area-efficient bit-serial multipliers.
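To make the area-versus-speed end of that range concrete, the sketch below models a bit-serial style multiplier in C++: it examines one multiplier bit per iteration and reuses a single adder, in contrast to a parallel multiplier that produces the product in one step. The function name `multiply_bit_serial` and the 16-bit operand width are assumptions for illustration; this is a behavioral model, not the architecture of any particular library cell.

```cpp
#include <cstdint>

// Bit-serial style multiply: one multiplier bit is examined per iteration,
// so a single adder is reused over 16 steps (small area, long latency).
std::uint32_t multiply_bit_serial(std::uint16_t a, std::uint16_t b) {
    std::uint32_t product = 0;
    std::uint32_t shifted = a;  // multiplicand, shifted left once per step
    for (int bit = 0; bit < 16; ++bit) {
        if ((b >> bit) & 1u) {
            product += shifted;  // add the partial product for this bit
        }
        shifted <<= 1;
    }
    return product;
}
```

The result matches `static_cast<std::uint32_t>(a) * b` for all 16-bit inputs; what differs in hardware is that the serial form trades 16 cycles for a much smaller datapath.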

The key is knowledge-based synthesis tailored to the RTL synthesis tool. As such, algorithmic synthesis must be keenly aware of the inherent characteristics of RTL synthesis tools. Tight integration between algorithmic synthesis and RTL synthesis ensures timing closure in the back-end as well as accurate up-front area, performance and power estimates in the front-end.

Challenges and Opportunities Abound

When all is said and done, there are still limitations and challenges ahead. While FPGA devices are bigger than ever before, they are still constrained by size, and the largest algorithms will not fit onto current FPGAs. FPGA cost and power consumption also remain major issues in consumer applications, where DSP has a major impact. Technology-independent solutions such as algorithmic C synthesis provide the inherent flexibility to target critical DSP algorithms across discrete DSP, ASIC, and FPGA implementations, a critical success factor since application segments dictate market cost, performance, and flexibility requirements. Using the innovative, technology-independent solutions now becoming available, the design community can stay ahead of the competitive curve and fully exploit the unprecedented opportunities ahead.
