
Need to accelerate the creation of technology-independent DSP hardware?

The massive increase in processing required for next-generation compute-intensive applications, such as wireless communication and image processing, has created a gap between off-the-shelf DSP performance and market needs. In many cases, discrete DSPs are simply running out of steam to serve new communications, multimedia, and consumer applications. In recent years, users have increasingly looked toward alternatives ranging from ultra-high-performance full-custom ASICs to highly flexible general-purpose CPUs. Somewhere in the middle are FPGAs, which provide a cost-effective balance between programmability and high performance (Figure 1). With processing flexibility that ranges from serial to parallel computing, and with highly specialized DSP macros and memories now on chip, FPGAs have the potential to become an attractive platform for implementing DSP algorithms.

Figure 1: When it comes to DSPs, many designers are forced to compromise between the programmability of discrete devices and the performance of custom FPGA or ASIC implementations.

Each platform has certain benefits and limitations. At one extreme, the pure software approach implemented in discrete DSPs is mature, flexible, and relatively easy to use but offers limited instruction-level parallelism. At the other extreme, ASIC implementations offer custom performance and high-volume pricing benefits but traditionally demand a much greater design effort and carry soaring NRE costs. Offering some of the value of both extremes, FPGA hardware supports reprogrammability and architectural flexibility in terms of spatial and temporal parallelism (via repetition and pipelining) but lacks ease of programming, since design entry is in a register-transfer level (RTL) hardware description language rather than the ANSI C/C++ used to program DSPs.

The catch-22 is that designers want both the programming flexibility of discrete DSPs and the performance flexibility of FPGAs. How can they combine the best of both worlds? And, more importantly, what are their options if the application calls for an ASIC implementation? Optimal implementation of DSP algorithms therefore requires a serious rethinking of the overall design flow for transforming algorithms into hardware, via either the ASIC or FPGA route. In the end, choosing the path of technology independence could mean the difference between success and failure.

Algorithmic Synthesis Bridges the Design Gap

To create hardware implementations of complex DSP algorithms in RTL, design teams must iterate through several steps, including micro-architecture definition, hand-written RTL, and area/speed optimization through iterative RTL synthesis. This manual process is slow, and misinterpretation of the original specification along the way accounts for up to 60 percent of the bugs found in RTL. In the final result, both the micro-architecture and the technology characteristics become hard-coded into the RTL description. This severely limits RTL reuse or retargeting in real applications, and leads to overbuilt designs and wasted silicon.

New DSP-specific flows enable algorithmic design at a higher level of abstraction than RTL. Although high-level synthesis tools have been available for some time, none has delivered the necessary ease of use and quality of results until now. A new breed of “algorithmic synthesis” tools offers a faster path to custom DSP hardware. The best of these tools take industry-standard pure ANSI C++ as input and automatically produce RTL based on user-defined design goals. This approach closes the conceptual gap between algorithm designers modeling in pure ANSI C or C++ and hardware designers working at the RTL abstraction level (Figure 2).
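
As a rough illustration of what "pure ANSI C++" input can look like, the sketch below describes a small FIR filter with no clocks, registers, or pipeline stages in the source. The names `NUM_TAPS`, `FirState`, `fir_reset`, and `fir_step` are hypothetical and chosen only for this example; they do not come from any particular tool, and the coding rules a real algorithmic synthesis tool accepts may differ.

```cpp
#include <cassert>

// Hypothetical filter length, fixed at compile time so loop bounds are static.
const int NUM_TAPS = 4;

// Filter state: the delay line holding the most recent input samples.
struct FirState {
    int delay_line[NUM_TAPS];
};

// Clear the delay line before processing a new stream.
void fir_reset(FirState &state) {
    for (int i = 0; i < NUM_TAPS; ++i) {
        state.delay_line[i] = 0;
    }
}

// Untimed FIR step: consume one input sample, produce one output sample.
// Nothing here says how many multipliers or clock cycles are used.
int fir_step(FirState &state, const int coeffs[NUM_TAPS], int sample) {
    // Shift the delay line by one position and insert the new sample.
    for (int i = NUM_TAPS - 1; i > 0; --i) {
        state.delay_line[i] = state.delay_line[i - 1];
    }
    state.delay_line[0] = sample;

    // Multiply-accumulate over all taps.
    int acc = 0;
    for (int i = 0; i < NUM_TAPS; ++i) {
        acc += coeffs[i] * state.delay_line[i];
    }
    return acc;
}
```

Feeding an impulse (a single 1 followed by zeros) through `fir_step` returns the coefficients one by one, which is a quick sanity check of the algorithm before any hardware decisions are made.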

By using a technology-independent ANSI C++ source, these tools enable designers to choose between ASIC and FPGA implementations, and give them a means to incrementally explore and optimize the implementation architecture. The end result is a design architecture and RTL implementation tuned to the device and system requirements, delivered up to 20X faster and with 60 percent fewer bugs than hand-coded RTL.

Figure 2: Algorithmic synthesis methodologies based on pure ANSI C++ offer a faster path to custom DSP hardware, enabling high performance implementations in less time.

More importantly, the ability to select fundamentally superior platform-independent micro-architectural alternatives enables designers to create hardware designs of better quality than traditional RTL methods allow. Using this methodology, hardware designers can easily run “what if” experiments that trade off area, latency, throughput, and clock frequency for each micro-architecture, all the while leaving the original pure ANSI C/C++ source unchanged.
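
To make the "what if" idea concrete, here is a deliberately simplified cost model, not a description of how any synthesis tool produces its estimates. It assumes a multiply-accumulate loop over `num_taps` taps that is unrolled by a factor `unroll`, with one multiplier per unrolled copy and one loop iteration per clock cycle. The struct `ArchEstimate` and the function `estimate_architecture` are hypothetical names for this sketch.

```cpp
#include <cassert>

// Toy estimate for one micro-architecture of an N-tap MAC loop.
struct ArchEstimate {
    int multipliers;        // rough proxy for area
    int cycles_per_sample;  // rough proxy for latency and throughput
};

// Assumption: unrolling by `unroll` instantiates `unroll` multipliers and
// needs ceil(num_taps / unroll) cycles to produce one output sample.
ArchEstimate estimate_architecture(int num_taps, int unroll) {
    assert(num_taps > 0);
    assert(unroll > 0 && unroll <= num_taps);

    ArchEstimate est;
    est.multipliers = unroll;
    est.cycles_per_sample = (num_taps + unroll - 1) / unroll;  // ceiling division
    return est;
}
```

Under these assumptions a 16-tap filter ranges from 1 multiplier and 16 cycles per sample (fully serial) to 16 multipliers and 1 cycle per sample (fully parallel), with the same source code behind every point in between.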

Larger, faster designs are increasingly common in the DSP realm, which implies prolonged simulation and synthesis cycles. It has become imperative to fix as many code errors as possible prior to simulation and synthesis, using the design checking capabilities in interactive HDL visualization tools. Moreover, verification takes significantly longer than design development because of the limited speed of RTL simulators and the time needed to manually create an RTL test bench. Advanced design verification flows, with support for industry-standard simulation tools, are now addressing rapid algorithm validation and verification by mixing the high-speed characteristics of pure ANSI C/C++ with the HDL-like modeling benefits found in SystemC and SystemVerilog.
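
One way to picture fast C++-level validation is a small software test bench that compares a fixed-point model against a floating-point reference before any RTL exists. The sketch below uses plain C++ rather than SystemC, and every name in it (`TAPS`, `FRAC_BITS`, `fir_reference`, `to_fixed`, `fir_fixed`, `model_error`) is a hypothetical choice for illustration, as is the 12-bit fractional format.

```cpp
#include <cassert>
#include <cmath>

const int TAPS = 4;        // hypothetical filter length
const int FRAC_BITS = 12;  // hypothetical number of fractional bits

// Floating-point reference model of one FIR output sample.
double fir_reference(const double x[TAPS], const double h[TAPS]) {
    double acc = 0.0;
    for (int i = 0; i < TAPS; ++i) {
        acc += x[i] * h[i];
    }
    return acc;
}

// Quantize a real value to a signed fixed-point integer, rounding to nearest.
long to_fixed(double v) {
    return static_cast<long>(std::floor(v * (1L << FRAC_BITS) + 0.5));
}

// Fixed-point model: each product carries 2 * FRAC_BITS fractional bits.
double fir_fixed(const double x[TAPS], const double h[TAPS]) {
    long acc = 0;
    for (int i = 0; i < TAPS; ++i) {
        acc += to_fixed(x[i]) * to_fixed(h[i]);
    }
    return static_cast<double>(acc) / static_cast<double>(1L << (2 * FRAC_BITS));
}

// Test bench helper: absolute error between the two models for one vector.
double model_error(const double x[TAPS], const double h[TAPS]) {
    return std::fabs(fir_reference(x, h) - fir_fixed(x, h));
}
```

A test bench like this runs at compiled-software speed, so many input vectors can be checked against an error budget in the time an RTL simulator would need for a handful.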

Choosing the Right Implementation Technology

Algorithmic synthesis must also take into consideration the technology-specific characteristics of RTL synthesis to be fully effective. For example, algorithmic synthesis must be aware of the high-performance operations available in some FPGAs, such as dedicated block multipliers, multiply/accumulate macros, pipelined operations, and special memory architectures. For ASICs, algorithmic synthesis must leverage the wide range of operator architectures available in RTL synthesis, ranging from high-performance Booth-encoded parallel multipliers to area-efficient bit-serial multipliers.
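
The two multiplier styles named above can be contrasted with a behavioral C++ sketch. This is a software model under stated assumptions, not synthesizable RTL: `WIDTH`, `serial_multiply`, and `booth_multiply` are hypothetical names, operands are 8 bits wide, and the Booth version uses simple radix-2 recoding evaluated in a loop, whereas a parallel hardware multiplier would generate and sum all partial products at once.

```cpp
#include <cassert>

const int WIDTH = 8;  // hypothetical operand width in bits

// Bit-serial shift-and-add multiplier model (unsigned operands).
// One multiplier bit is examined per "cycle", reusing a single adder,
// which is why this style is small but needs WIDTH cycles.
unsigned serial_multiply(unsigned a, unsigned b, int &cycles) {
    assert(a < (1u << WIDTH) && b < (1u << WIDTH));
    unsigned product = 0;
    cycles = 0;
    for (int i = 0; i < WIDTH; ++i) {
        if ((b >> i) & 1u) {
            product += a << i;
        }
        ++cycles;
    }
    return product;
}

// Radix-2 Booth recoding model (signed two's-complement operands).
// Each bit pair (b[i], b[i-1]) selects +a, -a, or 0 as the partial product.
int booth_multiply(int a, int b) {
    assert(a >= -128 && a <= 127 && b >= -128 && b <= 127);
    const unsigned bits = static_cast<unsigned>(b) & 0xFFu;  // two's-complement bits of b
    int product = 0;
    int prev = 0;  // implicit bit b[-1] = 0
    for (int i = 0; i < WIDTH; ++i) {
        const int cur = static_cast<int>((bits >> i) & 1u);
        if (cur == 0 && prev == 1) {
            product += a * (1 << i);  // pair 01: add a * 2^i
        } else if (cur == 1 && prev == 0) {
            product -= a * (1 << i);  // pair 10: subtract a * 2^i
        }
        prev = cur;
    }
    return product;
}
```

Both functions compute the same arithmetic result as the `*` operator for in-range operands; what differs in hardware is how much logic is instantiated and how many cycles the operation takes, which is the choice the synthesis flow has to make.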

The key is knowledge-based synthesis tailored to the RTL synthesis tool. As such, algorithmic synthesis must be keenly aware of the inherent characteristics of RTL synthesis tools. Tight integration between algorithmic synthesis and RTL synthesis ensures timing closure in the back end as well as accurate up-front area, performance, and power estimates in the front end.

Challenges and Opportunities Abound

When all is said and done, there are still limitations and challenges ahead. While FPGA devices are bigger than ever before, they are still constrained by size, and the largest algorithms admittedly will not fit onto current FPGAs. FPGA cost and power consumption are still major issues in consumer applications, where DSP has a major impact. Technology-independent solutions such as algorithmic C synthesis provide the inherent flexibility to target critical DSP algorithms across discrete DSP, ASIC, and FPGA implementations, a critical success factor since application segments dictate market cost, performance, and flexibility requirements. Using the innovative, technology-independent solutions now becoming available, the design community can stay ahead of the competitive curve and fully exploit the unprecedented opportunities ahead.
