Electronic System Level (ESL) design claims to offer not just a quicker path from concept to hardware, but a cost-effective one when targeting platform FPGAs. And yet some hold out on ESL due to concern over the netlist’s quality of results (QoR)—a path from concept-to-silicon is not of much use if it does not meet performance and area requirements.
Using the same building blocks
Quality-of-results, however, not only depends on ESL synthesis itself but its interaction with downstream tools. For example, given that high-level algorithms are commonly specified in C++, the best algorithmic synthesis tools take standard ANSI C++ as input and automatically produce RTL based on user-defined design goals. This in turn must go through RTL synthesis and FPGA place-and-route before it reaches real hardware, as depicted in Figure 1. Hence, the extent of integration among these various stages can affect whether or not design goals are met.
Within the C synthesis environment itself, designers should be able to explore different potential architectural implementations and examine the tradeoffs among performance, area, latency, and throughput. These tradeoffs can only be explored effectively if the hardware building blocks that are inferred from the un-timed C++ models have accurate timing and area information.
A C++ design is essentially an algorithm that describes a series of arithmetic operations that can be mapped to hardware components. To add timing behavior to the algorithm, delays for these hardware components must reflect their downstream implementation. For example, if a multiplier of specific size is to be inferred from a C++ model, the delay information must be understood in order to determine the scheduling of the overall algorithmic operation—and ultimately determine the best implementation for the design.
It is RTL synthesis that must provide this information—e.g., delay in nanoseconds and area in units of logic-utilization (depending on the FPGA architecture), as shown in Figure 2. Whether the component is a multiplier, adder, counter, memory, or other series of logic gates representing a specific function, each has correlating physical information needed by high-level synthesis.
Hence, accuracy of component data is largely driven by the degree of integration and certification across the C synthesis and RTL synthesis environments. By providing RTL synthesis timing information to C Synthesis, and using the same RTL synthesis tool for the actual implementation, architectural tradeoffs are based on the most accurate data, allowing the high-level synthesis tool to arrive at the optimal hardware implementation.
Not only should the two point-tools be certified with each other and the flow routinely tested, but library data must be kept up to date between them. Because RTL synthesis tools are always improving their quality-of-results, the latest version may have improved library component performance and area results. While the C synthesis tool should naturally have component data within its own installation, it becomes out of date when a new version of RTL synthesis is released, installed, and used within the current flow. This means a slightly older version of C synthesis may have out-of-date component information and implementation results may be inaccurate. The two tools should have a simple mechanism to keep data synchronized, such as pointing to the same component library. This ensures that architectural tradeoffs are made with the best information.
Design Analysis Across the Flow
Yet even with the most accurate component information, and with a robust high-level synthesis point tool, extensive analysis may be required to close in on aggressive design goals—not just within each point tool but across the entire ESL flow.
Among the advantages of C synthesis is that designers can find the most suitable hardware architecture for their algorithms, without having to change the actual C-code and without blindly coding different RTL implementations by hand as a matter of trial-and-error. This reduces the number of potential RTL bugs and allows for quick adaptation to changing specifications.
This does not rule out, however, the possible need to iterate throughout the flow to close in on aggressive timing requirements, for example. Out-of-the-box QoR from C synthesis, to RTL synthesis, to FPGA place-and-route is desirable, but analysis at various stages of implementation must be available to resolve any design closure problems.
Even hand-coded RTL must go through re-iteration at times to meet difficult QoR goals. High-end RTL synthesis tools allow intuitive cross-probing from critical paths in schematics and timing reports to the offending RTL code. This is in order to allow the designer to review the written code and understand the root-cause of QoR bottlenecks—such as performance, area, throughout, or latency issues. Similar cross-probing is typically available within the front-end C synthesis environment itself (e.g., C-code, HDL code, schematics, Gantt charts). This degree of visibility should be available throughout the entire implementation flow, as shown in Figure 3. This way designers can easily trace back from post-synthesis and post-place-and-route data to source C++ code to understand what hardware is generated and why quality-of-results are being impacted.
In such scenarios either the user constraints or C++ code may need to be modified. For example, if a multiplier path is failing after place-and-route, cross-probing from the timing violation to the C synthesis environment allows for experimentation with different levels of pipelining constraints. Because pipelining is a constraint, no C++ code changes would be necessary. As another example, perhaps the number of DSP blocks generated is not as expected. Tracing to the relevant C++ source code makes it easier to understand if and where a code change may be necessary.
Designers should be able to launch RTL synthesis from the C synthesis environment itself, either interactively or in batch mode, with an automated import of design and constraints into the RTL synthesis environment. From there, they should be able to trace back-and-forth between both environments as needed. Given that 3rd party RTL synthesis tools typically encapsulate FPGA place-and-route for most FPGA vendors—this cross-probing ability extends to post-place-and-route data as well. Collectively, the implementation stages of C Synthesis, RTL synthesis, and FPGA place-and-route reporting are part of a well connected analysis environment.
Productivity through Implementation
With the ESL imperative being to improve productivity of hardware design, it is only natural that downstream tools integrate with ESL environments to augment productivity down to final implementation. Those who sit on the sidelines over concern of ESL-generated netlist quality-of-results might want to consider not only the quality of today’s ESL synthesis tools but the degree of integration now available in implementation flows. For those targeting complex FPGA platforms, the integration among algorithmic synthesis, RTL synthesis, and place-and-route reporting allows not only for more accurate results but an integrated analysis environment to assist with design closure.
October 7, 2008