Physical Synthesis Flows for FPGA Designs

Introduction:

Most FPGA designs today rely on an HDL-based description. HDL synthesis is probably the single most important step in the software flow when it comes to defining the performance of a design. Synthesis links the conceptual description of the logic functions needed for the design to the actual physical architecture elements in the underlying device. The importance of this step should not be underestimated. Synthesis is performed prior to chip placement as an entirely separate step, so its technology-dependent optimizations are computed without knowledge of the actual placement. As a result, design performance can be far from optimal, impacted by choices made too early. This is where physical synthesis comes into play, bringing physical information to the synthesis engine.

Traditional Flow versus Physical Synthesis Flow:

The most common design flows use synthesis and place & route as two consecutive, disjoint steps. Synthesis generates an EDIF netlist that is then passed on to the backend for implementation. The netlist contains basic elements such as LUTs and flip-flops, but it does not control how these elements will be packed together in the FPGA clusters (referred to as “slices” in Xilinx® FPGAs) during the packing phase. Synthesis also has no control over placement and often does not have access to the entire design if cores are used as black boxes.

Physical synthesis differs in that it works from the actual critical paths, the ones seen after placement, and therefore yields a better result. This is a key feature, as it closes the loop between synthesis and place & route.

Figure 1 compares the two flows. The traditional flow is shown on the left and the physical synthesis flow using Xilinx® ISE™ 9.1i is shown on the right. All options in blue are explained in detail in the next section.

Fig 1: Traditional flow and ISE 9.1i physical synthesis flow

Another key advantage of physical synthesis is a better level of consistency between the synthesis and implementation constraints. An integrated environment for synthesis, packing and placement guarantees that synthesis and place & route are working on the same problem.

An important silicon architecture consideration: The trend in modern FPGA silicon architecture is to offer more and more capable clusters (or slices). This opens up more possibilities for physical synthesis flows, since the traditional ISE software flow places pre-packed slices. In effect, the traditional flow does not place LUTs and flip-flops; it places slices. The -timing option in ISE software enables placement at the most basic element level (not only LUTs and flip-flops but also logic fragments in the slice, such as dedicated arithmetic and multiplexer circuitry).

To respond to the challenge of physically aware synthesis, several approaches exist today with tools that enable synthesis optimizations aware of placement and capable of modifying technology mapping, altering clustering (packing) and enhancing placement, using information from the initial placement.

The following paragraphs provide an overview of solutions provided by Xilinx ISE 9.1i software and Synplicity® Synplify® Premier.

Physical Synthesis Optimizations in ISE Software:

ISE 9.1i software provides several physical synthesis options to improve results beyond the default compiles. These optimizations are applied to the same base netlist used in the traditional, non-physical flow. This enables ISE software to use any incoming netlist without having to rely on a particular synthesis tool. Users can also use Xilinx Synthesis Technology (XST) as a design entry tool for this flow.

All options are part of the MAP step of the ISE implementation flow. To enable the flow, the following options are used:

MAP command line option	Description
-global_opt on|off	Optimization routines that operate on the fully assembled netlist after initial packing. These optimizations include logic remapping and trimming, and logic and register replication. This option can optimize black-boxed portions of the design.
-logic_opt on|off	Post-placement logic restructuring. Operates on a placed netlist to optimize timing-critical connections through restructuring and re-synthesis, followed by incremental placement and incremental timing analysis. This option must be used in conjunction with -timing.
-register_duplication on|off	Duplicates registers to improve timing. This option is only available when running timing-driven packing and placement (-timing).
-retiming on|off	When this option is on, registers are moved through the logic to balance the delays in a timing path and increase the overall clock frequency. By default, this option is off. It requires -global_opt on.
-timing (Note: this option is always active for Virtex-5 FPGAs)	Enables packing and placement interaction based on timing goals. When activated, placement is done during the MAP phase; the -ol option should therefore be used along with it.

Table 1: MAP Physical Synthesis properties description
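
The dependencies listed in Table 1 (-logic_opt and -register_duplication need -timing, and -retiming needs -global_opt on) are easy to get wrong in a script. The following is a minimal sketch of a helper that checks them before assembling a MAP argument list. The function name, the file names, the "high" effort value passed to -ol and the exact argument order are assumptions made for illustration; the MAP documentation for the installed ISE release remains the reference.

```python
def build_map_args(ngd, ncd, *, timing=False, effort="high",
                   global_opt=False, logic_opt=False,
                   register_duplication=False, retiming=False):
    """Assemble a MAP argument list, enforcing the rules stated in Table 1.

    ngd/ncd are hypothetical input/output file names; effort is the value
    passed to -ol (assumed here, not taken from the article).
    """
    if (logic_opt or register_duplication) and not timing:
        raise ValueError("-logic_opt and -register_duplication require -timing")
    if retiming and not global_opt:
        raise ValueError("-retiming requires -global_opt on")

    args = ["map"]
    if timing:
        # Placement happens inside MAP, so an effort level goes with -timing.
        args += ["-timing", "-ol", effort]
    for flag, enabled in (("-global_opt", global_opt),
                          ("-logic_opt", logic_opt),
                          ("-register_duplication", register_duplication),
                          ("-retiming", retiming)):
        if enabled:
            args += [flag, "on"]
    args += ["-o", ncd, ngd]
    return args


if __name__ == "__main__":
    print(" ".join(build_map_args("top.ngd", "top_map.ncd",
                                  timing=True, global_opt=True, retiming=True)))
```

The returned list can be handed to a process launcher as is; rejecting an inconsistent combination up front is cheaper than discovering it after a long MAP run.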

Alternatively, all these options (shown in red in the picture on the right) are accessible from Project Navigator, the main ISE GUI, via the “Process Properties” window. The property display level must be set to “Advanced.”

Figure 2: MAP properties for Physical Synthesis

The effectiveness of the options discussed above depends on a number of factors. These MAP options will have more opportunities to make improvements in the following situations:

Under-constraining in synthesis prevents it from generating the best optimizations. To avoid this situation, it is recommended to tightly constrain synthesis until the tool reports negative slack.
Inconsistent constraining between synthesis and implementation is a fairly common situation in which synthesis is not driven to optimize paths that are later constrained during implementation. Physical synthesis can likely re-build the fast logic needed to meet timing. To remedy this situation in the traditional flow, carefully examine constraints between synthesis and implementation and make sure similar paths are covered in both.
In a bottom-up or partition flow, synthesis may not optimize between blocks or partitions.
Design-reuse netlists used as “black boxes” in synthesis may limit the amount of possible optimization. Note that in the traditional flow, synthesis is able to “read” netlists from black boxes, which helps the tools analyze paths going to and coming from them. Sometimes, however, these black boxes are not added to the synthesis project, and this is where physical synthesis options can have a great impact.
Designs with a high LUT-to-flip-flop ratio (few registers for a lot of logic) are more likely to benefit from the retiming option. Note that retiming (called register balancing in XST) is also available in synthesis and can be used as part of the traditional flow.
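
To illustrate what retiming buys on a logic-heavy path, here is a toy sketch. It models a single path as a chain of gates with made-up delays in picoseconds and searches for the register position that minimizes the longest register-to-register delay. This is not the algorithm used by MAP or XST; real retiming must also preserve function, fan-out and initial state, none of which is modeled here.

```python
from itertools import combinations


def clock_period(delays, cuts):
    """Longest register-to-register delay (ps).

    delays: gate delays along one path, in order.
    cuts:   a cut c means a register sits after the first c gates.
    """
    bounds = [0, *sorted(cuts), len(delays)]
    return max(sum(delays[a:b]) for a, b in zip(bounds, bounds[1:]))


def retime(delays, n_registers):
    """Brute-force the register positions that minimise the clock period."""
    positions = range(1, len(delays))
    return min(combinations(positions, n_registers),
               key=lambda cuts: clock_period(delays, cuts))


if __name__ == "__main__":
    delays = [900, 800, 700, 600]      # hypothetical gate delays in ps
    before = (3,)                      # register after the third gate
    after = retime(delays, 1)
    print(before, clock_period(delays, before))
    print(after, clock_period(delays, after))
```

With these numbers the original register position leaves three gates in one stage, while moving the register back by one gate balances the two stages. The more logic there is between registers, the more room such a move has to help, which is why designs with few registers for a lot of logic are the likely beneficiaries.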

Even if care is taken during the synthesis step and constraints are consistent between synthesis and implementation, physical synthesis can improve performance. Following are some of the optimizations used in the algorithms:

Logic Duplication: If a LUT or flip-flop drives multiple loads, and the placement of one or more of those loads is too far away from the source to meet timing requirements, the LUT or flip-flop can be replicated and placed close to that group of loads, thus reducing routing delays.
Logic Recombination: If the critical path traverses through multiple LUTs and through multiple slices, the logic can be reassembled utilizing fewer slices by using a more timing efficient combination of LUTs and MUXes to reduce the routing resources needed for that path.
Basic Element Switching: If a function is built with LUTs and MUXes within a slice, physical synthesis and optimization can rearrange the function to give the fastest path (usually through the MUX select pin) to the most critical signal as shown in Figure 3:

“sig” is timing critical, it crosses a LUT and a MUX…	“sig” has been repositioned and does not pass through a LUT anymore…
Figure 3: Basic Element Switching Example

Pin Swapping: Each input pin of a LUT may have a different delay. MAP has the ability to swap pins (and change the LUT equation accordingly) so that the most critical signal is assigned to the fastest pin. This is particularly effective with Xilinx® Virtex™-5 FPGAs because the delay of their 6-input LUTs differs from pin to pin, with pins 1 through 6 being increasingly faster (pin 6 being the fastest). Performing pin swapping in MAP helps predict timing more accurately. Note that pins can also be swapped during routing; in the traditional flow, routing is the only phase that performs pin swapping.
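
The phrase "change the LUT equation accordingly" in the pin swapping description can be made concrete with a small sketch. It treats a LUT as a truth table stored in an integer, where bit i holds the output for input address i, and rewrites that table so the function is unchanged once two signals trade pins. The function names and the 3-input example are illustrative assumptions, and pins are numbered from 0 here rather than from 1 as in the text; no actual pin delay values are implied.

```python
def swap_lut_pins(init, n_inputs, pin_a, pin_b):
    """Return the truth table that preserves the LUT function after the
    signals on pin_a and pin_b (zero-based) are exchanged."""
    new_init = 0
    for addr in range(1 << n_inputs):
        bit_a = (addr >> pin_a) & 1
        bit_b = (addr >> pin_b) & 1
        swapped = addr
        if bit_a != bit_b:
            # Exchange the two address bits.
            swapped ^= (1 << pin_a) | (1 << pin_b)
        if (init >> addr) & 1:
            new_init |= 1 << swapped
    return new_init


def evaluate(init, pin_values):
    """Look up the LUT output for a list of pin values (pin 0 first)."""
    addr = sum(value << pin for pin, value in enumerate(pin_values))
    return (init >> addr) & 1


if __name__ == "__main__":
    # f = (p0 AND NOT p1) OR p2, with a critical signal assumed on pin 0.
    original = 0xF2
    # Move the critical signal to pin 2, assumed here to be the faster pin.
    rewritten = swap_lut_pins(original, 3, 0, 2)
    print(hex(rewritten))
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                assert evaluate(original, [a, b, c]) == evaluate(rewritten, [c, b, a])
```

The loop at the end checks every input combination, confirming that the rewired LUT computes the same function while the critical signal now enters through a different pin.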

In conclusion, Xilinx ISE 9.1i software provides several options to enable physical optimizations in a one pass flow. Choosing the right one (or the right ones) can prove to be difficult. To make it easier, Xilinx provides the Xplorer utility to run the design with these optimizations and to select the best one. The Xplorer utility is available at the command line and also from the GUI with Project Navigator.

Physical Synthesis with Synplify Premier:

Synplicity offers a physical synthesis tool known as Synplify Premier. The Synplify Premier product is a graph-based physical synthesis tool that enables single-pass physical synthesis. The essence of the graph-based approach is that pre-existing wires, switches and placement sites used for routing an FPGA are represented as a detailed routing resource graph. The notion of what is a “good routing choice” then changes from delay estimation only to a measure of actual delay and availability of interconnect wires.
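
The idea of a routing resource graph can be sketched in a few lines. The example below is a toy model, not Synplify Premier's data structure or algorithm: nodes are switch points and placement sites, each edge is a pre-existing wire with a delay and a flag saying whether it is still free, and a shortest-path search returns the delay of the fastest available route. All node names and delays are invented for illustration.

```python
import heapq


def fastest_route(graph, src, dst):
    """Dijkstra over free wires only; returns the delay in ps, or None if
    no route is available.

    graph maps a node to a list of (next_node, wire_delay_ps, is_free).
    """
    best = {src: 0}
    queue = [(0, src)]
    while queue:
        delay, node = heapq.heappop(queue)
        if node == dst:
            return delay
        if delay > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, wire_delay, is_free in graph.get(node, ()):
            if not is_free:
                continue  # wire already used by another net
            candidate = delay + wire_delay
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                heapq.heappush(queue, (candidate, nxt))
    return None


if __name__ == "__main__":
    # Hypothetical fabric: "near" is physically closer to "src" than "far",
    # but its short wire is taken, forcing a detour through a second switch.
    graph = {
        "src":   [("sw1", 100, True), ("long1", 150, True)],
        "sw1":   [("near", 120, False), ("sw2", 200, True)],
        "sw2":   [("near", 250, True)],
        "long1": [("far", 180, True)],
    }
    print(fastest_route(graph, "src", "near"))
    print(fastest_route(graph, "src", "far"))
```

In this made-up fabric the nearer site ends up with the slower route because its direct wire is occupied, which is the situation a distance-only delay estimate would miss.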

Synplify Premier merges optimization, packing, placement and routing to ensure available, fast routes along critical paths, and generates a fully placed and physically optimized netlist that is ready for final routing in ISE software. The main benefits of this approach are that the output from synthesis is routable and that timing is known after synthesis, because it correlates with the timing the user will see after ISE routes the design. This approach reduces the number of synthesis runs (ISE backend iterations) involved in meeting timing goals.

Synplify Premier provides an encapsulated flow which enables the completion of a physical synthesis design without leaving the Synplify Premier graphical interface. After entering all the design files including black boxes and carefully setting up the constraints, Synplify Premier performs the steps necessary to deliver a physically optimized design:

The tool performs an initial synthesis (or compile) and runs the ISE software flow through placement to initialize its optimizations. See figure 4 below.
Synplify Premier will then read back the results to evaluate critical paths with much better accuracy compared to the traditional synthesis flow.
Based on this first placement, Synplify Premier keeps the I/O placement and performs a global full-chip placement.
Synplify Premier also performs detailed placement, taking into account very specific routing characteristics and resources of the target FPGA. As explained earlier, Synplify Premier accounts for the fact that proximity alone in placement does not always lead to optimal performance, because routing delays do not depend on distance alone. To account for the timing differences between the various routing structures, Synplify Premier uses the graph of pre-existing wire availability when doing placement.
At the end of the process, Synplify Premier generates a netlist, a legal, routable placement and a constraint file (.ncf), and then spawns the Xilinx Xflow command to finalize the design routing. Xflow checks the packing and placement, and routes the circuit based on the forwarded constraint file.

Figure 4: Synplify Premier Flow

Conclusion:

Physical synthesis enables better results by bridging synthesis and place & route. Xilinx provides the technology as part of ISE 9.1i using re-synthesis algorithms that can be applied to any incoming netlist.

Synplify Premier from Synplicity provides a different implementation of this technology using its own full chip placement. An initial placement will considerably improve timing predictions due to highly accurate correlation between what Synplify Premier uses and the final post-route timing results. It ultimately provides a routing-aware placement to the ISE software that meets timing after ISE software routes the design.

Frédéric Rivoallon is manager of Systems Methodology in the Design Software Division at Xilinx, where he oversees the development of software methodologies, benchmarking, and design optimization techniques for new FPGA architectures.

Rivoallon joined Xilinx in 1996. Prior to joining Xilinx, he held FPGA and ASIC design engineering positions with Thomson Multimedia.

Rivoallon holds a master's degree in electrical engineering from the Institut des Sciences Appliquées, Rennes, France.