feature article

Physical Synthesis Flows for FPGA Designs


Most FPGA designs today rely on an HDL-based description. HDL synthesis is probably the single most important step in the software flow when it comes to determining the performance of a design. Synthesis links the conceptual description of the logic functions to the actual physical architecture elements of the underlying device. The importance of this step should not be underestimated. Synthesis is performed before chip placement as an entirely separate step, so its technology-dependent optimizations are computed without knowledge of the actual placement. As a result, design performance can be far from optimal, hurt by choices made too early. This is where physical synthesis comes into play: it brings physical information to the synthesis engine.

Traditional Flow versus Physical Synthesis Flow:

The most common design flows run synthesis and place & route as two consecutive, disjoint steps. Synthesis generates an EDIF netlist that is then passed on to the backend for implementation. The netlist contains basic elements such as LUTs and flip-flops, but does not control how these elements will be packed together into the FPGA clusters (referred to as “slices” in Xilinx® FPGAs) during the packing phase. Synthesis also has no control over placement and often does not have access to the entire design when cores are used as black boxes.

Physical synthesis is different. It yields better results because it works from the actual critical paths, the ones that placement really sees. This is a key feature because it closes the loop between synthesis and place & route.

Figure 1 compares the two flows. The traditional flow is shown on the left and the physical synthesis flow using Xilinx® ISE™ 9.1i is shown on the right.  All options in blue are explained in detail in the next section.


Fig 1: Traditional flow and ISE 9.1i physical synthesis flow

Another key advantage of physical synthesis is better consistency between the synthesis and implementation constraints. An integrated environment for synthesis, packing and placement ensures that synthesis and place & route are working on the same problem.

An important silicon architecture consideration: The trend in modern FPGA silicon architecture is to offer more and more capable clusters (or slices). This opens more possibilities for physical synthesis flows, because the traditional ISE software flow places pre-packed slices. In effect, the traditional flow does not place LUTs and flip-flops; it places slices. The –timing option in ISE software enables placement at the most basic element level (not only LUTs and flip-flops but also logic fragments in the slice, such as dedicated arithmetic and multiplexer circuitry).

Several approaches exist today to meet the challenge of physically aware synthesis. The tools involved perform synthesis optimizations that are aware of placement and can modify technology mapping, alter clustering (packing) and enhance placement, using information from an initial placement.

The following paragraphs provide an overview of solutions provided by Xilinx ISE 9.1i software and Synplicity® Synplify® Premier.

Physical Synthesis Optimizations in ISE Software:

ISE 9.1i software provides several physical synthesis options to improve results beyond the default compiles. These optimizations are applied to the same base netlist used in the traditional, non-physical flow. This enables ISE software to use any incoming netlist without having to rely on a particular synthesis tool. Users can also use the Xilinx synthesis tool (XST) as a design entry tool for this flow.

All options are part of the MAP step of the ISE implementation flow.  To enable the flow, the following options are used:

MAP command line option: Description
-global_opt on|off: Optimization routines that operate on the fully assembled netlist after initial packing. These optimizations include logic remapping and trimming, and logic and register replication. This option can optimize black-boxed portions of the design.
-logic_opt on|off: Post-placement logic restructuring. Operates on a placed netlist to optimize timing-critical connections through restructuring and re-synthesis, followed by incremental placement and incremental timing analysis. This option is enabled in conjunction with -timing.
-register_duplication on|off: Duplicates registers to improve timing. This option is only available when running timing-driven packing and placement with the -timing option.
-retiming on|off: When this option is on, registers are moved through the logic to balance the delays in a timing path and increase the overall clock frequency. By default, this option is off. It requires -global_opt on to operate.
-timing (note: this option is always active for Virtex-5 FPGAs): Enables packing and placement interaction based on timing goals. When activated, placement is done during the MAP phase, so the -ol option should be used along with it.


Table 1: MAP Physical Synthesis properties description
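
As an illustration of how the options in Table 1 depend on each other, the following Python sketch assembles a MAP argument list and rejects combinations that the table rules out. The function name, the file names, the "high" effort value passed to -ol, and the use of -o with a trailing input netlist are assumptions made for this example; check them against the MAP documentation for the ISE release in use.

```python
def build_map_args(ngd_file, out_ncd, *, timing=False, effort=None,
                   global_opt=False, logic_opt=False,
                   register_duplication=False, retiming=False):
    """Assemble a MAP argument list from the Table 1 options.

    Hypothetical helper: only the option names come from Table 1.
    """
    # Table 1: -logic_opt and -register_duplication work with -timing.
    if (logic_opt or register_duplication) and not timing:
        raise ValueError("-logic_opt and -register_duplication need -timing")
    # Table 1: -retiming requires -global_opt on.
    if retiming and not global_opt:
        raise ValueError("-retiming needs -global_opt on")

    args = ["map"]
    if timing:
        args.append("-timing")
        if effort is not None:
            # Effort value is an assumption; Table 1 only names -ol.
            args += ["-ol", effort]
    for name, enabled in (("-global_opt", global_opt),
                          ("-logic_opt", logic_opt),
                          ("-register_duplication", register_duplication),
                          ("-retiming", retiming)):
        if enabled:
            args += [name, "on"]
    # Output and input file arguments are assumed for illustration.
    args += ["-o", out_ncd, ngd_file]
    return args


example = build_map_args("top.ngd", "top_map.ncd", timing=True,
                         effort="high", global_opt=True, retiming=True)
print(" ".join(example))
```

The list form can be handed to a process launcher directly, and the two checks catch an invalid option mix before a long MAP run starts.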

Alternatively, all these options (shown in red in the picture on the right) are accessible from Project Navigator, the main ISE GUI, via the “Process Properties” window. The property display level must be set to “Advanced.”

Figure 2: MAP properties for Physical Synthesis

The effectiveness of the options discussed above depends on a number of factors. These MAP options will have more opportunities to make improvements in the following situations:

  • Under-constraining in synthesis prevents it from generating the best optimizations. To avoid this situation, it is recommended to tightly constrain synthesis until the tool reports negative slack.
  • Inconsistent constraining between synthesis and implementation is a fairly common situation in which synthesis is not driven to optimize paths that are later constrained during implementation.  Physical synthesis can likely re-build the fast logic needed to meet timing. To remedy this situation in the traditional flow, carefully examine constraints between synthesis and implementation and make sure similar paths are covered in both.
  • In a bottom-up or partition flow, synthesis may not optimize between blocks or partitions.
  • Design-reuse netlists used as “black boxes” in synthesis may limit the amount of possible optimization. Note that in the traditional flow synthesis can “read” netlists from black boxes, which helps the tools analyze paths going to and coming from them. But sometimes these black boxes are not added to the synthesis project, and this is where physical synthesis options can have a great impact.
  • Designs with a high LUT-to-flip-flop ratio (few registers for a lot of logic) are more likely to benefit from the retiming option. Note that retiming (called register balancing in XST) is also available in synthesis and can be used as part of the traditional flow.
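
The retiming idea in the last bullet can be illustrated with a toy model. The sketch below assumes a single chain of combinational delays between two fixed registers, with one movable register in between, and finds the position that minimizes the longest stage. It is not the algorithm used by XST or MAP, which work on full netlists and must preserve circuit behavior; the function names and delay values are invented for the example.

```python
def stage_period(delays, position):
    """Longest stage when the movable register sits before delays[position]."""
    return max(sum(delays[:position]), sum(delays[position:]))


def best_register_position(delays):
    """Return (position, period) that minimizes the longest stage delay."""
    if len(delays) < 2:
        raise ValueError("need at least two logic elements to split")
    candidates = range(1, len(delays))
    best = min(candidates, key=lambda pos: stage_period(delays, pos))
    return best, stage_period(delays, best)


# Hypothetical delays in ns for four logic levels on one path.
path = [1.0, 1.0, 1.0, 3.0]
before = stage_period(path, 1)            # register right after the first level
position, after = best_register_position(path)
print(before, position, after)
```

With the register after the first level, the second stage carries most of the logic; moving it so that both stages carry similar delay shortens the critical stage, which is the effect the retiming option aims for.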

Even if care is taken during the synthesis step and constraints are consistent between synthesis and implementation, physical synthesis can improve performance.  Following are some of the optimizations used in the algorithms:

  • Logic Duplication: If a LUT or flip-flop drives multiple loads, and the placement of one or more of those loads is too far away from the source to meet timing requirements, the LUT or flip-flop can be replicated and placed close to that group of loads, thus reducing routing delays.
  • Logic Recombination: If the critical path traverses through multiple LUTs and through multiple slices, the logic can be reassembled utilizing fewer slices by using a more timing efficient combination of LUTs and MUXes to reduce the routing resources needed for that path.
  • Basic Element Switching: If a function is built with LUTs and MUXes within a slice, physical synthesis and optimization can rearrange the function to give the fastest path (usually through the MUX select pin) to the most critical signal, as shown in Figure 3.
    Before: “sig” is timing critical and crosses a LUT and a MUX.
    After: “sig” has been repositioned and no longer passes through a LUT.

Figure 3: Basic Element Switching Example

  • Pin Swapping: Each input pin of a LUT may have a different delay. MAP can swap pins (and change the LUT equation accordingly) so that the most critical signal is assigned to the fastest pin. This is particularly effective with the Xilinx® Virtex™-5 FPGAs, since the inputs of their 6-input LUTs have different delays, with pins 1 through 6 being increasingly faster (pin 6 being the fastest). This pin swapping capability in MAP helps predict timing with more accuracy. Pins can also be swapped during routing; in the traditional flow, only the routing phase performs pin swapping.
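
Pin swapping, the last optimization in the list above, can be sketched in a few lines. The example models a LUT as a truth table indexed by its input pins, assigns the signal with the least slack to the pin with the smallest delay, and rewrites the table so the logic function is unchanged. Pins are numbered from 0 here, and the slack and pin delay values are invented; this shows the principle only, not how MAP implements it.

```python
def assign_fastest_pins(slacks, pin_delays):
    """Return perm where perm[old_pin] = new_pin.

    The signal with the smallest slack gets the pin with the smallest delay.
    """
    by_slack = sorted(range(len(slacks)), key=lambda p: slacks[p])
    by_delay = sorted(range(len(pin_delays)), key=lambda p: pin_delays[p])
    perm = [0] * len(slacks)
    for old_pin, new_pin in zip(by_slack, by_delay):
        perm[old_pin] = new_pin
    return perm


def swap_pins(table, perm):
    """Rewrite a LUT truth table for rewired inputs, keeping the function.

    Bit p of a table index is the value on pin p.
    """
    new_table = []
    for new_index in range(len(table)):
        old_index = 0
        for old_pin, new_pin in enumerate(perm):
            bit = (new_index >> new_pin) & 1
            old_index |= bit << old_pin
        new_table.append(table[old_index])
    return new_table


# f = a AND (b OR c), with a on pin 0, b on pin 1, c on pin 2.
table = [0, 0, 0, 1, 0, 1, 0, 1]
slacks = [-0.5, 0.2, 0.8]       # hypothetical: signal a is the critical one
pin_delays = [0.9, 0.6, 0.3]    # hypothetical: higher pins are faster

perm = assign_fastest_pins(slacks, pin_delays)
new_table = swap_pins(table, perm)
print(perm, new_table)
```

The critical signal a ends up on the fastest pin, and the truth table is permuted to match, which corresponds to changing the LUT equation as described in the bullet.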

In conclusion, Xilinx ISE 9.1i software provides several options to enable physical optimizations in a one pass flow. Choosing the right one (or the right ones) can prove to be difficult.  To make it easier, Xilinx provides the Xplorer utility to run the design with these optimizations and to select the best one. The Xplorer utility is available at the command line and also from the GUI with Project Navigator.

Physical Synthesis with Synplify Premier:

Synplicity offers a physical synthesis tool known as Synplify Premier. The Synplify Premier product is a graph-based physical synthesis tool that enables single-pass physical synthesis. The essence of the graph-based approach is that pre-existing wires, switches and placement sites used for routing an FPGA are represented as a detailed routing resource graph. The notion of what is a “good routing choice” then changes from delay estimation only to a measure of actual delay and availability of interconnect wires. 
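
To make the graph-based idea concrete, the sketch below models a tiny routing resource graph and computes the smallest delay between two sites over wires that are still free. The node names, delays (in picoseconds) and wire availability are invented for illustration, and Dijkstra's shortest-path search is a stand-in chosen for this example; the source does not describe the algorithm Synplify Premier uses.

```python
import heapq


def route_delay(graph, source, sink):
    """Smallest delay from source to sink over free wires, or None."""
    best = {source: 0}
    heap = [(0, source)]
    while heap:
        delay, node = heapq.heappop(heap)
        if node == sink:
            return delay
        if delay > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, wire_delay, free in graph.get(node, ()):
            if not free:
                continue  # wire already used by another net
            candidate = delay + wire_delay
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return None


# Hypothetical graph: node -> [(neighbor, delay_ps, is_free), ...]
rrg = {
    "src": [("near", 150, False),      # direct wire, already occupied
            ("sw_a", 200, True),
            ("long_line", 250, True)],
    "sw_a": [("sw_b", 200, True)],
    "sw_b": [("near", 200, True)],
    "long_line": [("far", 250, True)],
}

near_delay = route_delay(rrg, "src", "near")
far_delay = route_delay(rrg, "src", "far")
print(near_delay, far_delay)
```

In this made-up graph the physically closer site is slower to reach, because its direct wire is taken and the detour crosses more switches, while the distant site sits on a free long line. That is the kind of case where a measure based on actual delay and wire availability differs from a distance-based estimate.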

Synplify Premier merges optimization, packing, placement and routing to ensure available, fast routes along critical paths, and generates a fully placed and physically optimized netlist that is ready for final routing in ISE software. The main benefits of this approach are that the output from synthesis is routable and that timing is known after synthesis, because it correlates with the timing the user will see after ISE routes the design. This approach reduces the number of synthesis runs (ISE backend iterations) needed to meet timing goals.

Synplify Premier provides an encapsulated flow which enables the completion of a physical synthesis design without leaving the Synplify Premier graphical interface. After entering all the design files including black boxes and carefully setting up the constraints, Synplify Premier performs the steps necessary to deliver a physically optimized design:

  • The tool performs an initial synthesis (or compile) and runs the ISE software flow through placement to initialize its optimizations (see Figure 4).
  • Synplify Premier will then read back the results to evaluate critical paths with much better accuracy compared to the traditional synthesis flow.
  • Based on this first placement, Synplify Premier keeps the I/O placement and performs a global full-chip placement.
  • Synplify Premier also performs detailed placement, taking into account the specific routing characteristics and resources of the target FPGA. As explained earlier, Synplify Premier accounts for the fact that proximity alone in placement does not always lead to optimal performance, because routing delays do not depend on distance alone. To account for the timing differences between the various routing structures, Synplify Premier uses the graph of pre-existing wire availability when doing placement.
  • At the end of the process, Synplify Premier generates a netlist, a legal, routable placement plus a constraint file (.ncf) and then spawns the Xilinx Xflow command to finalize the design routing.  Xflow will check the packing, placement and will route the circuit based on the forwarded constraint file.

Figure 4: Synplify Premier Flow


Physical synthesis enables better results by bridging synthesis and place & route.  Xilinx provides the technology as part of ISE 9.1i using re-synthesis algorithms that can be applied to any incoming netlists.

Synplify Premier from Synplicity provides a different implementation of this technology using its own full chip placement. An initial placement will considerably improve timing predictions due to highly accurate correlation between what Synplify Premier uses and the final post-route timing results.  It ultimately provides a routing-aware placement to the ISE software that meets timing after ISE software routes the design.  

Frédéric Rivoallon is manager of Systems Methodology in the Design Software Division at Xilinx, where he oversees the development of software methodologies, benchmarking, and design optimization techniques for new FPGA architectures.

Rivoallon joined Xilinx in 1996. Prior to joining Xilinx, he held FPGA and ASIC design engineering positions with Thomson Multimedia.

Rivoallon holds a master's degree in electrical engineering from the Institut des Sciences Appliquées, Rennes, France.
