Timing Closure Methodology for Advanced FPGA Designs

Today’s design application and performance requirements are more challenging due to increased complexity. When performance requirements for any part of a design are not completely satisfied, the system fails to function as desired. Whether using application specific standard products (ASSPs), application specific integrated circuits (ASICs), or field programmable gate arrays (FPGAs), timing closure poses a challenge for system design.

What Makes a Design Complex?

In the design process, several factors can make timing closure difficult to achieve. For example, in FPGA designs, resource location can be a concern. The locations of specialty blocks, such as DSP blocks, transceivers, or RAMs, can pose problems as a result of congestion around these blocks. Poor resource placement can result in unmet timing requirements.

Conflicts can occur between resource, area, power, and timing requirements in a design. If the design requires more resources, the resources may have to be spread out across the target device selected for implementation. Spread-out resources have long interconnections. At smaller device geometries, delays are dominated by interconnect delays rather than cell delays. Short nets call for compact placement in a small area, whereas a large resource count forces the logic to spread out; therefore, these two requirements generally conflict.
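The effect of placement on a single register-to-register path can be sketched with a short calculation. The following Python example is a simplified illustration only: all delay values, the 200 MHz requirement, and the function names are assumptions made for this sketch, and the slack formula ignores clock skew, register setup time, and clock uncertainty, which a real timing analyzer accounts for.

```python
# All delays are hypothetical, in picoseconds; real numbers come from the
# timing analyzer for the selected device.
CLOCK_PERIOD_PS = 5000  # assumed 200 MHz requirement


def path_delay_ps(cell_delays, net_delays):
    """Register-to-register delay: sum of cell delays and interconnect delays."""
    return sum(cell_delays) + sum(net_delays)


def setup_slack_ps(period_ps, delay_ps):
    """Simplified setup slack: ignores skew, setup time, and uncertainty."""
    return period_ps - delay_ps


def interconnect_share(cell_delays, net_delays):
    """Fraction of the total path delay that is spent in interconnect."""
    return sum(net_delays) / path_delay_ps(cell_delays, net_delays)


# Three logic cells joined by four nets (register -> cells -> register).
CELLS = [400, 400, 400]
COMPACT_NETS = [500, 600, 500, 600]      # logic placed close together
SPREAD_NETS = [1200, 1500, 1100, 1400]   # same logic spread across the device

for label, nets in (("compact", COMPACT_NETS), ("spread", SPREAD_NETS)):
    delay = path_delay_ps(CELLS, nets)
    slack = setup_slack_ps(CLOCK_PERIOD_PS, delay)
    share = interconnect_share(CELLS, nets)
    print(f"{label}: delay={delay} ps, slack={slack} ps, interconnect={share:.0%}")
```

With these assumed numbers, the cell delays are identical in both cases; only the longer nets of the spread-out placement turn a positive slack into a negative one.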

Another common conflict is between reliability and the time available for verification. Because a shrinking market window dictates a product’s success, system designers aim for a design that works within the shortest amount of time, at the lowest possible cost, and that is simple, scalable, and reliable. To maximize the window of opportunity, the design cycle must shrink. However, a successful design requires spending more time on verification. Together, these factors make closing timing on a design a complex issue.

When designing with FPGAs, the design software includes a large set of options to suit a variety of system applications. The design software default settings offer a balance of performance, area, power, optimization, and compilation time, and generally give the best optimization trade-offs for a variety of designs. But because each design is different, you may have to choose settings that differ from the defaults. For designs that cannot meet timing requirements with default compilation settings, following a good methodology helps achieve timing closure faster by reducing the number of iterations, thereby improving productivity.

Planning for Timing Closure

Proper planning can help achieve timing closure faster. Even though FPGAs offer design re-configuration and bug fixing advantages over ASSPs and ASICs, good planning and discipline will have additional benefits for achieving correct design results more quickly.

During Specification Stage

Start planning for timing closure right at the specification stage. Create a block diagram of your system with the required details of how to partition the desired functionality into specific blocks (Figure 1 shows an example block diagram of a system using SDI megacore). Try to create blocks that encapsulate distinct functionality. Keep them to a size that is convenient for debugging during functional simulation and during timing closure.

Figure 1. An Example Block Diagram of a System Using SDI Megacore

Device Selection – FPGAs are available with different design densities, speed grades, and packaging options that can accommodate different applications. Consider performance, logic and memory density, I/O density, power utilization, packaging, and cost when choosing a device.

Plan for On-Chip Debugging – The design software has on-chip debugging tools that offer different advantages and trade-offs, depending on the system, design, and user requirements. Evaluating on-chip debugging options early in the design process ensures that the system board, the project, and the design will support the appropriate options. Timing errors due to unspecified timing requirements often appear as functional failures of the design. Locating the functional block where the errors originated makes it easier to find the source of the errors and fix those problems.

During Coding and Compilation Stage

You can take the following steps during coding and compilation for early timing closure.

Plan for Incremental Compilation – For your designs using Altera FPGAs, the incremental compilation feature in the Quartus II design software allows partitioning of a design, compiling partitions separately, and reusing results for unchanged partitions. In addition, performance for unchanged blocks can be preserved, while reducing the number of design iterations. For designs not using incremental compilation, if the code or settings are changed for one block in the design, there will likely be different compilation results. These different compilation results can cause timing violations in blocks that do not have code or setting changes. However, with the incremental compilation methodology, the earlier results for a block that did not change will be preserved.
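The idea of reusing results for unchanged partitions can be illustrated with a small conceptual model. The Python sketch below is not how the Quartus II software is implemented; it only assumes that each partition can be fingerprinted from its source text, and the partition names and helper functions are hypothetical. A real flow would also have to track settings changes and dependencies between partitions.

```python
import hashlib


def fingerprint(source_text):
    """Stable fingerprint of a partition's source text."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def partitions_to_recompile(sources, previous_fingerprints):
    """Return the partitions whose source changed since the last compile.

    Partitions with a matching fingerprint keep their earlier results;
    new or edited partitions are recompiled.
    """
    changed = [
        name
        for name, text in sources.items()
        if previous_fingerprints.get(name) != fingerprint(text)
    ]
    return sorted(changed)


# Hypothetical partitions from the previous compilation.
old_sources = {
    "sdi_rx": "module sdi_rx; /* rev 1 */ endmodule",
    "sdi_tx": "module sdi_tx; /* rev 1 */ endmodule",
    "ddr_ctrl": "module ddr_ctrl; /* rev 1 */ endmodule",
}
previous = {name: fingerprint(text) for name, text in old_sources.items()}

# Only one block is edited before the next compilation.
new_sources = dict(old_sources, sdi_rx="module sdi_rx; /* rev 2 */ endmodule")

print(partitions_to_recompile(new_sources, previous))
```

In this model, only the edited block is selected for recompilation, while the other two blocks keep their previous results, which mirrors the preservation behavior described above.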

Incremental compilation can be used as a strategy to optimize a block that has difficulty meeting timing requirements, as shown in Figure 2. For this type of block, create a partition and turn on specific optimizations for the partition as a stand-alone design or within the design. After getting satisfactory results, that block can be preserved for future compilations by choosing the appropriate netlist type with the incremental compilation feature.

Figure 2. Summary of Incremental Compilation Flow

When incremental compilation partitions are used, details of other partitions are not visible when the design software works on one of the partitions; therefore, optimization (or logic minimization) across partitions is not possible. When your design is not yet complete, mark incomplete partitions as empty and compile the remainder of the design to produce an early timing estimate, as well as detect problems in design integration.

Early Compilation of Blocks – Identify the major functional blocks of the design. With larger designs, following a flat design methodology is not practical. Creating a hierarchical design is beneficial, because it divides the design into manageable blocks and also makes it possible for multiple designers to work on a project.

A common problem that prolongs the design cycle is waiting for code completion to compile the design. With this approach, issues are not detected until the design is complete. Do not wait until all blocks are coded to compile the entire design for the first time. Major blocks need to be compiled as soon as possible, even if the design is not complete. This allows identification of coding styles that may not be appropriate for the chosen device, or styles that might negatively affect timing performance of the design. Resource issues can also be identified early in the design cycle. At this stage, the floorplan and partition connections should also be verified.

A dummy project for low-level blocks can be created. For these internal blocks, declare all ports to be virtual pins before compilation, because the number of ports on the blocks may exceed the number of I/O pins available on the FPGA. Run timing analysis on internal portions of the design to identify any possible bottlenecks within a block. If intra-block issues are addressed early, you can concentrate on issues between blocks later in the design cycle.
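Declaring every port of a low-level block as a virtual pin by hand is tedious, so the assignments can be generated with a script. The Python sketch below assumes that the project settings file accepts lines of the form `set_instance_assignment -name VIRTUAL_PIN ON -to <port>`; verify the exact syntax against the documentation for your software version. The port names are hypothetical, and keeping the clock as a real pin is an assumption of this example rather than a requirement stated in this article.

```python
def virtual_pin_assignments(ports, keep_real=()):
    """Build one virtual pin assignment line per port.

    ports     -- names of the top-level ports of the block under test
    keep_real -- ports that remain real device pins (assumed: clocks)
    """
    lines = []
    for port in ports:
        if port in keep_real:
            continue
        lines.append(f"set_instance_assignment -name VIRTUAL_PIN ON -to {port}")
    return lines


# Hypothetical ports of an internal block compiled in a dummy project.
block_ports = ["clk", "start", "data_valid", "done"]

assignments = virtual_pin_assignments(block_ports, keep_real=("clk",))
print("\n".join(assignments))
```

The generated lines can then be added to the settings file of the dummy project before the block is compiled on its own.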

Special optimization strategies can be evaluated for individual design blocks. You can also enable features such as physical synthesis to benchmark performance improvements on sub-blocks by compiling the major blocks independently. With this, the appropriate optimization settings for each block can be decided when running a top-level compilation. This reduces the impact on the overall compilation time.

When you find and fix problematic design issues early in the design cycle, you will have fewer of them to worry about in the final timing closure stages.

Verification Plan – Exercise the design with a simulation test suite at the RTL level so the desired functionality is correctly coded in HDL. If there are many functional and timing problems at the same time, it can be difficult to isolate issues and make required fixes, which increases debug time. Designs that target large FPGAs are complex in nature and ad-hoc verification will not suffice. Efficient planning and execution for verification is necessary, along with planning for the design and implementation.

When partitioning the design, it is good practice to verify individual partitions. Even if you are not using design partitions, verify each major block individually with stand-alone simulations to ensure that the desired requirements can be achieved on a block-by-block basis. You might also be able to reuse some of the partition-level test benches and test cases in the top-level test suite.

Best Practices for Timing Closure

Here are some generic best practices that are applicable for many situations, which you can follow to close timing on your design faster.

Following Synchronous Design Practices – helps with constraining the design and removes any tool dependency. Although asynchronous techniques might save time in the short term and seem easy to implement, they rely on propagation delays and clock skews that do not scale well between different device families or architectures. Asynchronous circuits are prone to problems, such as glitches and race conditions, which render the resulting implementation unreliable and make it difficult to properly constrain a design. In the absence of appropriate constraints, synthesis or place-and-route tools may not perform the best optimizations, resulting in inaccurate timing analysis results and an implementation that does not meet your requirements. These factors outweigh the advantages of quick design fixes with asynchronous techniques.

Following Good Coding Styles – can have a significant impact on how the design is implemented in an FPGA, because the synthesis tool being used can optimize and interpret the design differently than intended. Therefore, decide how to modify the design to assist the optimizations done by the synthesis tool. FPGA devices are register-rich, so pipelining a design can help meet required performance, while not adversely affecting resource use. Adding adequate pipeline registers can help avoid a large amount of combinational logic between registers.
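The benefit of pipelining can be estimated with a simple calculation: the minimum clock period is set by the slowest block of combinational logic between two registers. The Python sketch below uses hypothetical delay values and a single assumed register overhead (clock-to-output plus setup time); it ignores routing changes and clock skew, so treat the result as a rough estimate rather than a timing analysis.

```python
def min_period_ps(stages, reg_overhead_ps):
    """Minimum clock period for a pipeline.

    stages          -- list of stages; each stage is a list of the
                       combinational delays (ps) between two registers
    reg_overhead_ps -- assumed clock-to-output plus setup time (ps)
    """
    return max(sum(stage) for stage in stages) + reg_overhead_ps


def fmax_mhz(period_ps):
    """Convert a clock period in picoseconds to a frequency in MHz."""
    return 1_000_000 / period_ps


REG_OVERHEAD_PS = 300  # hypothetical value

# One long combinational path between two registers.
unpipelined = [[900, 1100, 1000, 1200]]
# The same logic with one pipeline register inserted in the middle.
pipelined = [[900, 1100], [1000, 1200]]

for label, stages in (("unpipelined", unpipelined), ("pipelined", pipelined)):
    period = min_period_ps(stages, REG_OVERHEAD_PS)
    print(f"{label}: period={period} ps, fmax={fmax_mhz(period):.1f} MHz")
```

With these assumed numbers, one extra register stage shortens the worst stage from 4200 ps to 2200 ps of logic delay, at the cost of one additional clock cycle of latency.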

Creating a Good Design Hierarchy – reflects the functional interfaces in the design so that functionally different blocks might be designed by different designers. Creating a good design hierarchy makes team-based designs efficient by allowing individuals or groups to separately design, verify, and implement different functional blocks. For example, in the design shown in Figure 3, there is a two-level hierarchy under the top level module A. The hierarchy is divided into three different design partitions; each of the partitions may be designed and optimized by different designers, if the interfaces are well defined.

Figure 3. Partitions in a Hierarchical Design

The interfaces between the blocks should be properly constrained. If timing failures are within specific blocks in the hierarchy, set the constraints so that the design software applies appropriate optimizations to meet those constraints. After meeting the requirements for the specific block, lock down the block and work on the remaining blocks in the hierarchy.

When creating a hierarchy, consider reusing blocks by making parameterized modules. Try to reduce the number of inter-block connections. Register signals that pass across those module boundaries which you plan to compile as incremental compilation partitions.

Use Partitions and Incremental Compilation – to preserve compilation results for your FPGA designs. When making changes to a design or to the optimization settings, the design software detects the changes made. These changes force a recompile of the design, even if the changes were limited to a few modules. Changes in the design or settings cause a different set of initial conditions for the Fitter. The initial placement that the Fitter finds might then be entirely different from previous placements, which may result in many differences between compilations.

Preserving previous compilation results and reducing compilation times with multiple compilations can be achieved by using partitions and incremental compilation.

Appropriately Constrain your Design – to reflect the design needs. The design software can optimize based only on the constraints you specify. However, avoid over-constraining.

Location assignments for placing logic and I/O blocks at specific locations in the chip can be created with the design software. However, the software's logic placement is generally better than user-assigned placement. For example, it may not help to assign a block containing a critical path to a LogicLock region and constrain the path by squeezing the LogicLock region. The Fitter is aware of the critical path and attempts to optimize the path by considering many physical constraints to find the best placement. Restricting placement may cause performance to deteriorate slightly, so use this strategy judiciously. There are times when using location constraints effectively aids timing closure, directly or indirectly, such as when creating a floorplan for the design.

In a team-based design, each major block in the design could be designed by a different engineer. In such cases, pre-assign each block to have its assigned area in the device. Areas in the device based on interfaces and resources such as transceivers or RAM blocks can be reserved. Reserving the region on a previous Fitter-assigned area can also be done. However, having excessive location constraints may negatively affect performance.

Appropriately constrain the design for timing: specify your requirements, account for the relationships between different clock domains, and identify the false paths and multi-cycle paths in the design to get an accurate timing analysis report. By default, the TimeQuest Timing Analyzer assumes that clocks in a design are related to each other and analyzes all paths. Therefore, make sure to indicate the paths that you do not care about and that require no analysis. Supplying the appropriate constraints helps separate real violations from false violations, so that you can make changes in the HDL or assignments to solve issues when real violations are identified.

When there are multiple interacting clock domains or when the design has high performance circuits, such as external memory interfaces or clock multiplexing, ensure that there are correct inter-clock constraints so that the design software can work on the most critical paths.
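The kinds of timing constraints discussed above are usually written in Synopsys Design Constraints (SDC) format. As a minimal sketch, the Python helpers below generate a few such constraint lines using the standard SDC commands `create_clock`, `set_clock_groups`, `set_false_path`, and `set_multicycle_path`. All clock, port, and register names are hypothetical, as are the periods, and the hold multicycle value of one less than the setup value is a common convention assumed here, not a rule taken from this article. Check the generated constraints against your own design intent before using them.

```python
def create_clock(name, period_ns, port):
    """Define a clock on an input port."""
    return f"create_clock -name {name} -period {period_ns:.3f} [get_ports {{{port}}}]"


def async_clock_groups(*groups):
    """Declare groups of clocks that are unrelated to each other."""
    parts = " ".join("-group {" + " ".join(group) + "}" for group in groups)
    return f"set_clock_groups -asynchronous {parts}"


def false_path(src, dst):
    """Exclude a register-to-register path from timing analysis."""
    return f"set_false_path -from [get_registers {{{src}}}] -to [get_registers {{{dst}}}]"


def multicycle(src, dst, cycles):
    """Allow a path several clock cycles (assumed hold value: cycles - 1)."""
    path = f"-from [get_registers {{{src}}}] -to [get_registers {{{dst}}}]"
    return [
        f"set_multicycle_path -setup {path} {cycles}",
        f"set_multicycle_path -hold {path} {cycles - 1}",
    ]


# Hypothetical design: two unrelated clocks, one static configuration
# path, and one slow datapath that has two cycles to settle.
sdc = [
    create_clock("sys_clk", 10.0, "sys_clk"),
    create_clock("pix_clk", 6.734, "pix_clk"),
    async_clock_groups(["sys_clk"], ["pix_clk"]),
    false_path("cfg_reg*", "status_sync*"),
]
sdc += multicycle("slow_acc*", "slow_result*", 2)

print("\n".join(sdc))
```

Generating constraints from one description of the clocks and exceptions is only one possible approach; its main benefit is that the inter-clock relationships are stated in a single place and are easy to review.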

Use of Optimal Settings – in addition to using the correct design methodology and proper timing constraints, ensures that appropriate optimizations are applied to your design. The design software offers many options to meet different design requirements. Using an inappropriate setting may degrade design performance.

The design software default settings offer a balance of performance, area, power, optimization, and compilation time. To achieve a specific goal, such as performance, low power, area, or compilation time, choose a setting different from the default setting. By trading off one feature for another, the preferred design requirements can be met. Only use settings that help meet your goals. Keep in mind when you make substantial changes to a design, you may have to modify some optimizations too.

Conclusion

Timing closure is a critical phase of the design cycle that determines the success or failure of a product. To attain the best results, plan for timing closure rather than trying to meet the timing requirements with an ad-hoc approach. By following the guidelines in this article, you can plan for timing closure efficiently.

About the Author:

Ramaprasad Kowshika, Member Technical Staff, Application Engineering

As a member of Altera’s Technical Staff, Ramaprasad Kowshika helps customers solve optimization and timing closure related issues in their designs using Altera’s software tools. With this insight, he works with other engineering departments within the company to continuously improve design tools to meet customer needs. Ramaprasad has an M.B.A. from San Jose State University and an M.S. in Electrical Engineering from Anna University. He joined Altera in July 2006, and has previously held engineering positions at IBM, Paxonet Communications, and Exar Corporation.