
Catching Mr. X: Diagnosing CDC Errors in FPGAs

One of the more popular board games of the 1980s was Scotland Yard, a game of cooperation in which each player is a detective… except for the shady “Mr. X,” the villain. Over the course of the game, the team of detectives collaboratively chases Mr. X across the city of London. At various points in the game Mr. X appears to the detectives, but then just as quickly disappears again. If the “good guys” are able to work together to execute a containment plan, they can catch Mr. X. If not, Mr. X escapes.

Diagnosing Clock Domain Crossing (CDC) errors in an FPGA design can seem a lot like chasing Mr. X: CDC errors surface and then disappear, only to emerge and vanish again.

Such errors pose a significant challenge to the design bring-up process. Diagnosing a CDC error is complicated by its sporadic nature and its tendency to emerge and then disappear after seemingly unrelated changes to either the design or the design flow:

  • Upgrading to a new synthesis or P&R tool version
  • Switching synthesis tool vendors
  • Migrating target technologies
  • Insertion or removal of debug components, probes, or other unrelated logic

CDC issues cannot be detected reliably using traditional verification techniques such as static timing analysis and simulation. These traditional tools were originally intended for single-clock designs, and they are rarely able to catch the kinds of problems that arise in advanced multi-clock architectures.

Even FPGA prototyping methods will not reliably detect CDC issues. Some design teams rely on FPGA prototyping as the mainstay of their verification, but this approach cannot identify all functional issues. Lastly, the age-old method of manually inspecting the RTL is unreliable at best, especially with the increasing number of CDC paths in today’s ever more complex designs.

Why is Mr. X So Elusive?

Metastability is not accurately modeled in simulation, so silicon-accurate metastability behavior cannot be observed in simulation. Static timing analysis ignores paths that cross asynchronous clock domain boundaries. The surfacing of CDC issues depends on technology factors and operating conditions such as temperature and voltage. CDC-related issues may differ from one FPGA architecture to another, from vendor to vendor, or even from one placement to another. Once CDC issues are suspected in silicon, using FPGA probing techniques can also change the characteristics of CDC paths and cause a CDC issue to “disappear.”

In order to catch Mr. X, we must understand his “M.O.” (detective-speak for “Modus Operandi,” or “method of operating”). A CDC path is a signal that crosses from one asynchronous clock domain to another. Since the asynchronous transmit signal will inevitably violate the setup and hold timing requirements of the receive register, all CDC receive registers will go metastable periodically. We can use a mean-time-between-failure (MTBF) equation¹ to determine how often we’re likely to witness Mr. X in action:

MTBF = e^(tMET/τ) / (T0 × fCLK × fDATA)

where tMET is the time available for a metastable output to resolve before it is sampled by the following stage, and fCLK and fDATA are the receive-clock and crossing-signal frequencies.

MTBF is determined by technology-dependent coefficients (τ and T0) as well as the frequency of the CDC signal and receive clock. When a register goes metastable, its value is neither a ‘0’ nor a ‘1’ and downstream logic will not see consistent values. The logic may function incorrectly unless it has been designed to tolerate metastability.
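The arithmetic is easy to run for a given path. The sketch below plugs illustrative, made-up values for τ, T0, the clock and data rates, and the available resolution time into the standard exponential MTBF form; the constants are placeholders, not characterized values for any real device or process.

import math

def cdc_mtbf_seconds(t_met, tau, t0, f_clk, f_data):
    """Estimate mean time between metastability failures for one CDC path."""
    # Standard exponential MTBF form: more resolution time (t_met) or a
    # smaller time constant (tau) improves MTBF exponentially.
    return math.exp(t_met / tau) / (t0 * f_clk * f_data)

# Illustrative, made-up numbers: a 200 MHz receive clock, a crossing signal
# toggling at 10 MHz, ~4 ns of settling time before the next stage samples,
# and placeholder technology constants tau and T0.
mtbf = cdc_mtbf_seconds(t_met=4e-9, tau=50e-12, t0=100e-12,
                        f_clk=200e6, f_data=10e6)
print(f"Estimated MTBF: {mtbf:.2e} seconds ({mtbf / 3.15e7:.2e} years)")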


How Can We Contain Mr. X?

Effectively tackling CDC errors requires designers and their tools to work together. Only the engineer is familiar enough with the design to understand CDC issues and invoke the appropriate RTL fixes. A good approach combines proven synchronization structures and an appropriate coding style with your synthesis and CDC analysis tools.

Synchronization structures, such as a pair of back-to-back D flip-flops used as a synchronizer, are among the standard design practices used to avoid CDC issues. There are many types of synchronization structures for mitigating CDC issues, and your detective team needs to verify that they are used, and used correctly. Many designers have deployed FPGAs or ASICs only to discover a functional CDC issue in which a path was synchronized incorrectly, or not at all. Fortunately, there are tools and techniques that can help designers verify the correct usage of synchronization structures.

When synchronization structures are added to the code, a few strategic coding details can simplify CDC verification. For example, including either “SYNC” or “FIFO” as part of the instance name of a synchronizing structure will cause the instantiation paths of all protected clock domain crossing points to contain the keyword. Figure 1 illustrates this approach. Thereafter, a quick check of the synthesis tool’s clock domain crossing report can verify that all CDC paths are protected.

Figure 1: Adding “CDCFIFO” to the mnemonics for a synchronizing structure’s instance name will embed the term in the instantiation paths of all protected clock domain crossing points.

CDC Verification Tools Track Down the Villain

There is another critical tool in the CDC detective’s kit. A dedicated verification solution such as Mentor Graphics 0-In® CDC is the most efficient tool for validating synchronization structures, CDC protocols, and reconvergent logic. Figure 2 depicts a sample screen image. When the tool reads in an RTL design, it automatically detects asynchronous clocks, CDC paths, and synchronization structures. The results of this structural analysis not only identify correct synchronization structures, but also flag CDC paths with bad or missing structures, as shown in Figure 3. Designers will want to review the CDC violations, correct the bad or missing synchronization structures, and waive any CDC paths with exceptions. A full-featured verification tool will offer a structured approach to reviewing results and a straightforward way to waive ad-hoc synchronization schemes and other exceptions; it will also keep track of those waivers for design reviews.

Figure 2: Reconvergence of two or more synchronized signals can result in synchronization problems. When multiple signals reconverge, their relative timing is unpredictable. Logic that receives these signals should account for potential cycle skew.

Figure 3: This screen displays bad synchronization structures such as this one in which combinatorial logic elements are interrupting the path between the transmitting register and the pair of D FF’s that function as synchronizers.

The Truth, the Whole Truth…

Precision Synthesis provides a detailed report of all paths on which a signal crosses from one clock domain to another in the synthesized design. To leverage this report, be sure to define all clocks, specifying the proper clock domains. Then produce the clock domain crossing report after synthesis (shown earlier in Figure 1, which depicts the CDC report for the same design using two different coding styles). Note that with a proper coding style, CDC protection can be quickly confirmed either by inspection or by executing a simple text-searching script.
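As a rough illustration of such a script, the sketch below scans a clock domain crossing report and flags any path whose instantiation path lacks one of the agreed-upon keywords. The one-path-per-line report format and the file handling are assumptions for illustration, not the output format of any particular synthesis tool.

import sys

# Keywords agreed on by the team and embedded in synchronizer instance names.
KEYWORDS = ("CDCFIFO", "SYNC")

def unprotected_paths(report_file):
    """Return CDC report lines whose instantiation path lacks every keyword."""
    flagged = []
    with open(report_file) as report:
        for line in report:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            if not any(key in line for key in KEYWORDS):
                flagged.append(line)  # no synchronizer keyword in this path
    return flagged

if __name__ == "__main__":
    for path in unprotected_paths(sys.argv[1]):
        print("UNPROTECTED CDC PATH:", path)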

Case Closed

Imagine yourself as part of a detective team chartered to identify and avoid CDC errors. Is your team confident about catching Mr. X and all his partners in crime? Or are Mr. X and his gang eluding you again, only to be caught late in your design cycle or found by customers in the field?

As this article has explained, Mr. X is very slippery. He can’t be reliably detected using traditional verification techniques. Designers must confirm in advance that metastability will not cause functional problems in their design logic. Attempting to use CDC design techniques and tools separately will allow Mr. X to elude capture, resulting in CDC bugs in the silicon. Capturing Mr. X requires the attention of savvy designers using the same collaborative, targeted approach a team of detectives might use: proven CDC design techniques, specialized CDC verification tools, and trusted synthesis tools.

¹Chaney, Thomas, “Measured Flip-Flop Responses to Marginal Triggering,” IEEE Transactions on Computers, Vol. C-32, No. 12, December 1983, pp. 1207-1209.


Fill Solutions Are Getting Smarter

As the industry begins production at 45-nm geometries, one of the issues that needs to be resolved is planarity, the flatness of the IC after chemical-mechanical planarization (CMP). At 45 nm and below, fill solutions become much more challenging, because manufacturing processes and physical interactions become more sensitive to small metal density variations.

A big part of the problem is that the allowable thickness variation has remained constant at +/-15%, even while the total thickness of the wires has decreased. This mismatch means that the allowable variation has grown as a percentage of total thickness at each node (Figure 1).


Figure 1: Allowable thickness variation as a growing percentage of total thickness.

Designers have long used metal “fill” to achieve a more even distribution of metal across the die by adding non-functional metal shapes to “white space” regions in a design. One purpose of metal fill is to reduce the variations in thickness that occur during CMP. By achieving a more uniform thickness, designers can reduce variations in interconnect resistance.

However, the CMP impact on metal thickness also means the performance of the chip becomes more sensitive to parasitic capacitance (which is increased by adding metal fill) and variations in interconnect resistance (Figure 2).


Figure 2: Thickness variations translate to resistance variations

Clearly, metal variation caused by CMP is now a significant issue that requires analysis and simulation for successful production. Meeting all of the manufacturing and performance constraints requires better analysis to predict both the manufacturing and electrical impacts of fill, and more sophisticated algorithms that optimize the use of metal fill features to solve the three fundamental fill issues:

  • What shapes do I use?
  • How many should I use?
  • Where do I put them?

The goal of any fill solution is to add the minimum amount of fill in the right places and in the right shapes to optimize performance while minimizing manufacturing variations. CMP modeling can provide a virtual environment to gather information about resistance variability, and enable designers to control thickness variation by adding just enough fill. With the ability to use a CMP model for planarity analysis, designers can better optimize fill to decrease resistance variability while minimizing capacitance variability.

A short history of fill solutions helps place the value of CMP modeling in perspective.

Density rules originated due to the variation in line width caused by differences in etch rates. As the line width of process nodes decreased, more backend design rules were added to accommodate the growing impact of variations caused by the manufacturing process. The primary solution to density rule violations has always been to add additional metal structures to the design that are independent of the original circuit functionality. Because the first use of this technique sought to “fill up” white space in a design, these non-functional structures were traditionally called “fill.”

The first automated fill techniques, introduced more than 20 years ago, are now generally referred to as “dummy fill.” The name is apt for two reasons: 1) the metal fill shapes have no electrical relevance, and 2) the fill algorithm blindly adds as much fill to the design layout as it possibly can, without regard to the electrical impact. Because the fill technique is completely independent of the design, dummy fill often adds more fill than is actually necessary. Also, because dummy fill algorithms use preset patterns and placement, the impact of fill on timing is not even a consideration.

Bottom line—the traditional approach for dummy fill simply adds fill shapes without any design analysis to determine the fill shape or how much fill should be added to a specific design. While dummy fill enables a design to be more resistant to CMP variations, it provides only a generalized answer to our three fill questions. Smarter fill techniques are needed to provide design-specific control over fill density and fill shapes at smaller nodes.

Although still rule-based, density-based fill adds some design analysis to optimize fill as it is being added and tunes the results for a specific design. Density-based fill divides a chip into windows, then evaluates the feature density in each window and inserts fill only where the density falls outside the density constraints. Density-based fill needs a multi-dimensional solution because, to satisfy the density rules, it must take into account all of the following factors (a minimal sketch of this windowed check follows the list below):

  • Min/max density percentages, as defined in the density constraints
  • Gradient of density variations across windows in the design
  • Magnitude—the difference between min and max densities over the whole die.
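Here is a minimal sketch of the windowed density check just described. The per-window density map is assumed to have been computed elsewhere, and the constraint values are illustrative assumptions, not numbers from any real rule deck.

import numpy as np

# Illustrative constraint values only; real limits come from the foundry rule deck.
MIN_DENSITY, MAX_DENSITY = 0.20, 0.80   # per-window density limits
MAX_GRADIENT = 0.15                      # allowed step between adjacent windows
MAX_MAGNITUDE = 0.40                     # allowed max-min spread over the whole die

def density_violations(density):
    """Check a 2-D array of per-window metal densities against the three factors."""
    issues = []
    if (density < MIN_DENSITY).any() or (density > MAX_DENSITY).any():
        issues.append("min/max density violated in at least one window")
    # Gradient: density change between vertically and horizontally adjacent windows.
    if (np.abs(np.diff(density, axis=0)) > MAX_GRADIENT).any() or \
       (np.abs(np.diff(density, axis=1)) > MAX_GRADIENT).any():
        issues.append("density gradient between neighboring windows too large")
    if density.max() - density.min() > MAX_MAGNITUDE:
        issues.append("min-to-max density magnitude across the die too large")
    return issues

# Windows below the minimum density are the candidates for added fill.
density = np.random.uniform(0.10, 0.90, size=(8, 8))
print(density_violations(density))
print("fill candidates (row, col):", np.argwhere(density < MIN_DENSITY).tolist())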

Density-based fill techniques can also attempt to “smooth out” density variations between spaces of high density and low density by using gradient fill to moderate changes in density across the chip.

On average, density-based fill also reduces parasitic capacitance issues when compared to dummy fill, because density-based fill uses the minimum amount of fill needed to meet density requirements.

By adding fill analysis during the filling process, density-based fill allows designers to incorporate design-specific fill solutions that minimize the amount of fill added and reduce iterations between the fill step and the signoff DRC deck. This provides smarter answers to our three fill questions, and gives the design team better control over the fill process.

The next stage in fill evolution, equation-based fill, is also rule-based, but represents a transitional attempt to handle the increasing complexity of the fill problem as we reach the 65/45-nm nodes and below.  The same technology used to perform equation-based design rule checking (eqDRC), introduced in 2007, can be used to analyze fill solutions using continuous, multi-dimensional functions in place of linear pass-fail conditions. These design rule equations can enable finer resolution of complex fill rule checks that cannot be performed with single-dimensional design rules alone.

Equation-based fill allows users to account for effects beyond density. For example, designers can use equation-based fill to consider the perimeter of the fill shapes to reduce the variation that occurs during the etch process. Etch depth is affected by both density and perimeter, and equation-based fill allows designers to consider both effects while performing the filling process. This technique is evolutionary in that it uses the same windowing approach as density-based fill, dividing the chip into windows. However, because designers can vary the constraint values applied to each window, it can optimize fill across the entire design.

Not only can the results of an equation-based rule check identify constraint violations, but because the rule is expressed as a mathematical function, it can be “solved” to identify the relative contribution of each factor to the result. This feature enables designers to use the results of equation-based design rules to identify how the fill shape or placement needs to change and by how much. It also enables the designer to make design tradeoffs—changing one feature by more to avoid making drastic changes in another—while still ensuring overall design rule compliance. Within a rules-based environment, equation-based fill can provide very specific answers to our three fill questions, enabling the designers to create fill patterns that maximize the benefits of fill while minimizing its effect on performance.
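The sketch below illustrates the idea, rather than any particular eqDRC syntax: a single risk score is computed as a continuous function of normalized density and perimeter terms, and the per-term contributions are reported so the designer can see which factor to adjust. The weights and threshold are illustrative assumptions.

# Weights and threshold are illustrative assumptions, not values from a real rule deck.
def etch_risk(density, perimeter_per_area, w_density=0.7, w_perimeter=0.3):
    """Combine normalized density and perimeter terms into one continuous risk score."""
    terms = {
        "density": w_density * density,
        "perimeter": w_perimeter * perimeter_per_area,
    }
    return sum(terms.values()), terms

THRESHOLD = 0.55
score, contributions = etch_risk(density=0.62, perimeter_per_area=0.48)
if score > THRESHOLD:
    # Because the rule is a function, we can "solve" it: report which term
    # dominates so the fill shape or amount can be retuned accordingly.
    dominant = max(contributions, key=contributions.get)
    print(f"risk {score:.2f} exceeds {THRESHOLD}; largest contributor: {dominant}")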

However, to achieve the highest level of fill accuracy and precision, designers need to simulate the actual manufacturing process on a specific design and use the predicted thickness data to drive the fill algorithm. Simulating the CMP process provides thickness data that is specific to that design, enabling the fill algorithm to not only determine how many fill shapes to use and where to place them, but also, based on information contained in the model, determine the optimum shape of the fill metal. Using CMP simulations allows users to perform optimum fills for a foundry’s most advanced processes.

The key to a CMP-based fill solution is to use the simulation output to improve the planarity of the design, in a manner referred to as model-based fill (Figure 3). Based on thickness data from the simulation, special algorithms pass the appropriate data to the fill analysis tool to determine the optimum filling strategy. The combination of accurate simulation data and the fill algorithm improves parametric yield by reducing the thickness variation that affects resistance, while at the same time minimizing the capacitance added to the design. Model-based fill enables designers to add as few fill shapes as possible while still achieving specific planarity goals, providing the most specific and customized answers possible to our three fill questions.
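A toy sketch of that flow is shown below: a CMP model (represented here by a placeholder function) predicts per-window post-polish thickness, and fill is budgeted only for the windows that deviate most from the target. The model interface, target, and tolerance values are assumptions for illustration only.

# Target and tolerance are illustrative; the model query is a placeholder, not a real API.
TARGET_NM = 300.0     # nominal post-CMP copper thickness
TOLERANCE_NM = 45.0   # +/-15% of the target

def predicted_thickness(window):
    """Placeholder for a query to a real CMP model; returns thickness in nm."""
    return window["thickness_nm"]

def fill_budget(windows):
    """Assign a relative fill weight only to windows predicted to polish too thin."""
    budget = {}
    for w in windows:
        deviation = TARGET_NM - predicted_thickness(w)
        if deviation > TOLERANCE_NM:
            budget[w["name"]] = round(deviation / TOLERANCE_NM, 2)
    return budget

windows = [{"name": "w00", "thickness_nm": 240.0},
           {"name": "w01", "thickness_nm": 310.0},
           {"name": "w02", "thickness_nm": 225.0}]
print(fill_budget(windows))   # {'w00': 1.33, 'w02': 1.67}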


Figure 3: Model-based fill adds thickness analysis to the filling algorithm

Model-based fill can also be extended to timing: the design team supplies a list of critical nets, and the fill algorithm minimizes the impact of fill on timing by placing fill farther from those nets.

Additionally, thickness information from the CMP analysis can be inserted into extraction tools for more accurate timing analysis. Extraction data can also be analyzed in place-and-route tools, or other timing analysis tools.

As the significance and impact of thickness variation has increased, our ability to solve performance and manufacturing issues with metal fill has been extended with ever-smarter fill technologies. As we reach production in 45-nm designs, planarity simulation models are essential to the accurate and complete analysis needed to solve complex fill requirements.

by Jeff Wilson and Craig Larsen, Mentor Graphics

August 25, 2009

Author Bios:

Jeff Wilson is a DFM Product Marketing Manager in Mentor Graphics’ Calibre organization. He is responsible for the development of products that address the challenges of CMP and CAA. He previously worked at Motorola and SCS. Jeff received a BS in Design Engineering from Brigham Young University and an MBA from the University of Oregon.

Craig Larsen is a DFM Technical Marketing Engineer at Mentor Graphics in charge of Planarity Solutions. Before joining Mentor Graphics, he was the founder and principal consultant of Silicon Harvest, a DFM and yield services consultancy focused on design-based yield improvement. Prior to Silicon Harvest, Craig held a variety of semiconductor design and manufacturing positions at PDF Solutions, Centillium Communications, Cirrus Logic, IDT, and AMD. He received his BS in Physics from UC Santa Barbara and his MS in Physics from San Jose State University.

