Less Pessimism, Please

When I was a teen, I remember someone coming to help my Dad plan a big move of some piece of equipment on our orchard. What struck me was how this guy could pre-visualize all the various scenarios and then consider the consequences and, in particular, all the things that could possibly go wrong for the different cases.

I was pretty impressed. It seemed like such thorough analytical capabilities were a skill and a gift. Over time, however, I’ve noticed that such an ability isn’t always appreciated. Because most of the world likes to assume best-case conditions, the “value add” of the more thorough approach is to point out that there are potential negative consequences too. Which makes you Mr. Negative. The pessimist. Always there to dash cold water on a hot idea. Doesn’t matter if you’re right.

In actual fact, too much pessimism can do more than dampen spirits; it can lead to outright incorrect (or unlikely) outcomes. If you’re planning a trip cross-country and, in calculating how long it will take, you assume that you will hit rush-hour traffic in every city, you’ll be wrong. There’s no way you could hit every city during rush hour. So your estimate would be grossly over-pessimistic.

This specific calculus is at play in the world of static timing analysis (STA). A series of improvements to STA technology has been developed to account for the increasing importance of on-chip variation (OCV). While such issues as run time have been a factor in driving these changes, the overriding consideration is pessimism: each new approach to timing analysis is intended to reduce over-pessimism in an earlier approach. But each new approach comes with tradeoffs.

Basic STA is straightforward: cells and interconnect are characterized to build a library that says how much delay each of them will contribute to a signal path. In theory, it’s one number, but, in reality, basic corners are provided to account for min/max temperature and supply voltage as well as fast/slow transistor combinations at a minimum.

The problem is that the accommodations made for operating conditions and process corners are applied to the entire chip during analysis: these are so-called global variations. You can add more nuance to the model, but as long as a single setting affects every instance in the chip, you’re going to miss the effects of local variation.

The ultimate solution to this is statistical static timing analysis (SSTA): instead of a single number, each delay is now represented by a mean and a deviation. The deviation you target reflects the yield distribution you want to see at test. The frequently-referenced value of 3σ covers over 99% of the distribution (about 99.7% within ±3σ for a normal distribution), limiting test loss.
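To put a number on the 3σ figure, here is a minimal sketch that computes how much of a distribution falls within a given number of sigmas of the mean. It assumes the delay distribution is normal (Gaussian), which the article does not state explicitly, and the function name is made up for illustration.

```python
import math

def two_sided_coverage(k: float) -> float:
    """Fraction of a normal distribution within +/- k sigma of the mean.

    Assumption: delays are normally distributed.
    """
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"+/-{k} sigma covers {two_sided_coverage(k):.2%}")
```

At 3σ this gives roughly 99.73%, which is where the "over 99%" shorthand comes from.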

The key to reducing pessimism comes from the way variations accumulate along a path. And that depends on the correlations between effects – that is, whether a variation on one cell implies the same variation on another. With global variations, variation among cells is correlated, so you get the accumulated variation of the path delay by simply adding the delay variation of the elements along the path.

But local variations aren’t correlated, which means that, while they could conceptually align, most of the time they would not. So you don’t accumulate the variations by adding them; instead, you sum the squares of the variations and then take the square root. It’s a Pythagorean-looking thing (for two cells), as if the contributing variations are the legs of a right triangle and the cumulative variation is the hypotenuse. And we know that the hypotenuse is shorter than the sum of the two legs. In other words, the combined variation is less than would be predicted by adding the component variations: it’s less pessimistic – and more accurate.
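The difference between the two accumulation rules is easy to see with numbers. The sketch below compares straight addition (the correlated, global case) with root-sum-square (the uncorrelated, local case). The per-stage values are invented for illustration, not taken from any library.

```python
import math

def correlated_sum(sigmas):
    """Fully correlated variations simply add."""
    return sum(sigmas)

def uncorrelated_rss(sigmas):
    """Uncorrelated variations combine as the root of the sum of squares."""
    return math.sqrt(sum(s * s for s in sigmas))

# Two cells with made-up variations of 3 ps and 4 ps: the classic 3-4-5 triangle.
two_cells_ps = [3.0, 4.0]
print(correlated_sum(two_cells_ps))    # 7.0
print(uncorrelated_rss(two_cells_ps))  # 5.0

# The gap widens on longer paths: ten hypothetical stages of 2 ps each.
ten_stages_ps = [2.0] * 10
print(correlated_sum(ten_stages_ps), uncorrelated_rss(ten_stages_ps))
```

For the ten-stage case, straight addition gives 20 ps while root-sum-square gives a little over 6 ps, which is the pessimism being removed.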

Accumulating variation along a path is relatively straightforward; figuring out the net variation where two paths converge is less clear. Path-based analysis, where each individual path is analyzed on its own, avoids the convergence problem, but the number of paths to analyze can grow exponentially as the circuit grows.
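One way to see why convergence is awkward: the arrival time at a converging point is the later of the two incoming arrivals, and the maximum of two distributions is not just one of them carried forward. The following is a rough Monte Carlo sketch, assuming two independent, normally distributed arrival times with made-up parameters. It is not how any particular SSTA tool handles the problem.

```python
import math
import random

def mean_of_max(mu_a, sigma_a, mu_b, sigma_b, samples=200_000, seed=0):
    """Estimate the mean of max(A, B) for two independent normal arrivals."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(samples):
        total += max(rng.gauss(mu_a, sigma_a), rng.gauss(mu_b, sigma_b))
    return total / samples

# Two hypothetical paths, each arriving at 100 ps with a sigma of 5 ps.
estimate = mean_of_max(100.0, 5.0, 100.0, 5.0)

# For two identical, independent normals, E[max] = mu + sigma / sqrt(pi).
closed_form = 100.0 + 5.0 / math.sqrt(math.pi)
print(f"simulated {estimate:.2f} ps, closed form {closed_form:.2f} ps")
```

Even though both paths have a 100 ps mean, the converged arrival averages nearly 3 ps later, and its distribution is no longer symmetric.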

In that vein, the biggest knock against SSTA is the fact that the characterization database is large and that run-times can be long. You can find more than one opinion that states that large full-chip SSTA really isn’t feasible today. So other ways of dealing with variation have been developed to supplement standard deterministic STA tools and extend that methodology as far as possible.

The first of these attempts is the so-called basic OCV analysis. It relies on the fact that best- and worst-case conditions can’t occur at the same time. A single global derating factor is applied to delays to account for the fact that they will be somewhere between best and worst case in reality.

This offers some improvement but becomes too crude as dimensions shrink. The single OCV derating factor reflects some “average” path length but is over-optimistic for short paths and over-pessimistic for longer paths. The solution here has been to use a table of derating values. By specifying the length of the path and the position of the cell in question in that path, you get a derating value from the table that you can use to account for this location dependence.

This approach is generally referred to as Advanced OCV, or AOCV (which also appears to be referred to as Location-based OCV, or LOCV). It provides better accuracy and reduces pessimism on long paths (as well as reducing the chance of errors in over-optimistic short paths). Of course, it comes with a cost: these tables have to be built, and if each cell in the library has to be characterized for, say, 5 path lengths and 5 positions in that path, that’s a 25-entry table for every cell that has to be derived via SPICE.
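As a rough sketch of the difference between a single derating factor and a table, the code below looks up a late derate by path length and by position in the path. Everything here is hypothetical: the breakpoints, the choice to express position as a fraction of the path, the 5 × 5 values, and the flat factor are invented for illustration and do not come from any real library or tool.

```python
from bisect import bisect_right

# Hypothetical table axes: 5 path lengths (in stages) by 5 positions
# (as a fraction of the way along the path).
PATH_LENGTHS = [1, 2, 4, 8, 16]
POSITION_FRACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]

# Hypothetical late-derate values: rows follow PATH_LENGTHS,
# columns follow POSITION_FRACTIONS.
LATE_DERATE = [
    [1.20, 1.20, 1.20, 1.20, 1.20],
    [1.16, 1.15, 1.15, 1.15, 1.14],
    [1.12, 1.11, 1.11, 1.10, 1.10],
    [1.09, 1.08, 1.08, 1.07, 1.07],
    [1.06, 1.06, 1.05, 1.05, 1.05],
]

# Hypothetical single factor of the kind basic OCV applies everywhere.
FLAT_OCV_LATE_DERATE = 1.10

def _bucket(breakpoints, value):
    """Index of the largest breakpoint <= value, clamped to the table range."""
    i = bisect_right(breakpoints, value) - 1
    return max(0, min(i, len(breakpoints) - 1))

def aocv_late_derate(path_length, position_fraction):
    """Look up the table entry for a cell; no interpolation between entries."""
    row = _bucket(PATH_LENGTHS, path_length)
    col = _bucket(POSITION_FRACTIONS, position_fraction)
    return LATE_DERATE[row][col]

nominal_ps = 50.0
for length in (1, 5, 20):
    table = aocv_late_derate(length, 0.3)
    print(length, nominal_ps * FLAT_OCV_LATE_DERATE, nominal_ps * table)
```

With these invented numbers, the flat factor under-derates the one-stage path and over-derates the twenty-stage path, which is the short-path optimism and long-path pessimism the table is meant to remove.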

There is help available for this: CLK Design Automation offers a tool called AOCV FX that they claim cuts the time it takes to build the tables from weeks or years using multicore SPICE (theoretically… not sure there’s experimental data on that…) to minutes or days.

There are variations on AOCV. The “default” derating tables assume the full slew-rate range. “Design-based analysis” determines the design-specific range of slew rates in your design in order to customize the derating table. With “instance-based analysis,” each instance of each cell gets its own value in the table based on calculated loads and skews (clearly resulting in a much larger table).

One of the benefits of AOCV is that it works in conjunction with traditional STA tools, and so it can be added to the verification methodology without much disruption. All the major STA tools can accommodate AOCV. However, there’s another approach that’s now being proposed by Extreme DA which they call Parametric OCV, or POCV.

They target their argument against two AOCV characteristics. The first is the complexity of building the AOCV tables (which is presumably mitigated by tools). The second is an inherent inaccuracy: AOCV doesn’t independently model n-channel and p-channel mis-correlation, which they say is particularly important for “half-cycle” paths, where launch of a signal from one flip-flop and capture on the next flip-flop are clocked on opposite clock edges.

POCV uses a statistical approach, but it doesn’t do a full SSTA analysis. Instead, it calculates delay variation by modeling the intrinsic cell delay and load parasitics (line resistance, line capacitance, and load capacitance) to determine both the mean and “sigma” (variation) of a logic stage. The cell delay can be further broken into an n-channel component and a p-channel component. They then assume that all the cells along a path have the same mean and sigma.

This means that a given path doesn’t have to be analyzed stage-by-stage; the number of stages can be counted, with the basic stage delay mean and sigma then used to calculate the path delay and accumulated variation. They claim that this keeps the run times down to just over what standard STA tools require, far faster than SSTA. They also claim speedier execution and greater accuracy than AOCV, and no derating tables are required.
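The stage-counting idea can be sketched in a few lines. This is only an illustration, not Extreme DA's actual formulation: it assumes the per-stage variations are uncorrelated, so they accumulate by the root-sum-square rule described earlier, and it uses made-up stage numbers and a made-up function name.

```python
import math

def pocv_style_path(n_stages, stage_mean_ps, stage_sigma_ps, n_sigma=3.0):
    """Path delay from a stage count and one shared stage mean and sigma.

    Assumption: stage variations are uncorrelated, so sigma grows with the
    square root of the stage count rather than linearly.
    """
    mean = n_stages * stage_mean_ps
    sigma = math.sqrt(n_stages) * stage_sigma_ps
    late = mean + n_sigma * sigma
    return mean, sigma, late

# Hypothetical 16-stage path: 50 ps mean and 2 ps sigma per stage.
mean, sigma, late = pocv_style_path(16, 50.0, 2.0)
print(mean, sigma, late)  # 800.0 8.0 824.0

# For comparison, adding each stage's 3-sigma margin linearly.
print(16 * (50.0 + 3.0 * 2.0))  # 896.0
```

Because only the stage count and two numbers are needed, no per-stage table lookup is involved, which fits the claim that run time stays close to that of standard STA.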

Extreme DA provided an example showing the impact of their approach as compared to AOCV on a 1.2-million-instance design at 264 MHz with SI enabled. The focus was on hold-time fixing when optimizing across two corners and two modes. POCV’s reduced pessimism resulted in around 5,800 hold violations as compared to about 15,300 for AOCV. The number of buffers inserted to fix the violations dropped from about 15,000, or around a 1.1% area increase, to about 4,600, or about a 0.33% area increase. Run time was about 20 minutes, down from 60 minutes for AOCV.

Because POCV is statistics-based, it needs an SSTA engine in order to be used. This means it can’t slide into just any STA tool, but the POCV approach is used in Extreme DA’s GoldTime tool and has also been licensed by ATopTech. Extreme DA is open to considering other licensing requests as well.

More info:

CLK-DA AOCV FX

Extreme DA GoldTime Suite