In yet another example of the ascendance of analog considerations, simulation of analog behavior – whether in outright analog circuits or in the secret analog life of digital circuits – has risen to the level of problem that needs solving. SPICE is the mother’s milk of analog simulation, but, in the spirit of actually getting things done in a finite amount of time, SPICE has been divided into the “Fast SPICE” side of things, where you trade off some accuracy for the ability to see results sooner, and full SPICE, which takes longer to run but provides more accurate results.
In case you’re wondering about how long some of these full simulations can take, apparently a PLL simulation for a spread-spectrum circuit can require as much as a month. The need to get fast, accurate results is motivating a couple of different approaches intended to break beyond today’s performance limitations and start approaching Fast-SPICE-like speeds with full-SPICE accuracy. They are two different ways of trying to shake up the stolid, venerable world of Berkeley SPICE and move it forward several steps.
Part of both solutions reflects the need to deal with the growing amount of matrix math required as circuits get bigger. There are two primary components that chew up time during simulation: model evaluation and matrix calculation. During model evaluation, the various models of the components in the circuit are consulted to return key values; matrix calculation provides the ultimate simultaneous solution of all the nodes in the circuit. For small circuits, model evaluation dominates the run time and the matrix calculation is manageable; for large circuits, it’s the matrix calculation that becomes problematic. Since today’s problem is ever larger circuits, the matrix gets some attention in both cases.
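To make those two phases concrete, here is a minimal sketch of a single DC solve for a resistor-only circuit. It is not taken from any SPICE implementation: the function names `evaluate_models` and `solve`, the tuple format for resistors, and the use of `-1` for ground are all assumptions made for illustration. The first function plays the role of model evaluation (each device contributes its conductance to the matrix), and the second is the matrix calculation, done here with plain dense Gaussian elimination.

```python
def evaluate_models(resistors, n_nodes):
    """Model evaluation: each device adds its conductance 'stamp' to the matrix.

    resistors: list of (node_a, node_b, ohms); node index -1 means ground.
    """
    G = [[0.0] * n_nodes for _ in range(n_nodes)]
    for a, b, ohms in resistors:
        g = 1.0 / ohms
        if a >= 0:
            G[a][a] += g
        if b >= 0:
            G[b][b] += g
        if a >= 0 and b >= 0:
            G[a][b] -= g
            G[b][a] -= g
    return G


def solve(G, rhs):
    """Matrix calculation: dense Gaussian elimination with partial pivoting."""
    n = len(rhs)
    A = [row[:] + [rhs[i]] for i, row in enumerate(G)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (A[r][n] - s) / A[r][r]
    return x


# 1 mA injected into node 0; 1 kOhm from node 0 to node 1; 1 kOhm from node 1 to ground.
G = evaluate_models([(0, 1, 1000.0), (1, -1, 1000.0)], n_nodes=2)
voltages = solve(G, [1e-3, 0.0])
```

The stamping loop does work proportional to the number of devices, while the dense elimination has three nested loops over the number of nodes, which is one way to see why the matrix ends up dominating as circuits grow. Production simulators use sparse solvers rather than this dense version, but the division of labor is the same.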
The first company to announce a new solution, Infinisim, addresses this by first splitting the full circuit into blocks; each block gets its own matrix. The theory here is that multiple smaller matrices plus boundary calculations are faster than one honkin’ matrix calculation. The partitioning is done by starting with a grid and then accreting blocks from the grid based on how strongly connected they are. The idea is to minimize connections between the blocks, since these form the boundaries that will eventually have to be resolved.
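Infinisim hasn’t published the details of its partitioner, so the following is only a sketch of the general idea, under assumptions of my own: every node starts as its own cell, the strongest couplings are merged first, and a block stops growing once it reaches a size cap. The function name `partition`, the `max_block` parameter, and the numeric coupling weights are hypothetical. Whatever couplings are left crossing between blocks are the boundaries that would need to be resolved afterward.

```python
def partition(n_nodes, edges, max_block):
    """Greedy accretion: merge the most strongly coupled nodes first.

    edges: list of (node_a, node_b, coupling_strength).
    Returns (block_of, boundary), where block_of[i] is the block label of
    node i and boundary lists the edges that cross between blocks.
    """
    parent = list(range(n_nodes))
    size = [1] * n_nodes

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Strongest couplings first, so they end up inside blocks, not between them
    for a, b, _w in sorted(edges, key=lambda e: -e[2]):
        ra, rb = find(a), find(b)
        if ra != rb and size[ra] + size[rb] <= max_block:
            parent[rb] = ra
            size[ra] += size[rb]

    block_of = [find(i) for i in range(n_nodes)]
    boundary = [(a, b, w) for a, b, w in edges if block_of[a] != block_of[b]]
    return block_of, boundary


# Two tightly coupled triangles joined by one weak link
edges = [(0, 1, 10.0), (1, 2, 10.0), (0, 2, 10.0),
         (3, 4, 10.0), (4, 5, 10.0), (3, 5, 10.0),
         (2, 3, 0.1)]
block_of, boundary = partition(6, edges, max_block=3)
```

In this toy case the two triangles become the two blocks and the single weak link is the only boundary, which is the outcome you want: few, weak connections between blocks.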
Now any of us who had woodshop in our childhood has had to solve the problem of which saw to use. Doing the rough cuts with a coarse hand saw gets the job done quickly, although it lacks elegance. It’s much more fun to bring out the coping saw and lovingly hone those fine, intricate curves. But what do you do when you have to cut something that has a variety of contours on it? If you use the basic handsaw, you’ll never make it around the tight corners. But if you use the coping saw, you’ll be forever on the long boring stretches. If you can, you really want to be able to swap saws along the way: recruit the rough-toothed saw for the long straight stretches so that you can remove wood as fast as possible there, while pressing a fine-toothed saw into service when entering the filigree zone.
While this may not seem like rocket science, this concept has in fact been recruited for use by Infinisim. Traditionally, when doing SPICE simulations, you pick a time increment, and the circuit is recalculated at each time increment. The question is, which time increment? Let’s say you’re simulating a circuit that culminates in a digital output. While numerous interesting things may be going on inside the circuit, the output will hold onto its stable value until conditions determine that it’s time to change value. It doesn’t really make sense to recalculate the output stage if its inputs haven’t changed.
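The “don’t recalculate if the inputs haven’t changed” idea can be sketched in a few lines. This is an illustration only, not Infinisim’s code: the `Block` class, its `step` method, and the comparator used as a stand-in output stage are all hypothetical, and the sketch assumes the block has no internal state still settling (a real output stage with capacitance would need that checked too).

```python
class Block:
    """A circuit block that is re-solved only when its inputs change."""

    def __init__(self, solve_fn, tol=1e-9):
        self.solve_fn = solve_fn    # hypothetical: maps input values to an output
        self.tol = tol              # changes smaller than this count as "no change"
        self.last_inputs = None
        self.last_output = None
        self.solves = 0             # how many times the expensive solve actually ran

    def step(self, inputs):
        unchanged = (
            self.last_inputs is not None
            and all(abs(a - b) <= self.tol
                    for a, b in zip(inputs, self.last_inputs))
        )
        if not unchanged:
            self.last_output = self.solve_fn(inputs)
            self.last_inputs = list(inputs)
            self.solves += 1
        return self.last_output     # cached value when nothing changed


# Stand-in for a digital output stage: a simple threshold comparator
stage = Block(lambda ins: 1.0 if ins[0] > 0.5 else 0.0)
waveform = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
outputs = [stage.step([v]) for v in waveform]
```

Over these nine time points the solve function runs only when the input moves, and the rest of the time the block simply hands back its previous answer.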
This forms the basis for Infinisim’s primary secret sauce: their simulator contains a number of different solvers, each of which has different characteristics, along with a controller that decides which solver to use in which situation. First off, one choice available to it is to use no solver at all if the inputs to the block haven’t changed. This allows a rough-cut approach for steady signals – the timing granularity is effectively extended when nothing’s happening so that you don’t end up performing multiple useless calculations. When a recalc is needed, the controller decides which solver to apply. And, assuming iteration is needed at a given point to get convergence, the controller can select the solver on a per-iteration basis, changing as it deems appropriate during the converging process. Bottom line is, when a signal is changing slowly, a coarse timescale is used; when it’s changing quickly, a finer timescale is used.
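The coarse-when-slow, fine-when-fast behavior is easiest to see on a single RC node. The sketch below is a generic adaptive-step loop, not Infinisim’s controller (which also chooses among solvers, something not attempted here). The function name `simulate_rc`, the backward Euler integration, and the halve-or-double rule driven by a `dv_max` threshold are all assumptions chosen for simplicity.

```python
def simulate_rc(v_source, t_end, rc=1.0, h0=0.01, h_min=1e-4, h_max=1.0,
                dv_max=0.05):
    """Integrate dv/dt = (v_source(t) - v) / rc with a variable time step.

    Returns a list of (time, voltage, step_used) for every accepted step.
    """
    t, v, h = 0.0, 0.0, h0
    trace = [(t, v, h)]
    while t < t_end:
        h = min(h, t_end - t)                 # don't step past the end
        a = h / rc
        v_new = (v + a * v_source(t + h)) / (1.0 + a)   # backward Euler
        dv = abs(v_new - v)
        if dv > dv_max and h > h_min:
            h = max(h / 2.0, h_min)           # changing fast: retry with a finer step
            continue
        t, v = t + h, v_new                   # accept the step
        trace.append((t, v, h))
        if dv < dv_max / 4.0:
            h = min(2.0 * h, h_max)           # changing slowly: coarsen the next step
    return trace


# Input sits at 0 V, then jumps to 1 V at t = 5
trace = simulate_rc(lambda t: 1.0 if t >= 5.0 else 0.0, t_end=10.0)
```

With this input, the step size grows to its ceiling while nothing is happening, gets cut back repeatedly as the loop runs into the edge at t = 5, and then relaxes again as the node settles. That is the handsaw and the coping saw trading places.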
One of the key effects they’ve found with this approach is better scalability. While the time required for full SPICE has historically grown much faster than linearly as the circuit being simulated grows, Infinisim’s approach grows linearly, more consistent with Fast-SPICE scaling but with the accuracy of full-on SPICE. They claim an average 50x improvement in performance using this algorithm.
Note that this approach doesn’t take advantage of parallelism to get speed. In fact, they believe that synchronization requirements would keep parallelization from really being effective. However, that’s exactly the approach that another newcomer, Gemini Design Technology, is taking in their completely different solution to the problem.
While they have made some fundamental algorithm improvements, Gemini has focused primarily on the matrix. Dr. Baolin Yang, one of the founders, developed a way of parallelizing the matrix math – something that has apparently eluded mathematicians before now. Nearly diagonal matrices – that is, ones whose non-zero values are concentrated along the diagonal of the matrix, being sparse elsewhere – are relatively easy to solve. Relatively. But with the growing complexity of circuits and the number of parasitics being modeled, non-diagonal portions of the matrix have become less sparse. Dr. Yang found a way to break the matrix into a number of “sub-diagonal” matrices (i.e., pulling smaller matrices out from the diagonal region) plus one more matrix consisting of the non-diagonal elements. These calculations could then be parallelized and reintegrated into a single result. Note that this way of generating multiple smaller matrices appears to be completely unrelated to the partitioning that Infinisim does.
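Gemini hasn’t disclosed how its decomposition actually works, so the sketch below swaps in a textbook technique with the same general shape: block Jacobi iteration. The matrix is split into small blocks along the diagonal plus everything else; each diagonal block is solved independently (and so can be handed to a separate worker), with the off-diagonal terms folded into the right-hand side using the previous iterate. The names `dense_solve` and `block_jacobi` and the iteration count are assumptions, and this is not a claim about Dr. Yang’s method. Note also that Python threads won’t speed up this pure-Python arithmetic; the thread pool is there only to show that the block solves don’t depend on one another.

```python
from concurrent.futures import ThreadPoolExecutor


def dense_solve(A, b):
    """Small dense solve by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def block_jacobi(A, b, blocks, iters=50, workers=4):
    """Solve A x = b; blocks is a list of index lists covering every row once.

    Converges when the diagonal blocks dominate the off-block coupling.
    """
    n = len(b)
    x = [0.0] * n
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iters):
            def solve_block(idx):
                inside = set(idx)
                sub = [[A[i][j] for j in idx] for i in idx]
                # Move off-block coupling to the right-hand side (old iterate)
                rhs = [b[i] - sum(A[i][j] * x[j]
                                  for j in range(n) if j not in inside)
                       for i in idx]
                return dense_solve(sub, rhs)

            # Workers only read x here; results are merged after all finish
            results = list(pool.map(solve_block, blocks))
            x_new = x[:]
            for idx, sol in zip(blocks, results):
                for i, val in zip(idx, sol):
                    x_new[i] = val
            x = x_new
    return x


A = [[4.0, 1.0, 0.1, 0.0],
     [1.0, 4.0, 0.0, 0.1],
     [0.1, 0.0, 4.0, 1.0],
     [0.0, 0.1, 1.0, 4.0]]
b = [6.3, 9.4, 16.1, 19.2]
x = block_jacobi(A, b, blocks=[[0, 1], [2, 3]])
```

The catch with this particular stand-in is that it iterates, and it converges quickly only when the coupling between blocks is weak relative to the blocks themselves. As parasitics fill in the off-diagonal region, that assumption gets shakier, which is presumably part of why a better decomposition is worth founding a company on.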
Using this approach, they have not found synchronization to be an issue. No special tricks were called upon to avoid shared data races; standard locking techniques are used. The breaking up of the matrix provides enough data independence to avoid getting bogged down in the writing of shared data. They achieve more or less a linear improvement in speed as more cores are added to the computing platform.
The speed-up number Gemini claims as a result of this is up to 30x faster than old-school simulators. Compared to what they call “first-generation” multi-threaded simulators, they see a 2-10x improvement. Whether this means that Infinisim’s 50x trumps Gemini’s 30x isn’t obvious, as anyone who has tried to make sense of benchmarks will know. Both clearly make a leap beyond what can be done with traditional approaches. It’s not even clear whether they knew about each other, since they’re targeting the incumbents in this area. Gemini in particular has been very specific about targeting Cadence’s Spectre as the “gold standard.” In fact, they think that in some cases they may be even more accurate than Spectre, but they are opting for good correlation rather than best accuracy, since for now they believe that the known Spectre result will get the edge over the still-proving-itself Gemini result, regardless of which is theoretically more accurate.
So the battle to usurp the SPICE crown is on. It’s particularly interesting that two totally different solutions are being applied to the problem, portions of which could actually be considered orthogonal to each other. I mean… if they were to be combined, they could… aw geez, the patent lawyers would have a field day. Never mind.