If a clock tree fails in a forest, and there are no vectors to catch it…
Verification has always been the black sheep of the engineering family, and for understandable reasons. Design teams are made up of intelligent, capable, and – dare we say – occasionally arrogant types who don’t take kindly to the notion that their work contains errors. Yet we have verification teams who make an entire career of finding the bugs in designers’ work.
Does this sound like a recipe for peace and harmony?
As system complexity has exploded, design productivity has largely kept pace. There is, of course, the ubiquitous EDA marketing slide – a graph over time showing an expanding “gap” between the number of gates we can design and the number of gates Moore’s Law will allow us to put on a chip. This has been slide 2 (after the title slide) in just about every EDA PowerPoint deck for the past thirty years. Still, miraculously, this long-predicted gap has failed to materialize. With the help of constantly improving tools, design engineers have raised their level of abstraction – from transistors to gates to RTL to large subsystems – and have been able to crank out state-of-the-art designs with aplomb.
As our systems have grown larger and more complex, however, the task of verifying them effectively has become almost untenable. As a result, verification has become an ever-growing slice of the project budget, schedule, and risk. Systems haven’t just gotten larger, either. Today, systems are complex, interconnected, asynchronous, multi-domain combinations of digital, analog, mechanical, sensors, and software. Long gone are the days when a system could be verified and quantified using an orderly sequence of vectors – with inputs and expected outputs in neat little rows and columns, and a clear coverage metric allowing us to keep score.
Now, the design elements that require simple vectors to verify are our building blocks, and they typically arrive “pre-verified.” So, does creating a design completely of “pre-verified” blocks solve the problem? Of course not – no more than building a machine from pre-verified transistors obviates the need for further verification. As our level of design abstraction rises, so must our level of verification abstraction. We need to verify at the level at which we design, even if we take the pre-verifiedness of our building blocks as sacrosanct.
Our new system has an impossibly large state space to verify, or even to explore. Given the asynchronous nature of the interacting subsystems and the gargantuanity of the data we’re dealing with, we have to be smarter about our verification than even constrained random stimuli will allow. The most straightforward approach is to think about how the thing will be used and come up with scenarios that explore the corners of that space. What happens, for example, if a phone call comes in while streaming video at a time when an automatic backup is in progress? This exception-to-an-interrupt-to-an-interrupt asynchronous multi-tasking scenario is usually the type of situation where today’s bugs are found.
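To make that concrete, here is a minimal constrained-random sketch in SystemVerilog; the class, field, and module names are invented for illustration. The idea is that the constraint block biases the solver toward the awkward overlap of call, stream, and backup rather than letting it wander the full state space uniformly.

```systemverilog
// Hypothetical scenario descriptor -- all names are illustrative only.
class concurrency_scenario;
  rand bit          call_incoming;   // a voice call arrives
  rand bit          video_streaming; // a video stream is active
  rand bit          backup_running;  // an automatic backup is in progress
  rand int unsigned call_delay_us;   // when the call lands, relative to stream start

  // Bias the solver toward the nasty overlap instead of letting it
  // pick each activity with a coin flip.
  constraint c_overlap {
    call_incoming   dist { 1 := 8, 0 := 2 };
    video_streaming dist { 1 := 8, 0 := 2 };
    backup_running  dist { 1 := 6, 0 := 4 };
    call_delay_us inside {[0:5000]};
  }
endclass

module scenario_gen;
  initial begin
    concurrency_scenario sc = new();
    repeat (100) begin
      if (!sc.randomize()) $fatal(1, "randomize() failed");
      // Hand the scenario to the stimulus layer (not shown) that drives
      // the call, stream, and backup interfaces concurrently.
      $display("call=%0d stream=%0d backup=%0d delay=%0d us",
               sc.call_incoming, sc.video_streaming,
               sc.backup_running, sc.call_delay_us);
    end
  end
endmodule
```

Weighted constraints like these are one way a team folds its knowledge of real-world usage back into otherwise random stimulus.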
Bugs are not always so well behaved as to loiter clearly in a single domain such as software, digital, analog, or interface, either. With multiple heterogeneous processors performing parallel tasks, the timing of the triggers for troublesome bugs can be elusive. All of this brings us back to well-conceived use cases and a disciplined approach to our verification methodology.
As we embark on our verification voyage for any reasonably complex system, we quickly run into our first seemingly insurmountable obstacle – performance. When conventional simulation techniques can require days to simply boot a popular operating system, there is little hope of putting an entire system through its paces using a software simulation approach. Elevating parts of the system to higher levels of abstraction can be helpful in achieving more reasonable performance, but one must be careful not to abstract away potential bugs as well. If a bug is caused by timing interactions at the register level, a non-cycle-accurate behavioral model of the offending subsystem will likely obscure the problem.
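As a contrived illustration (module, port, and signal names invented), the sketch below models the same request/acknowledge behavior twice in SystemVerilog. A consumer that samples one cycle too early will pass comfortably against the behavioral model and fail against the cycle-accurate RTL.

```systemverilog
// Contrived example: the same request/acknowledge behavior modeled two ways.
// Module, port, and signal names are invented for illustration.

// Cycle-accurate RTL: ack arrives two clocks after req.
module rtl_responder (
  input  logic clk,
  input  logic req,
  output logic ack
);
  logic req_d;
  always_ff @(posedge clk) begin
    req_d <= req;
    ack   <= req_d;   // two-cycle latency, visible to anything timing-sensitive
  end
endmodule

// Behavioral model: ack tracks req immediately. It simulates faster,
// but the two-cycle window in which a consumer could sample stale data
// has been abstracted away along with the timing.
module beh_responder (
  input  logic clk,   // unused; kept so the two modules are drop-in replacements
  input  logic req,
  output logic ack
);
  assign ack = req;
endmodule
```

The behavioral model simulates faster precisely because the window in which things can go wrong no longer exists.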
At some point in most designs, we need to rely on modeling large portions of our system using actual hardware. When we’re at the early stages of debugging, the quick iteration times and very high visibility of emulators can justify the arguably astronomical price tags of these systems. The $300M-plus emulation market is shared primarily by Mentor Veloce2, Cadence Palladium, and Synopsys ZeBu. Expect to pay seven figures for one of these handy hunks of hardware in fighting configuration. (One might speculate that a significant portion of the cost of today’s emulators is wrapped up in the amortization of legal fees from the big competitors’ legacy of suing one another incessantly over emulation technology for the better part of the past two decades.)
If you don’t require the rapid iteration and increased visibility of an emulator, you can get a lot of prototype performance at a much lower cost (although still not cheaply by any measure) with FPGA-based prototyping boards and systems. FPGA-based boards will cost you on the order of 1/10 the price per equivalent gate compared with an emulator. As a bonus, they’ll run your design as much as 10x faster. If you need a prototype with real- or near-real-world performance for software development and debugging, FPGA-based prototyping boards can be indispensable. Your software team can be productive long before the real hardware is ready. Since software development is almost always the critical path for system implementation these days, the payback from starting it as early as possible in the design cycle can be huge.
While some teams still develop FPGA prototyping boards themselves, you are generally better off going with an off-the-shelf solution from one of the reputable suppliers. Synopsys HAPS, Cadence Protium, Aldec HES-7, and Dini FPGA prototyping boards are all modular, flexible, and proven. While it may be tempting to base your selection of one of these options on simple capacity/performance/cost metrics, the real differentiators tend to be in the associated software. Be sure to look at the entire picture when choosing a prototyping platform, not just at the hardware.
Even with prototyping hardware that’s up to the task, we still need a disciplined methodology for our system verification. When it comes to methodology, the digital hardware folks have taken the lead, driven no doubt by the exorbitant price of failure in digital hardware. The cost and chaos associated with an SoC re-spin to fix a digital bug dwarf the pain of the usually-simple patch process for late-stage software bugs. And analog circuitry has nowhere near the complexity (and thus nowhere near the vast expanse of hiding places for errors) of either software or digital hardware. Therefore, it’s not difficult to understand why the digital world was the origin of efforts such as OVM, VMM, and UVM.
OVM (Open Verification Methodology) was one of the first major efforts at standardizing and codifying the verification process. OVM was oriented around SystemC and SystemVerilog, and it provided a library for stimulus generation, data collection, and control of the verification process. OVM paved the way for UVM (Universal Verification Methodology), which provides an open-source SystemVerilog library of verification “components.” UVM is aimed at portable testbench reuse, enabling the creation of powerful verification IP.
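To give the “components” idea some shape, here is a deliberately skeletal UVM fragment (a sequence item and a sequence only, with hypothetical names) showing the kind of reusable pieces a verification IP package delivers. A real environment would layer a driver, monitor, agent, and scoreboard on top.

```systemverilog
// Skeletal UVM pieces -- names are hypothetical, not from any real VIP.
`include "uvm_macros.svh"
import uvm_pkg::*;

// The transaction: what gets randomized and passed around.
class bus_item extends uvm_sequence_item;
  rand bit [31:0] addr;
  rand bit [31:0] data;
  rand bit        write;

  `uvm_object_utils_begin(bus_item)
    `uvm_field_int(addr,  UVM_ALL_ON)
    `uvm_field_int(data,  UVM_ALL_ON)
    `uvm_field_int(write, UVM_ALL_ON)
  `uvm_object_utils_end

  function new(string name = "bus_item");
    super.new(name);
  endfunction
endclass

// The sequence: a reusable burst of stimulus built from the item above.
class random_burst_seq extends uvm_sequence #(bus_item);
  `uvm_object_utils(random_burst_seq)

  function new(string name = "random_burst_seq");
    super.new(name);
  endfunction

  task body();
    repeat (16) begin
      bus_item item = bus_item::type_id::create("item");
      start_item(item);
      if (!item.randomize()) `uvm_error("SEQ", "randomize() failed")
      finish_item(item);
    end
  endtask
endclass
```

Because the item and sequence know nothing about any particular testbench, the same stimulus can be reused from block-level environments up to the full SoC.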
Cadence recently attacked the SoC verification problem from a very high level with their “Perspec System Verifier” platform. Perspec is use-case oriented, providing a framework for the capture, sharing, and execution of use cases via UML-style diagrams. The system can automatically generate tests, and it can execute those tests across the various platforms as product development proceeds. This type of software-driven verification approach is likely to become more pervasive – giving us a way to stitch together the myriad tools and processes involved in verification and debug.
Of course, the best solution would be to design our creations in such a robust way that we’d never find bugs in the first place. That’s the world most of us imagine when we start down the long path of engineering our next masterpiece. But then reality creeps in, verification engineers show up on the job, and we find ourselves wondering exactly what we were thinking – hooking the bits up in that order. We are only human, after all.
I would ask the question: what is it that you are verifying?
Digital stuff is all representable as FSMs, and it should be possible to make that run fast in simulation and to formally verify that the implementation is correct. However, there are lots of analog effects in implementation that the formal tools don’t handle, and (as far as I can tell) the simulators/emulators don’t handle either – unless you drop to SPICE level (which is too slow to be useful).
When I look at SystemVerilog (*VM) I am reminded of the quote about VHDL – http://www.cl.cam.ac.uk/~mjcg/Verilog/Costello.html