Bashing Bugs

Horrendous quote of the day – “27% of the industry requires 3 or more spins.” This is the headline on a slide from Harry Foster, of Mentor, based on a large worldwide survey of silicon and FPGA implementers and their verification problems, conducted by Wilson Research in 2010. OK, the positive side is that 73% get it right by the second spin, and a further 23% by the fourth time round. But with spins costing multiple millions of dollars, you need a huge market for a chip to justify that number of spins, and a market that is prepared to wait for the chip to arrive, since, by most estimates, a re-spin is going to take three or more months. And re-spins are only part of the reason why 66% of projects are delivered late.

Harry was speaking at a conference on “Verification Futures: the next five years” alongside a high-powered bunch of EDA vendors and chip manufacturers. Everyone seemed to agree on what the issues are. (All the figures quoted here are from the Wilson survey.) Chips are getting bigger: in 2002, the mean gate count of those surveyed was 400K; last year it was 6.1M, and around 30% of designs were larger than 20M gates and 15% were larger than 60M gates. Designs are also getting more complicated: while the median number of processors per chip was 1.04 in 2004, by 2010 it had grown to 2.14, with around 30% of designs having three or more processors and designs of over 20M gates having a median of 3.4 processors. DSPs are also growing in use, as is other IP. Multiple asynchronous clock domains are now commonplace, with only 6% of designs having a single clock domain and 20% having more than 10 domains. Smaller geometries have driven designers to use an array of different active-power management techniques. And the explosion of software within chips brings in a whole new set of issues.

Users are already working hard on developing ways to verify these complex devices. Regression test suites have been expanded, with a median of 629 tests, but with 19% having over 2000 tests. These tests take time: the mean is 20 hours, but 27% take over two days to run and 15% take more than four days. Hardware-assisted acceleration/emulation is now used for 45% of designs, and FPGA prototyping for 55%. (These are not mutually exclusive, so some teams may use both. However, for larger designs, greater than 20M gates, the use of FPGA prototyping falls.) Companies are also throwing resources at verification. The mean share of total project time spent on verification has risen from 50% in 2001 to 56% in 2010, with over 40% of projects spending more than 60% of project time on verification and 5% spending more than 80%. One interesting finding is that the number of engineers assigned to verification varies from country to country. In North America, the peak is 6.3 engineers, while in Asia it is 8.8 and in India 9.2. The number of engineers working on a design has risen: in 2007 the median peak was 7.8 designers and 4.8 verifiers; in 2010 it was 8.1 designers and 7.6 verifiers, with 28% of verification effort going into creating a test bench, 27% into creating and running simulation, and 32% into debugging. (14% is the famous “other”.) The respondents saw the biggest challenges as creating sufficient tests to verify the designs, defining coverage metrics, and knowing the verification coverage. This shows a deep concern that engineers are not certain how well verification is actually working.

These are semi-abstract figures from an anonymous base (there is a lot more detail on Harry’s blogs). But there were real live users at the conference, and a panel from ARM, ST, Ericsson and Infineon was asked to identify their three greatest challenges in verification. As always, it is fascinating how differently people answer such questions. Ericsson (Hans Lundén) chose to look at techniques, specifically transaction level modelling (TLM), Verification IP, and the tasks of design for verification. ARM, ST, and Infineon agreed that complexity was a challenge. (Infineon is predicting that SoCs in 2016 will have more than 120 IP blocks and 20 CPUs.) ARM and ST also agreed on scalability as an issue. ARM (Brian Dickman) felt that the third challenge was completeness of coverage, which is loosely related to Infineon’s debug automation. ST (Olivier Haller) felt that getting better productivity is increasingly an issue, while Infineon (Clemens Müller) was very concerned about requirements-driven verification. This is clearly linked to Clemens’ focus on automotive and the new ISO 26262 safety standard for cars, and it was followed up with an extended presentation on the issues that Infineon is facing in developing a requirements-driven development methodology.

With ISO 26262 there is a need to demonstrate that the end product meets the initial requirements, and that these can be traced through all stages of the development cycle. One way to do this might be to use UML (Unified Modelling Language) to build an initial model. That model can then be used to generate documentation, to keep the documentation up to date as the requirements and the model change during development, and to help clean up code. It can also be linked to testbench documentation, used to identify code changes and the need to test them, and used to help identify problem verification code, ranging from code that is never executed to code that does not cover a requirement. Current UML needs to be extended to move away from an object-oriented base, and current work at Trinity College, Dublin, is beginning to do this. There is still more work to do to make it fit with verification environments written in e.
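The traceability idea described above can be made concrete with a small sketch. The Python below is purely illustrative and is not Infineon's methodology or any UML tool: the requirement IDs (REQ-1 and so on), the `Check` record and the `trace` function are all hypothetical names invented for this example. It simply cross-references a set of requirements against a list of tests and reports requirements with no test, requirements with a failing test, and tests that trace to no requirement.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Check:
    """One verification test and the requirement IDs it claims to cover (hypothetical)."""
    name: str
    covers: FrozenSet[str]
    passed: bool


def trace(requirements: Set[str], checks: List[Check]) -> Dict[str, List[str]]:
    """Cross-reference requirements against tests and report the gaps."""
    covered: Set[str] = set()
    failing: Set[str] = set()
    orphans: List[str] = []
    for check in checks:
        known = check.covers & requirements
        if not known:
            orphans.append(check.name)  # test traces to no known requirement
        covered |= known
        if not check.passed:
            failing |= known  # requirement has at least one failing test
    return {
        "uncovered": sorted(requirements - covered),
        "failing": sorted(failing),
        "orphan_tests": sorted(orphans),
    }


# Invented example data, for illustration only.
REQUIREMENTS = {"REQ-1", "REQ-2", "REQ-3"}
CHECKS = [
    Check("t_reset", frozenset({"REQ-1"}), passed=True),
    Check("t_watchdog", frozenset({"REQ-2"}), passed=False),
    Check("t_misc", frozenset(), passed=True),
]

if __name__ == "__main__":
    for category, items in trace(REQUIREMENTS, CHECKS).items():
        print(f"{category}: {items}")
```

In a real flow the requirement and test lists would presumably be extracted from the model and the regression results rather than typed in by hand; the point here is only the shape of the cross-check.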

Dialog Semiconductor presented another user perspective, this time on the use of OVM and the methodology that they developed to support it. The presentation was honest about the issues, making it clear that adopting OVM would not be appropriate for a one-off – it has to form the basis of the development process across a number of projects to yield real returns.

ARM has a different set of problems, in that they are suppliers of IP. They need to verify their own IP before shipping and provide tools to allow their customers to integrate cores into products. The verification landscape, from ARM’s perspective, has to be seen as an ESL solution. And as such, ARM supplies models of cores as a series of “Russian matryoshka doll” models. At the heart is the Architecture Envelope Model, “an executable version of the ARM Architecture Reference Manual.” Around this is an ARM Device Model (such as Cortex-A15) and the outer doll is a virtual platform model, which adds memory and peripherals. These can be used by software and middleware teams while the final design is still being implemented, and also as a base line for device verification.

The remainder of the conference was a series of EDA company presentations on how they are addressing the issues. There was a gratifyingly low Neal factor in the presentations. (Neal factor is a measure named after an ex-journalist who would rate presentations on the percentage of marketing BS.)

A common thread amongst the presentations was increasing abstraction, ESL if you like, whether using UML or one of the higher level languages like SystemVerilog, for better quality at the start of the project. Several presentations discussed using the recently approved UVM (Universal Verification Methodology – the successor to OVM) for verification. Verification IP, building instrumentation into the chip to verify protocols and memory while the chip is running, is becoming increasingly popular, and the EDA vendors all provide access to libraries. Synopsys, who came late to the Verification Methodology party, was particularly keen on UVM.

Springsoft was answering the concerns about assessing the quality of the verification with Certitude, a tool for injecting faults into the simulator and/or test suite and measuring the efficiency with which the faults are identified. Cadence made the case for hardware acceleration as the underlying platform that will unite other verification platforms, such as Virtual platforms, RTL Simulation, and Rapid Prototyping. (One Platform to Rule Them?) Jasper, not surprisingly, looked at applying formal methods and in particular scaling up formal methods beyond logic blocks to whole, large and complex SoCs and, in particular, some of the whole system issues, such as power-up/down sequencing.
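The general fault-injection idea mentioned above, grading a test suite by how many deliberately planted faults it catches, can be shown with a toy sketch. This is an assumption-laden illustration of the principle only and says nothing about how Certitude or any other commercial tool works internally: the 4-bit adder, the three faulty variants and the two test suites are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Adder = Callable[[int, int], int]
Suite = List[Tuple[int, int]]


def reference_adder(a: int, b: int) -> int:
    """Golden model (hypothetical design): 4-bit adder with wrap-around."""
    return (a + b) & 0xF


# Deliberately faulty variants of the design, one injected fault each.
FAULTY_DESIGNS: Dict[str, Adder] = {
    "or_instead_of_add": lambda a, b: (a | b) & 0xF,
    "no_wraparound": lambda a, b: a + b,
    "stuck_bit0": lambda a, b: ((a + b) & 0xF) | 1,
}


def detected(design: Adder, suite: Suite) -> bool:
    """A fault is detected if any stimulus gives a result that differs from the golden model."""
    return any(design(a, b) != reference_adder(a, b) for a, b in suite)


def qualification_score(designs: Dict[str, Adder], suite: Suite) -> Tuple[float, List[str]]:
    """Return the fraction of injected faults the suite detects, plus their names."""
    caught = sorted(name for name, design in designs.items() if detected(design, suite))
    return len(caught) / len(designs), caught


# Two invented test suites: the second adds a carry case and an overflow case.
WEAK_SUITE: Suite = [(0, 1), (1, 2), (4, 8)]
STRONG_SUITE: Suite = WEAK_SUITE + [(3, 1), (15, 1)]

if __name__ == "__main__":
    for label, suite in (("weak", WEAK_SUITE), ("strong", STRONG_SUITE)):
        score, caught = qualification_score(FAULTY_DESIGNS, suite)
        print(f"{label}: {score:.0%} of faults detected {caught}")
```

Both suites pass against the unmodified adder, so a pass/fail regression report would not tell them apart. The weak suite catches only one of the three planted faults, which is the kind of gap such a measurement is meant to expose.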

A paper not so far discussed was given by the organisers of the conference: a Bristol, England based consultancy, Test and Verification Solutions (TVS). This was, as befits such a conference, at a higher level of abstraction, and they talked about benchmarking verification. The idea is to build, over multiple projects, an understanding of the strengths and weaknesses of the approach in use, and from that knowledge to establish a route forward for improvement. (TVS offers a methodology for doing this – in addition to their main work in carrying out verification.)

There is a lot of serious work going into verification, and much of it is worthy of admiration. And it is probably unfair to say this in reporting a conference specifically on verification, but it still seems that the concentration on bug-hunting late in the development cycle is not the best thing to do.

The majority of flaws found, according to the Wilson/Mentor survey, are logic or functional in nature. Isn’t it the responsibility of the development engineer to make sure that these are not in the code, perhaps by writing better code or perhaps by using linting tools early in the process? Or are we hoping that the use of system modelling and other ESL tools is going to lead to the acceptance of code generation, where the code should be “correct by construction”? The same approach of system modelling should also be important in reducing the second most frequent set of flaws, those associated with clocking.

But, having had my rant, it is still wonderful to see the sheer intellectual elegance of some of the approaches to solving the verification problems of the future.