
More Than A Zen Thing

A Look at Functional Qualification

If a bug exists in a design and nobody notices, is it still a bug?

This question is more than just a play on the more familiar sylvan conundrum. And its answer is actually more nuanced than you might think. It transcends what would appear to be a simplistic peaceful Zen interlude to an otherwise hectic design schedule. Its subtlety keys off of what is meant by the word “notices.”

There are two ways in which a bug could be noticed. The one that matters, the most important one, the one for which millions of dollars are spent in verification, is in using the system containing the design. If the system fails, then the bug got noticed.

The other way in which a bug can be noticed is during design verification. The whole purpose of verifying the design is to notice any bugs here so that they will never be noticed once the system is deployed. What if a bug isn’t noticed here? It means the verification scheme isn’t capable of detecting the bug. And that’s a problem.

You could argue, in theory, that if there is a bug, but it is completely backed up by redundant logic (on purpose or by happy coincidence of a poorly optimized design), and therefore the effects of the bug can never ever see the light of day, well, who cares? By definition, it can never cause a problem.

So let’s acknowledge that rather unlikely situation so that the nerd in the back of the classroom* who’s always looking for ways to stump the professor will be denied an opportunity to point this out. And we can move on to focus on the more salient scenario.

We’ve all, at one time or another, been faced with some manifestation of the question, who’s testing the tester? If a test fails, is it because the TUT (Thing Under Test) actually failed? Or is it because the tester failed? Based upon numerous formative years of brow- or knuckle-beating, we have been imprinted with the assumption that he or she who administers the test is right. Period. But we’ve probably also been privy to those quiet moments of humility where the tester has to adjust a grade due to an incorrect marking of a test. Yes, the tester is fallible.

And so it goes with verification. Those charged with poking holes in the design must construct an unassailable edifice of righteousness that can stand resolute against the outraged protestations of a designer whose crowning creation has been found wanting. The best scenario is that there is no problem with the design, and that we simply know that based on general omniscience. The next best is that the design’s correctness gets verified, and when the verifier says there’s a problem, it means there really is a problem. And, more importantly, if the verifier says there isn’t a problem, it means there truly isn’t one.

And therein lies the catch. The verifier is watching the designer. But who’s watching the verifier? How do we know for sure that, when the verifier says all is groovy, in fact that’s the case? There are tools for checking the “completeness” of a verification suite, but in the logic design world, those typically measure the number of “stuck-at” faults that can be caught. This is based on the real-world situation in which real wires on real silicon can be mis-processed such that they’re inadvertently tied to a permanent low or high value. Having tests that catch those faults is useful.

But real designs that haven’t yet been put on real silicon can’t have stuck-at problems. They either do what they’re intended to do, or they do something else. And they do whatever they do every time. The question is whether what they actually do matches what was desired. When testing the chip, the design is assumed to be correct and the silicon is suspect. During verification, it is the design itself that is suspect. So now we must ask the question, could the verification be suspect?

The software side of things

When software testing is evaluated for “coverage,” we typically refer to the amount of code that actually gets executed when the tests are applied. If some loop is never entered during a test, then there is no way to know whether the contents of the loop are correct. And therefore the test is incomplete. If all lines of code are executed, then coverage is thought to be complete.

But that assumes that anything wrong with any line of code will result in an incorrect output as long as the line is executed. And that may not be the case. An incorrect partial result might propagate forward in the calculation, but in the end, some other value might be used for the final result, discarding the incorrect value. So the bug didn’t get noticed. Even though the line of code got executed. If there is no other test that results in the bug’s effect creating an incorrect output, then, even though the line got covered, the testing is incomplete because the bug never got found.
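This “executed but never observed” situation is easy to sketch. Here is a minimal Python illustration (the function and the bug are invented for this article, not taken from any real design): the buggy line runs on every test, so line coverage reports it as covered, yet the weak test suite never selects the value it produces, so the bug never reaches an output.

```python
def discount_price(price, is_member):
    """Return the price a customer pays, with a member discount."""
    member_price = price * 0.8   # hypothetical BUG: the spec called for a 10% discount (price * 0.9)
    regular_price = price
    # The buggy line above is executed on every call (so line coverage
    # marks it covered), but its value is only *propagated* to the
    # output when is_member is True.
    return member_price if is_member else regular_price


# A weak test suite that only ever checks non-members:
assert discount_price(100, is_member=False) == 100   # passes; bug activated but not propagated
```

Running a coverage tool over that suite would report 100% line coverage, and yet the discount bug would survive untouched.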

And this gets to the heart of the completeness question for both silicon and software (which may be realized on silicon): bugs must not only be triggered (or “activated”), but their effects must also be able to make their way to an output somewhere (that is, they must be “propagated”), and then, once at an output, someone must notice that something’s wrong (they must be “detected”). Typical coverage metrics focus on the activation portion – if the code was executed, then the bug was activated. What’s not always measured is whether the bug was propagated and detected. If a bug can’t be propagated and detected, then the verification suite is incomplete.

Certess, a company that’s jumped into this space, refers to the process of validating the verification as functional qualification and has addressed it by picking up an old idea developed in the 70s. This is the concept of inserting “mutations” into a design to see if they “survive.” This grim post-nuclear metaphor is used to describe the process of making changes to a design to see if the verification suite finds a problem after the change is made. If the effects of the mutation are detected, then the mutation is “killed.” If not, then the mutation “lives.” And somewhere, somehow, right now, someone should be dreaming up a series of horror movies based on the concept of a design within which numerous mutations live, silently breeding and planning the day of their ultimate victory.

Let’s look at an example: the one Certess uses is a very simple one where an “a or b” statement in the original design is mutated to “a and b” and the design is then exercised by the test suite. If “or” causes correct behavior and the mutation causes incorrect behavior, then the “or” is thought to be correct, and this aspect of verification is thought to be complete.

However, if the design behaves correctly under the test suite with both “or” and “and” (don’t read this after too many beers), then an interesting question poses itself: which is correct? Everything seems to work regardless. So one of two conditions is true: either “or” is correct, and the test suite isn’t complete enough to confirm that fact, or “and” is correct, meaning that the “or” is actually a bug that isn’t being detected. Either case means that the verification is incomplete.
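The surviving-mutant case is worth making concrete. Below is a hedged Python sketch (the access-control function is invented; Certess operates on HDL, not Python): a test suite that only exercises the agreeing input combinations cannot tell “or” from “and,” so the mutant survives until a distinguishing test is added.

```python
def grant_access(is_admin, is_owner):
    # Original design intent: an admin OR an owner may proceed.
    return is_admin or is_owner


def grant_access_mutant(is_admin, is_owner):
    # Mutation: "or" replaced with "and".
    return is_admin and is_owner


# A weak suite that only tries inputs where "or" and "and" agree:
weak_suite = [((True, True), True), ((False, False), False)]
for args, expected in weak_suite:
    assert grant_access(*args) == expected          # original passes
    assert grant_access_mutant(*args) == expected   # mutant passes too: it survives

# A single mixed-input case kills the mutant:
assert grant_access(True, False) != grant_access_mutant(True, False)
```

The point of the exercise is exactly this: the surviving mutant tells you nothing about which version is “right,” only that your tests can’t tell them apart.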

This is generally more than just an academic exercise: typical metrics address whether bugs get activated, but they don’t measure how many go unpropagated and undetected, meaning that you really can’t say for sure how good the verification is. But this technology has languished in the academic realm for years because making full use of the concept means placing mutations all over the design (and not all at once, since some might mask others), and the generation of mutations plus the repeated testing takes a lot of computation. So it’s been an interesting but intractable idea.
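The mechanics of the loop itself are simple, which is part of what makes the cost so stark: one mutation at a time, rerun the whole suite, count survivors. Here is a toy Python sketch of that loop (purely illustrative; it bears no relation to Certess’s actual algorithms, and the “design” is just a single bitwise operator):

```python
import operator


def run_suite(design_fn, cases):
    """Return True if the design passes every (inputs, expected) case."""
    return all(design_fn(*args) == expected for args, expected in cases)


# The "design" under qualification: a single two-input OR.
original = operator.or_

# Hypothetical single-operator mutations of that design.
mutants = [operator.and_, operator.xor]

# A weak test suite that only exercises agreeing inputs.
cases = [((1, 1), 1), ((0, 0), 0)]

assert run_suite(original, cases)                       # the design itself passes
survivors = [m for m in mutants if run_suite(m, cases)] # mutants the suite cannot kill
# "and" survives (agrees with "or" on these inputs); "xor" is killed by (1, 1).
```

Scale that loop up to thousands of mutation sites, each requiring a full rerun of a long-running verification suite, and the historical intractability becomes obvious.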

Certess claims to have made substantial breakthroughs in the efficiency of the algorithms so that it is now practical to apply mutations to real designs to measure the quality of the verification suite. This can be useful for improving the chance that first silicon works; it can be even more important for IP vendors trying to convince their customers that their IP is rock-solid.

Their first product was aimed at HDL implementations of designs; they recently announced a C version, the theory being that catching design faults at a higher level of abstraction, prior to creation of RTL, will streamline the downstream implementation. (And, just to ensure that life hasn’t become too dull for these folks, they’ve just been bought by Springsoft.)

It is therefore now possible to answer the question: If a bug exists in a design and nobody notices, is it still a bug? And the answer is, Yes. The job now is to make sure someone notices.

Link: Certess’s functional qualification

* Ahem… um… yeah… um… I was… kinda… that nerd…

