What we have here is a failure to communicate. – Cool Hand Luke
The farmer and the cowman should be friends. – Oklahoma
There are apparently a couple of silos in the EDA world that could use some breaking down.
- On one side, we have verification. This is a well-established discipline involving numerous EDA tools and a brief that compels verification engineers to make sure that a design does what it is intended to do.
- On the other side, we have safety engineering. This is a newer discipline to EDA, charged with making sure that a design won’t put someone or something in danger if things go awry.
Historically, safety has been limited to the rather rarefied realms of aviation and the military. Folks operating in those markets have been a different breed, sacrificing flexibility and agility for what many might see as a cumbersome, inefficient process of checks and cross-checks and adherence to what can be mind-numbing regulations, all designed to keep soldiers and aircraft passengers and, frankly, innocent bystanders, safe.
Of course, today, safety is all the rage, what with everyone and their cousins storming the automotive world in order to get a piece of the high-volume action there that they would never see in the mil/aero space. So new tools are emerging, and old tools are being adapted to the needs of safety.
“Verification” and “safety” aren’t usually seen as related topics. Yes, you might say that “safety” is all about verifying that a design is safe, so to understand the split, we need to realize that “verification” really means “functional verification.” Then again, “functional” sometimes means logic as opposed to timing, so perhaps we could call it “operational verification” if that gets us out of a semantic bind.
The bottom line is that “verification” has never traditionally included verification of safety, which is why we now have two silos.
Having two silos is OK if they never need to interact. And you might think that to be the case. There are verification tools and there are safety tools. Yeah, they may share some common roots, but, in everyday usage, those tools are used by very different people for very different things.
Now… the farmer/cowman analogy above may be stretching things too far; after all, ranchers and farmers were at each other’s throats, as farmers needed fences and ranchers wanted none. There’s none of that kind of beef in the verification and safety worlds (that I’m aware of). So it’s not so much that verification and safety engineers are mutual enemies now; it’s just that they could be friends instead of strangers.
And, as OneSpin told it during DAC, smooth interaction between the groups is often missing, despite areas of overlap – and despite the fact that safety engineers need data from verification engineers in order to complete their safety analyses.
We’ve seen before that there are two kinds of possible failures: systematic and random. Systematic failures are a result of a problem with the design itself. Whenever the conditions needed to trigger a fault are met, then the failure occurs. Random failures, on the other hand, have unpredictable causes and timing – like alpha particles mashing the memory.
We’ve devoted more of our attention to random faults because that’s where new tools are needed. But, to be clear, systematic failures are specifically the area where verification and safety have the most overlap. While verification engineers are making sure that everything works, safety engineers want to make sure that no element of the design will lead to a failure that could compromise safety.
In fact, in a way, the safety side of this is more stringent than the verification side. You may be familiar with the notion of “necessary and sufficient.” When proving, for instance, that a particular animal is a dog, it is necessary to determine that its genetics (chromosomes, genes, etc.) are that of a canine. That’s also sufficient. That the animal has fur (ignoring dogs that have hair) or that it barks (ignoring dogs that bay) may be true, descriptive, and useful, but those facts aren’t necessary to establish that the animal is a dog. The genetics are both necessary and sufficient.
Verification has traditionally been very much a “necessary”-focused discipline. All the product requirements are necessary, and so each of the circuits that implements a requirement is therefore necessary. Verification helps to assure that all the necessary circuits are in place.
But safety adds the “sufficient” piece to the verification. Is there a block of logic that’s not doing anything? Are there some clever “easter eggs” in the design? Those have both often been ignored in the interest of getting to market quickly. If the extra circuit doesn’t do anything (at least as far as one can tell), then who cares? We can take it out in a future cost reduction to gain the area back. And if there are easter eggs, well, we just give a little “boys will be boys” chuckle.
But that won’t pass muster in a safety-oriented design. If it doesn’t do anything, get rid of it. (That goes for security as well.) Or, if it does something that’s unrelated to any product requirement, then it goes.
In this way, verification and safety should very much be aligned in what they do when it comes to systematic failures.
For random errors, however, the relationship between verification and safety isn’t so much one of overlap; it’s more one of dependency. That is, the safety guys need some information from the verification folks in order to complete their determination as to whether the design meets safety criteria.
And this is where we’re seeing some issues, according to OneSpin. The communication isn’t happening as often and as openly as it could – or should.
Why would the safety folks need data from the verification folks? Well, let’s take one example. Since we’re talking about random failures here, we’re talking probabilities. And each transistor on a given process has some probability of unanticipated failure. A small probability perhaps, but, add up all of the transistors in a circuit, and it may no longer be negligible.
And that takes us specifically to the issue: in order for the safety guys to calculate the overall probability of failure for a block or circuit, they have to know how many transistors it contains. And that’s an example of data that they may have a hard time getting. Again, there’s nothing nefarious or contentious going on; it’s just that the verification guys are busy doing what they’ve always done, and that process hasn’t included good communications with safety folks.
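The arithmetic here is simple, but it shows why the transistor count is indispensable data. Here’s a minimal sketch of the kind of calculation involved, assuming independent failures with a constant per-transistor rate (the function name and all numbers are hypothetical illustrations, not figures from any standard or from OneSpin):

```python
import math

def block_failure_probability(num_transistors: int,
                              fit_per_transistor: float,
                              mission_hours: float) -> float:
    """Probability of at least one random failure in a block.

    FIT = failures per 10^9 device-hours. Assuming independent,
    exponentially distributed failures, the block's aggregate rate
    is the per-transistor rate times the transistor count.
    """
    block_fit = num_transistors * fit_per_transistor
    failures_per_hour = block_fit / 1e9
    return 1.0 - math.exp(-failures_per_hour * mission_hours)

# Hypothetical numbers: a 50-million-transistor block, a made-up
# per-transistor rate of 1e-6 FIT, over a 15-year mission.
p = block_failure_probability(50_000_000, 1e-6, 15 * 365 * 24)
```

Note how a per-transistor rate that looks utterly negligible still produces a block-level probability worth worrying about once tens of millions of transistors are summed – and the safety engineer can’t get that sum without the transistor count.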
Yes, technically, given access and permissions, safety engineers could use the same EDA tools that the verification guys use – except that safety engineers often aren’t familiar with those tools at a detailed level. So they rely on regular tool users (i.e., verification engineers) to extract the bits that they need.
Better tools can help here. For instance, OneSpin says that the safety-related tools built on their formal technology can work with other tools or design databases to get some of this information (like the number of transistors), meaning that you don’t need to wait for individuals to deliver the data. And, in fact, they say that usage of their safety tool is roughly split 50/50 between verification and safety engineers.
So that can help. But it’s evidently not enough; communication is still needed. So they’re sending a message to the community: “Let’s talk.”
Jörg Grosse, Product Manager for Functional Safety, OneSpin
How well do you see communication between verification and safety folks working at your company?