Certifying the Certifier

OK, people: it’s time to talk again about how not to hurt or kill people (or other living things) with electronic gadgetry. Or more-than-gadgetry, like cars that have the temerity to drive themselves. There are so many angles from which to approach how all those functions in such machines can be made safe; we take on yet another one today.

This discussion stems from a conversation with OneSpin at this summer’s DAC. Seems like it was just about this time last year that we talked about how EDA and functional safety work together, but, based on some recent certification announcements, this year we have a view from a different stance.

Can I Prove that my System is Safe?

With specs like ISO 26262 and DO-254, a big burden goes onto you developers to show that what you’ve created is safe. And that includes proving that the tools you relied on are also deemed safe. So, for EDA companies, there are two important aspects of certification: helping their customers to get certified and getting their own tools certified. The latter, of course, helps with the former.

When we talk about helping you to get certified, that tools-certification thing is only part of the answer. There are elements that you need to prove yourself, and tools can assist with that. Information can also help, like safety kits and safety manuals. A safety manual specifies the safe way to use whatever it covers, such as a tool. Metrics that prove safety apply only if the safety manual is followed.

There are three metrics that matter, but there’s a notion that we need to tease out before we look at them. That notion gets to the difference between failures and latent failures. Note that these metrics assume a single failure at a time.

A basic failure is pretty much what you think: something in the functional behavior of a device that doesn’t work as expected. We put in detection circuits to find such failures so that, if the failure occurs, we can move the system into a protected, safe state.

But what if the detection circuit fails? That’s considered a latent fault. It’s not going to be evident until a basic failure happens and isn’t detected. With that, then, the three metrics OneSpin described are:

Single-point faults
Latent faults
Probabilistic Metric for (random) Hardware Failures (PMHF); this is essentially the failures-in-time (FIT) rate.
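
To make the first two metrics a little more concrete, here is a minimal Python sketch. It follows the general ISO 26262-5 definitions of the single-point fault metric and the latent fault metric as commonly stated, not anything OneSpin described. The block names, the failure rates, and the helper names are all hypothetical, chosen only for illustration. Rates are in FIT (failures per billion device-hours).

```python
from dataclasses import dataclass


@dataclass
class BlockRates:
    """Hypothetical failure rates for one safety-related block, in FIT."""
    name: str
    total: float         # all safety-related faults in the block
    single_point: float  # single-point plus residual faults (not caught by any detection circuit)
    latent: float        # latent multiple-point faults (e.g., a broken detection circuit)


def single_point_fault_metric(blocks):
    """Fraction of the total failure rate that is NOT single-point or residual."""
    total = sum(b.total for b in blocks)
    uncovered = sum(b.single_point for b in blocks)
    return 1.0 - uncovered / total


def latent_fault_metric(blocks):
    """Fraction of the remaining failure rate that is NOT latent."""
    remaining = sum(b.total - b.single_point for b in blocks)
    latent = sum(b.latent for b in blocks)
    return 1.0 - latent / remaining


# Made-up numbers for two blocks; a real analysis would come from fault classification.
blocks = [
    BlockRates("cpu_core", total=100.0, single_point=1.0, latent=5.0),
    BlockRates("ecc_checker", total=20.0, single_point=0.2, latent=4.0),
]

spfm = single_point_fault_metric(blocks)
lfm = latent_fault_metric(blocks)
print(f"single-point fault metric = {spfm:.1%}, latent fault metric = {lfm:.1%}")
```

The sums over blocks are the simplest version of combining per-block numbers into a figure for the whole design; the hard part in practice is classifying the faults, not doing the arithmetic.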

There are tools that help with this, but, according to OneSpin, none works at the chip level, or even close to it. That means trying to get the numbers for the different pieces of the design and then somehow cobbling them together as a representation of the whole chip. OneSpin has now taken this on, however, and some of its tools can work at the chip level, even on the gate-level design.

But What About the Tools?

That’s all great, but if you’re relying on such tools to show that you’re safe, then you need to know for sure that the tools proving your safety are correct and, by extension, safe. Doing so requires answering a couple of questions:

What is the tool impact? Can the tool inject faults? Can it fail to detect faults?
Based on the first question, what is the probability of detecting an error in the tool?

The answers to those questions place a tool at one of three tool confidence levels (TCLs): TCL1, TCL2, or TCL3. TCL1 is for those lucky tools that have backups. For instance, if synthesis makes a mistake, there is equivalency checking that can find the failure, so, as long as you run the equivalency check, you can work with a less-than-perfect synthesis. And so you don’t have to do a tool qualification on the synthesis tool.

But what about that equivalency checker? OK, maybe you have a tool that tests it. But, if you do, then how do you prove that that tool works? There’s always some last tool in the proof chain, and those last tools are what every other certification relies on. And OneSpin tends to find itself at the end of the chain, meaning TCL2 or TCL3.

What’s the difference between them? Well, that depends on the confidence that an error in the tool will be detected, a level called TD (tool error detection). TD1 means high confidence, TD2 is medium confidence, and TD3 is low confidence. So, for a tool that can affect the safety of the design, you need TCL2 for a TD2 tool and TCL3 for a TD3 tool.
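
The mapping from the two questions to a confidence level can be sketched as a small lookup. This is an illustrative Python sketch based on the classification scheme in ISO 26262-8 as it is commonly summarized. The TI1/TI2 labels for tool impact do not appear in the discussion above, and the function name is hypothetical.

```python
def tool_confidence_level(ti: int, td: int) -> str:
    """Return 'TCL1', 'TCL2', or 'TCL3' from tool impact (TI) and tool error detection (TD).

    Assumed encoding (not from the article):
      ti = 1: the tool cannot introduce or fail to detect an error that matters
      ti = 2: it can
      td = 1, 2, 3: high, medium, low confidence that a tool error gets caught
    """
    if ti not in (1, 2):
        raise ValueError("TI must be 1 or 2")
    if td not in (1, 2, 3):
        raise ValueError("TD must be 1, 2, or 3")
    if ti == 1:
        return "TCL1"      # no impact, so no qualification needed
    return f"TCL{td}"      # with impact, the TCL number tracks the TD number


# Synthesis backed by an equivalency check: it has impact, but errors get caught.
print(tool_confidence_level(ti=2, td=1))
# A checker at the end of the chain, with nothing behind it to catch its errors.
print(tool_confidence_level(ti=2, td=3))
```

Only the TCL2 and TCL3 results call for a tool qualification, which is why a tool with a reliable backup gets off lightly.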

There are four possible ways to attain TCL2 or TCL3 certification. The ways are the same for the two confidence levels; the difference lies in which ones are recommended vs. highly recommended for the ASIL you might be going for.

The four possibilities are:

1a: Get confidence from use. For EDA tools, it’s hard to get this info from customers – and you have to completely redo it with each new release.
1b: Qualify the tool development flow.
1c: Test the tools rigorously.
1d: Develop the tool per the safety standard – which, OneSpin says, is a huge burden.

OneSpin did both 1b and 1c, announcing their certification by the European TÜV SÜD organization. That carries them until their next revision. Yes, we did say that, with 1a, each new revision requires a do-over. But that’s apparently not the case for 1b and 1c; incremental changes to the tool are easier to manage than the initial qual. So they will be updating the qual, but it won’t be quite so burdensome as what they’ve done so far.

The idea, then, is that they also have a safety kit – which includes the safety manual – so that customers can cite the kit (along with proving that they’ve followed the safety manual) to check off the tool certification requirement. They’ve covered both TCL2 and TCL3 – TCL2 on all tools and TCL3 on a customer-request basis.

Doing these certifications for ISO 26262 also helps them with other standards. There may be subtle differences, but much of the work can satisfy the needs of more than just one standard. For instance, their TÜV certification and resulting FPGA qualification kit cover ISO 26262, IEC 61508, and EN 50128. They also announced certification for DO-254.

While all EDA companies have this challenge ahead of (or behind) them, companies like OneSpin, which have tools at the end of the certification chain, bear the greatest burden of proof, since they’re checking everyone else’s work. So you’ll probably see announcements of a similar nature from all of them.

More info:

OneSpin