A Maze of Twisty Little Passages

We recently invoked the fear of slipshod software programming as we attempted to slog through the maze of safety-critical standards facing software engineers.

But guess what: programmers aren’t the only ones capable of turning out shoddy goods. Hardware engineers can, too. But, unlike the software world, the focus in the hardware world seems to be more squarely on one standard: DO-254.

DO-254 appears to have much in common (other than origin) with DO-178. So much so, in fact, that I found a DO-254 blog site with FAQs that appeared to be copied verbatim from a set of DO-178 FAQs, with a sloppy job of search-and-replace that left such odd statements as “avionics systems are comprised of both hardware and hardware.”

There’s actually quite a bit of material about DO-254 out there, so you would think it would be well understood. And I’m sure that, in many specialized quarters, it is. But as I looked through the various whitepapers and such, I was struck by the opacity and vagueness of the language. Rather than explaining what’s required by the standard, the writers often excerpt directly from the standard, further forwarding its inscrutable prose.

And this typically comes with citations of innumerable accompanying letters and documents and CFRs and Orders and ACs and… well, if you’re anything like me, you can read it five times and still not be sure of what you just read.

So I thought, let’s attempt a plain-English walk through DO-254. Fundamentally, it’s not rocket science. OK, it might apply to rocket science, but doesn’t constitute it itself. Just to be clear.

As with the software standards, the intent is that things like airplanes falling from the sky don’t happen. And it is an aero-centric standard, being mandated by the FAA in the US, referring explicitly to aircraft. But the concepts could apply to anything that could hurt or kill someone.

The bottom-line mandate is: don’t eff up. In other words, the equipment should work as advertised. Easy to say, harder to prove. Which is where the action is. The standard doesn’t tell you what to do, but it does tell you characteristics of how you must do things. It’s then up to you to prove to a certifying entity that you did everything right.

And doing everything right has two components. There’s the process you use and the tools you use to accomplish the process. If you’re hammering a nail, there’s the hammer and the way you swing your hand. Both of those have to be right to ensure that you’ve been successful fastening whatever it is you’re fastening. Just because the nail looks to be fully hammered in might not be enough: an incorrect or faulty nail might pull out.

And, obviously, what we’re talking about here is much more complex than a dumb ol’ nail. And, importantly, it’s much harder to prove that things are or aren’t right. So, first of all, you need to use qualified tools. If the tools you use aren’t qualified yet, then you may need to qualify them yourself.

There are three ways to do this:

One is to use some other independent tool or entity to prove that the output of your tool is correct.
The second is through a history that proves correct results (this sounds pretty slippery to me… most Europeans don’t even credit America with having a history yet, so imagine how hard it would be to establish the history of a place-and-route tool…)
The third is to go through a qualification process where you somehow prove that the tools will work for your project.

With design automation, one of the key characteristics is repeatability. If your place-and-route or other algorithm uses a random seed such that you may get a different correct and functional result each time you run the tools, you’re going to have a problem, even though each of the various results is correct and functional. Or, at least, all the ones you’ve checked out are. And that’s part of the problem: is there a corner case that might be a problem?

If, on the other hand, there’s one and only one right answer that the tool provides each time, then you can prove the correctness of your design and move forward without worrying that something might change due to some non-deterministic tool feature. With FPGAs, for example, the bitstream has to be the same each time the design is run in order for the tools to pass muster.
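To make the repeatability idea concrete, here is a minimal Python sketch of one way you might check it: hash the output of several runs of the same design and see whether they all match. The function names and the byte strings standing in for bitstreams are hypothetical; a real flow would read the files your tools actually write, and passing this check is not, by itself, what qualifies a tool.

```python
import hashlib


def digest(bitstream: bytes) -> str:
    """Return the SHA-256 hex digest of one tool output."""
    return hashlib.sha256(bitstream).hexdigest()


def is_repeatable(outputs: list[bytes]) -> bool:
    """True only when every run produced byte-identical output.

    An empty list yields False: with no runs there is nothing to vouch for.
    """
    return len({digest(o) for o in outputs}) == 1


# Hypothetical stand-ins for the bitstreams from three runs of one design.
deterministic_runs = [b"\x01\x02\x03"] * 3
seeded_runs = [b"\x01\x02\x03", b"\x01\x02\x03", b"\x03\x02\x01"]

print(is_repeatable(deterministic_runs))  # True
print(is_repeatable(seeded_runs))         # False
```

Comparing digests rather than raw files simply keeps the records short; the point is that any single differing byte counts as a different result.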

The process part of DO-254 is fundamentally straightforward: say what you’re going to do and do what you said and only what you said. No more and no less. Just like with software, this means establishing requirements, implementing them, and then tracing those requirements to ensure that there’s no requirement unimplemented and that there’s no hardware that doesn’t address a requirement. In other words, no hardware equivalent of a flight simulator hidden in Excel.
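The two-way trace described here can be sketched in a few lines of Python. This is only an illustration under assumed names: the requirement IDs, the design element names, and the `trace_gaps` helper are all made up, and DO-254 doesn’t prescribe any particular data format for traceability.

```python
def trace_gaps(requirements, design_elements):
    """Find both kinds of traceability gap.

    requirements:    set of requirement IDs.
    design_elements: dict mapping each design element to the set of
                     requirement IDs it claims to implement.
    Returns (unimplemented requirements, elements tracing to no known requirement).
    """
    covered = set().union(*design_elements.values())
    unimplemented = sorted(requirements - covered)
    untraced = sorted(
        name for name, reqs in design_elements.items()
        if not (reqs & requirements)
    )
    return unimplemented, untraced


# Hypothetical project data, for illustration only.
requirements = {"REQ-1", "REQ-2", "REQ-3"}
design_elements = {
    "uart_rx": {"REQ-1"},
    "watchdog": {"REQ-2"},
    "easter_egg": set(),  # hardware that addresses no requirement
}

unimplemented, untraced = trace_gaps(requirements, design_elements)
print(unimplemented)  # ['REQ-3']
print(untraced)       # ['easter_egg']
```

Both lists need to come back empty: the first catches what you said but didn’t do, the second catches what you did but never said.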

A design consists of several layers of abstraction and refinement. You start first with requirements. Then you do a “conceptual design,” which would be, for example, a high-level architecture. Then you move to “detailed design,” where you, for example, would write lower-level RTL to build something that’s modeled only at the conceptual level. Finally, there’s “implementation,” which could mean synthesis down to an FPGA bitstream or down to a mask set for an IC.

At each of these stages of refinement, you have the establishment of requirements (and some new requirements will crop up as you get into detail – these are referred to as “derived requirements”), doing the design work, doing verification and validation, and tracing the requirements.

The concepts of “verification” and “validation” have always been problematic for me. They have been arbitrarily given specific meanings, and that can make sense, but you could easily make an argument for reversing the definitions. In other words, there’s nothing in the words themselves that makes it clear what they mean.

Officially, “Validation” refers to making sure that your requirements are correct and that they meet user needs or regulations or any other thing that has to be part of the definition of the thing you’re building. “Verification” means making sure that what you actually built meets those requirements. (I could easily confuse things here by saying that “Validation” is the act of verifying that the requirements are correct and that “Verification” means ensuring that your implementation is valid, but that would just be mean.)

Bottom line: you have to do both (whatever you call them).

Verification can, in practice, involve a variety of deterministic (non-random) methods, but there’s a bottom-up approach inherent in the concept of “elemental analysis.” This means that all elements of the design need to be correct, down to the HDL level (and it does appear that DO-254 has a particular focus on designs done using HDL). Each statement in your code is an element, so code coverage is an important first-level measure.
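As a rough illustration of statement coverage as a first-level measure, the Python sketch below compares the statements in a design against the ones a simulation actually exercised. The line numbers and the `statement_coverage` helper are hypothetical; in practice these numbers come out of your simulator’s coverage report, and elemental analysis involves more than this one metric.

```python
def statement_coverage(all_statements, executed):
    """Return (fraction of statements executed, sorted list of missed statements)."""
    statements = set(all_statements)
    missed = sorted(statements - set(executed))
    if not statements:
        return 1.0, missed  # nothing to cover
    return (len(statements) - len(missed)) / len(statements), missed


# Hypothetical: eight HDL statements, identified here by line number.
ratio, missed = statement_coverage(range(1, 9), {1, 2, 3, 4, 6, 8})
print(ratio)   # 0.75
print(missed)  # [5, 7]
```

The list of missed statements is the useful part: each entry is an element that no test has yet shown to be correct.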

After you’ve finished your design and verification and built a unit, you can then test your widget – FPGA, SoC, whatever – to prove that it works, but you’re not done until you’ve actually plugged it into the sub-system and proven that it works in real life. And that sub-system won’t be ready to go until it’s plugged into the system to prove that it works in real life.

Your design life is also affected by planning: you need to include in your development plan a plan for all the certification and audits and such that you’ll need along the way. You’re not likely to be a happy camper if you wait until the last minute to figure out how you’re going to verify to that dude in the suit and dark glasses who never smiles that you “really, sir, did a really, really good job on this design; best ever, I promise.”

Now, lest you go away with the impression that there’s simply a set of things to be done for critical designs, think again. There are five design assurance levels (DALs) that are shared with DO-178:

A: Failure will cause or contribute to a catastrophic failure of the aircraft.
B: Failure will cause or contribute to a hazardous/severe failure condition.
C: Failure will cause or contribute to a major failure condition.
D: Failure will cause or contribute to a minor failure condition.
E: Failure will have no effect on the aircraft or on pilot workload.

So now you would probably expect that DO-254 would lay out explicit requirements for each of these levels. Why else would you define such levels? Wrong again. In fact, I had a hard time finding this out. Michelle Lange, Mentor’s DO-254 Program Manager, helped me out; she had been stymied at one point by the same question. To give you a flavor, I’ll quote verbatim her analysis of the document – and then work with her more real-world conclusions.

“The main DO-254 document generally speaks to DAL C requirements (although in some cases you’ll find wording like “appropriate to the design assurance level or hardware level” here and there, to allow flexibility I suppose).
Appendix B adds design assurance considerations for levels A/B.
Appendix A provides “modulation” of required data dependent on the DAL. So Appendix A is really a “magic decoder ring” of sorts that specifies the data that needs to be created/reviewed/submitted across the project lifecycle for each DAL. The data obviously comes from various processes, so in essence it’s guiding the processes that must be done at each DAL.
Now, to complicate things further, with the scoping of DO-254 to apply to “custom micro-coded components” with AC20-152, this basically said that if you’re doing DAL D/E devices, you could choose to follow DO-254 and no data would be reviewed. So in other words, this sounds like you don’t need to follow the DO-254 process at all.
Apparently that is not what the FAA really meant, so they later tried to clarify what they really wanted in Order 8110-105, stating that if the user chooses to use DO-254 for DAL D development, the data doesn’t need to be reviewed, but if they use another process, the data WILL be reviewed. (I still find this quite confusing). DAL E appears to be exempt from DO-254, as it should be since a failure doesn’t result in a safety issue.”

So… I don’t feel so bad now.

Here’s the upshot, based on Michelle’s inputs. DALs A and B are treated the same way; all of the processes we’ve discussed here, along with the audits and such, have to be done. Why they make a distinction between the two DALs if they’re treated the same is something of a mystery to me. For DAL C, you don’t have to qualify tools, trace requirements (although she says the Designated Engineering Representative, or DER, may ask for this anyway…), or have independent verification. Oddly, the requirements for DAL D are unclear and vary by auditor; she sees groups treating DAL D as if it were DAL C. Good thing there’s a distinction…

I think we can safely say that DAL E doesn’t require any rigmarole.
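For what it’s worth, the upshot above can be written down as a small lookup table. This Python sketch encodes only this article’s reading of the three activities it names, not the text of DO-254 itself; the `EXPECTED` table and `is_expected` helper are made up for illustration, and `None` marks the spot where practice reportedly varies by auditor.

```python
from enum import Enum


class DAL(Enum):
    A = "catastrophic"
    B = "hazardous/severe"
    C = "major"
    D = "minor"
    E = "no effect"


# The three activities the summary singles out (this article's reading only).
ACTIVITIES = {"tool qualification", "requirements tracing", "independent verification"}

EXPECTED = {
    DAL.A: ACTIVITIES,
    DAL.B: ACTIVITIES,  # treated the same as A
    DAL.C: set(),       # none of the three, though a DER may still ask for tracing
    DAL.D: None,        # unclear; varies by auditor
    DAL.E: set(),       # no rigmarole
}


def is_expected(dal, activity):
    """True or False where the summary is clear; None where it is not."""
    expected = EXPECTED[dal]
    return None if expected is None else activity in expected


print(is_expected(DAL.B, "tool qualification"))    # True
print(is_expected(DAL.C, "requirements tracing"))  # False
print(is_expected(DAL.D, "tool qualification"))    # None
```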

So there you have it. An attempt at a summary of DO-254 that’s deceptively simple enough to lure in the most stubborn hold-out. Only to find out once committed that it is, in fact, as confusing as you were originally afraid it would be. Of course, confusion creates business for the bevy of consultants that promise to help with the mysterious bits. But they must add value: they’re still in business…

More info:

DO-254 (must be purchased)