Measuring Safety, an FPGA at a Time

One of the problems of being an engineer is the compulsion to classify, count and measure things. Not just for professional purposes, but the sort of person who becomes an engineer seems also to be the sort of person who automatically counts, classifies and lists the things they encounter in everyday life. But some things are difficult to measure. Take love – Shakespeare’s Mark Anthony proclaims that, “There’s beggary in love that can be reckoned.” (Although, perhaps, Elizabeth Barrett Browning displays an engineering streak when she asks, “How do I love thee? Let me count the ways.”)

Safety is another quality that does not lend itself to easy metrics, but it is something that has to be calibrated. We want our cars, our aeroplanes, our homes to be somehow “safe”, and the outcry after we feel we have been let down, as we can see today with the Toyota affair, is ample evidence that this is a general feeling. But the engineers who have to achieve that safety are faced with a vast range of conflicting pressures. If you were building a brand new transportation system, starting from a totally clean sheet of paper, is it socially acceptable to recognise that there is no such thing as absolute safety? At what point do you say, “It will cost $X million to make this safer and it will save one life a year”? This requires, in effect, placing a value on human lives, something transportation engineers have been grappling with for years. A particularly straightforward example is where a road and a railway cross. If you put one under the other it is significantly safer than having a level crossing (grade crossing), but also significantly more expensive. Decisions are made during construction, and then, years later, the traffic changes. Instead of a horse and cart crossing once a day or so, you have a stream of hundreds of cars an hour. Who pays for the construction of a safer bridge? The railway company or the owner of the highway? And what happens when the railway company decides to increase the speed and frequency of the trains? Does this change the argument over who should pay for a safer crossing?

These are relatively simple scenarios, but the real-world constraints of politics and budgets mean that frequently the issues are very difficult to resolve.

When you get to systems that have multiple elements — some mechanical, some electrical and probably with electronics controls, which can injure or kill people when they do not operate as intended, then you get to areas where engineering skills should be predominant.

For many years now the basis for developing such safety-critical systems has been IEC 61508 – Functional safety of electrical/electronic/programmable electronic safety-related systems. The IEC – the International Electrotechnical Commission – is, to quote their website:

the leading global organization that prepares and publishes international standards for all electrical, electronic and related technologies. These serve as a basis for national standardization and as references when drafting international tenders and contracts.

Without going into excruciating detail (but if you can’t cope with excruciating detail, you should not be getting anywhere near safety-critical things), the standard defines Safety as …freedom from unacceptable risk of physical injury or of damage to the health of people, either directly, or indirectly as a result of damage to property or to the environment. It then defines Functional Safety as … part of the overall safety that depends on a system or equipment operating correctly in response to its inputs.

A 61508 development flow starts with a safety requirements analysis, with the requirement defined from the overall scope of the product and from hazard and risk analysis, through product development to testing and verification, usually thought of in the shape of a V model.

Annex F of 61508 covers the design flow for ASICs, which also applies to FPGAs, as a more specific V model, developing from specification, through architecture; behavioural modelling; module design; synthesis, place and route; to code. Then back up the V through post layout simulation; module test; module integration test; system test and finally validation test – to ensure that the chip is an accurate implementation of the design that was originally specified.

Inherent in the 61508 process is the concept of SILs (Safety Integrity Levels). Elements within the system, and the system as a whole, are measured at a level of failure probability (probability of a failure that introduces “danger” occurring per hour of operations) for a design, categorized as:

SIL 4: > 10⁹up to <10⁸ (1 failure in a minimum of 110,000 years)

SIL 3: > 10⁸up to <10⁷ (1 failure in a minimum of 11,000 years)

SIL 2: > 10⁷up to <10⁶ (1 failure in a minimum of 1,100 years)

SIL 1: > 10⁶up to <10⁵ (1 failure in a minimum of 110 years)

As well as 61508 there are a wide range of other standards, many derived from 61508 and tailored to meet the needs of specific industries. ISO 26262, a new standard for electrical and electronics systems in small road vehicles including cars, for example, is due to be published soon.

You should be aware that in the safety-critical design community there is a huge debate running continuously on what all this means. What can be certified? Will simply following the flow produce safe systems? There is a strong view that safety has to be designed in, not tested in, and that safe systems will come only from organisations that think through a safety perspective rather than those that just tick the boxes. There is also an argument that there is a need for formal methods to be applied to evaluate safety-critical systems, but there is not any agreement on whether the formal methods tools are usable by anyone who does not have a doctorate in mathematics.

Despite the philosophical discussions, IEC 61508 is a reality, and it is now the basis for the European Machinery Directive (2006/42/EG), which was due to cover all machines shipping into manufacturing plants in Europe after December 2009. Although the date has slipped to 2011, in part because of the long lead time in certifying the equipment, no machinery builder hoping to serve Europe can ignore it. And Japan and the US are working on similar regulation.

For a machine to enter use it has to be certified as complying with the requirements of the directive, which requires validating the components, the software, and the development tools used. Since a machine is normally itself a subsystem within a manufacturing system, and also made up of sub-systems, this process can be complex and time consuming: TÜV Rheinland, one of the leading German organisations in this field, estimates that validation can add anything up to two years, or even more, to the development cycle.

FPGAs, which are increasingly finding uses in the industrial field, are themselves quite difficult to certify, since the silicon, the development flow and any IP used by the FPGA all need to be certified. Altera have teamed up with TÜV Rheinland to make life easier.

Altera are positioning the Cyclone family as a way of reducing BoM costs, by replacing DSPs, microprocessors and ASSPs with a single device. While this, on its own, would simplify certification, Altera has worked with TÜV Rheinland to gain certification for the FPGAs and the Quartus standard development tool flow for both VHDL and C. These are packaged with safety manuals and with a family of safety IP cores. The aim is that the developer can, by using the Altera package, create an FPGA-based control and/or communication application according to the Annex E methodology. The tools, libraries and IP are all certified by TÜV Rheinland as approved for use in systems that are rated as up to SIL 3. This, you will remember from earlier, is equivalent to one failure in a minimum of 11,000 years.

I don’t think that Altera or TÜV Rheinland expect that an experienced FPGA engineer will, just by using the safety pack, automatically be competent to create safety-critical applications. And, as with all such claims, two years being lopped off the development and certification cycle is going to be hedged around with “your mileage may vary” caveats. But this announcement, unlike the usual round of, “My process is smaller than yours”, “My device has more gates, more IO, more magic formulae than yours”, or “Yah nah nah nana nah” announcements, looks like a real attempt to invest in creating tools and methodologies that will help the user get products developed faster in a complex marketplace.