Most news groups seem to have the same cast of characters: there is the long-winded one (or more than one), the aggressive one, the flippant one, several intense people, the obsessive one, the know-it-all, and so on. The York University Safety Critical Mailing List (http://www.cs.york.ac.uk/hise/sc_list.php) has its share of these types, of course, but it also has a good number of thoughtful people who take a lot of time to try to come to a reasonable consensus on the topics under discussion.
What is rather scary to the eavesdropping outsider is that they cannot reach consensus on some pretty fundamental things. And they are talking about the safety of systems such as aircraft, trains, and nuclear power stations: systems that are part of the everyday life of many people, and where a safety failure will kill people.
To the newcomer to the field, it appears as though there are standards that aim to define safety, such as IEC 61508. These standards define processes and concepts, such as safety integrity levels (SILs). Decide how safe you want to be and follow the processes, and you will be able to say to a customer, “This system is rated as SIL4, so you can use it in the confidence that it is very safe.”
But it isn’t that simple. In fact, there are times when the only agreement in a thread is that safety isn’t that simple. Consider just one recent thread, which started with the question, “Does anyone know any real-life examples of … systems with a safety function that is claimed to be SIL4 in terms of IEC 61508?” Now, parts of 61508 have been around for about ten years, so you would think that there would be some simple replies. But no. Systems were suggested whose safety function was claimed as SIL4, but it was later argued that these were SIL4 under the CENELEC standards (EN 50126, 50128, 50129) rather than under 61508, and that CENELEC SIL4 is only equivalent to 61508 SIL3. And does that make you feel any safer?
One poster pointed out (I am naming no names) that a system claiming to be 61508 SIL4 will fail dangerously no more than once in a hundred million operational hours, or approximately ten thousand years. The poster went on to say,
“I can see that it may be possible to claim that a system has been developed to the requirements laid down, by a process-based standard, for SIL 4 systems. Given the things said recently, by contributors to this list, about the arbitrariness of such requirements and the uncertainty of their results, no professional could believe that they could lead to an honest claim that the system is of SIL 4 – but at least the claim can be made that the standard was respected.
But it is not clear (to me) how an assertion, made at the completion of development of a system, that the system will not fail dangerously during the next ten thousand years of continuous operation can be made or believed by a professional engineer.”
Interesting thought that!
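The arithmetic behind that figure is easy to check. The sketch below (my own illustration, not from the list; it assumes the continuous-mode target failure measures tabulated in IEC 61508-1) converts the ceiling of each SIL band into a minimum mean time between dangerous failures:

```python
# Convert IEC 61508 continuous-mode SIL band ceilings (maximum probability
# of a dangerous failure per hour) into a minimum mean time between
# dangerous failures, expressed in hours and years.
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours in an average year

# Maximum dangerous-failure rate per hour for each SIL (continuous mode)
sil_max_pfh = {1: 1e-5, 2: 1e-6, 3: 1e-7, 4: 1e-8}

for sil, pfh in sorted(sil_max_pfh.items()):
    min_hours = 1 / pfh  # best case: one dangerous failure per min_hours
    print(f"SIL {sil}: <= {pfh:.0e}/h  ->  >= {min_hours:,.0f} h "
          f"({min_hours / HOURS_PER_YEAR:,.0f} years)")
```

For SIL4 the ceiling of one dangerous failure per hundred million hours works out to a little over eleven thousand years, which is where the poster’s “approximately ten thousand years” comes from.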
Another thread started with the question, “Are safety engineers people who were not good enough for traditional design/technical engineering?” The consensus was that it depends on the environment. One poster, who has a Master’s degree in safety-critical systems engineering from York University, described a company where the chief engineer summarised the safety programme as “[They are] good engineers and they think about safety.”
An organisation that is reported as having conflicting ideas about safety is NASA. It was suggested that in some parts of NASA, safety engineering is seen as a career-ending choice, yet, in one centre, management transformed the safety group into the place where the best engineers wanted to work.
This seems related to whether an organisation, any organisation, regards safety as something that is designed in, with the safety engineers being an integral part of the design team, or as something that is somehow assessed after the design is complete. In some industries, such as the railways or the nuclear industry, a safety case is needed before a system can enter service. (One definition of a safety case is that it is a structured argument, supported by a body of evidence, which provides a compelling, comprehensible and valid case that a system is safe for a given application in a given operating environment.) The safety case may be developed alongside the system definition, or it may be developed almost as an afterthought. The problems of developing a safety case as a box-ticking exercise were thoroughly exposed in the recently published Haddon-Cave enquiry into the in-flight explosion of a British Nimrod reconnaissance aircraft. (http://www.nimrod-review.org.uk/documents.htm) (The report also shows the difficulty of trying to achieve safety in an organisational structure where other things, particularly cost control, are given higher priority.)
The use of the word “argument” triggered an interesting debate on what an argument is, and then on what logic is. The earliest exponents of argument were the ancient Greeks, and Aristotle was cited as an authority on rhetoric, the art of deploying a range of skills to persuade others to see one’s point of view. This was attacked as outmoded and irrelevant, and other philosophers were invoked. (Incidentally, did you know that Wittgenstein used truth tables?)
Other questions that have recently been discussed are whether you can create safe systems using imperfect languages, such as C, and whether the tools used in implementing systems, such as compilers, can themselves legitimately be assessed as meeting a particular SIL.
There are occasional outbursts of temper on the list, but I haven’t found deliberate anti-social behaviour. And sometimes misunderstandings or disputes arise because different people understand words differently. (One contributor explained that his postings sometimes appeared aggressive because he is Dutch, and in the Netherlands they say things bluntly.)
But on the whole, the list is a group of people who are seriously committed to working towards safer systems. Some of the postings run to several thousand well-reasoned words, quite a lot of work even for a fast typist. And this is why I find the list fascinating, and also worrying.
There is a general feeling that many embedded systems are poorly built and buggy, particularly systems aimed at high-volume sales over short life cycles. But I would like to think that the systems on which my life, or other people’s lives, depend, systems with very long development cycles and very long lives in service, are going to be developed by teams of people for whom safety is a passion (as one contributor described their attitude to safety), not just an add-on. These systems should be developed by companies where the culture is safety-oriented and where standards are regarded not as a cookbook to be blindly followed, but as a starting point or a baseline.
But it is worrying that the people contributing to the York list, who include leading thinkers on and developers of safety-critical systems, cannot reach consensus on many of these issues. And if they can’t reach consensus on these matters, what hope is there for the rest of us?