Safety-Critical Systems

Security, particularly for the IoT, has been hogging the headlines recently, but safety and safety-critical applications remain a major topic of interest. February saw the 25th Safety-Critical Systems Seminar (SSS). It was organised by the Safety-Critical Systems Club (SCSC) – the UK's professional network and community for sharing knowledge about safety-critical systems, with members from a wide range of disciplines, including practising engineers, academics, regulators, and tool suppliers. The seminar also coincided with a change in the Club's management team and a relocation from Newcastle to York. The event was used as an opportunity to look back on the last 25 years of developing safety-critical applications and to review what still needs to be done. The three-day event was studded with keynotes reviewing the last 25 years, and often considerably longer.

During the 1970s and 1980s, a discipline of safety engineering was emerging against a backdrop of spectacular accidents involving nuclear power stations, chemical engineering works, railways, and aircraft. Governments passed laws on health and safety, and organisations such as the Occupational Safety and Health Administration (OSHA) in the US and the Health and Safety Executive (HSE) in the UK were established. The systems these bodies were initially concerned with were mechanical and electro-mechanical, with relays opening valves and managing electric pumps or hydraulic controls, alongside human controllers. By the late 1980s, electronics was beginning to play an increasing role, and with it came the use of software.

Safety engineers had already begun to establish methodologies, guidelines, and standards for safety-critical systems, and they quickly began producing the same for the new systems. In the US, work began on what became DO-178B: Software Considerations in Airborne Systems and Equipment Certification (published 1992), while elsewhere there was early work on IEC 61508: Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems (published 1998). These are generally regarded as the key standards defining a process for developing safety-critical systems. At a very high level, and at the risk of oversimplifying, they are similar in that they require a safety/risk analysis of the different parts of a system and a classification of those parts into levels defined by the consequences of a failure, ranging from trivial up to catastrophic and fatal. They then define how systems must be developed to meet the appropriate level.

Following on from these are other safety standards for specific application areas, for example ISO 26262 (for vehicles), and guidelines for writing code, such as MISRA C and CERT C. Alongside them has grown a whole industry of tools for developing and testing these systems, together with certification authorities to validate and approve them. The founding standards have been through revisions, and there is a steady stream of new standards.

However, while the general feeling at the SSS was that this was a good thing, many speakers identified serious unresolved issues, and some that are potentially even more serious.

Rather than look at how individual speakers addressed the issues, I want to pull the various views together into some specific threads. I must make clear that these are not my thoughts but a report of what other people said at a meeting of safety specialists, including active practitioners, academics, and tool suppliers, most of whom have well over 25 years' experience of working on safety-critical systems.

The first area to look at is the standards. There was widespread concern that the existing standards, while better than nothing, are a long way from perfect. One speaker said that "most standards are unverified hypothesis," and another stated that while standards are often the consensus of a large group of experts, they are not evidence-based. There was also concern over how long standards take to produce; over how, particularly for international standards, a large number of stakeholders from different countries have to be involved; over the political/commercial element that can enter the creation of standards, reflecting, for example, the interests of the employers of those drafting them; and over the absence of any post-publication evaluation: for example, no one has looked at the defect rates for the different safety levels in DO-178B. Often, it was argued, standards are updated only after something has happened that exposes a failing in the standard. The long gestation time can also mean that a standard doesn't keep up with developments in the industry whose needs it is designed to meet: the new (second) version of ISO 26262, currently in final draft, makes no mention of autonomous vehicles.

There was also concern about the proliferation of standards: is it necessary for each industry, even each segment of an industry, to develop its own standards, at great cost in time and effort, when they might differ only in (unnecessary) detail from several others? A more general criticism is that the main standards cover only the development of a system and, in some cases, the maintenance of its software, but not how the system is actually used. More generally still, the standards are so technology-focused that the human dimension is forgotten.

A significant problem that has not been successfully addressed stems from the difficulty of unambiguously defining a system's requirements. Unless these can be established, it is not possible to carry out the analysis needed to identify levels of risk. In fact, one speaker argued that the tools for assessing risk are flawed, partly because they are often based on techniques from the pre-electronic era and partly because they frequently take no account of the human factor.

The growth of systems of systems, such as cars, is causing concern: while each individual system can be shown to be in some way "safe", once the systems interact a completely new system is created, and it is not clear how this can be evaluated. (One speaker suggested that we already have this problem, as the current approach looks at subsystems rather than at the system as a whole.)

Systems developed using machine learning and other artificial intelligence tools are becoming more than laboratory curiosities, and, again, it is not at all clear how their safety can be evaluated.

The role and future of the safety engineer were also, sometimes indirectly, a cause for concern. While large companies in areas such as aerospace and defence are, on the whole, fully aware of the need to involve safety engineering from the start of a project, it is clear that in other areas there is less concern with safety, even to the extent of making a commercial decision that weighs the cost of deep consideration of safety against getting a product to market. A safety engineer involved in such cases can face severe dilemmas.

Another concern is that the focus on standards conformance, sometimes reduced to a box-ticking exercise, together with training in meeting standards rather than education in the wider aspects of safety engineering, can lead to complacency: "This was developed in accordance with the standard, so it is safe."

Safety and security are often spoken of almost as one word, but speakers were conscious of the gulf between the two groups of practitioners, extending even to the different vocabularies they use.

A further concern is that the processes and tools for developing safety-critical products are suited to large-scale undertakings: either a single massive investment, such as a chemical process plant or a nuclear power station, or a high-volume, relatively high-priced product, such as a car. People developing low-priced systems, such as devices for the Internet of Things, even where safety could be an issue, are not going to be able to make the investments that the big boys can, even if they are aware of these processes.

One speaker looked at software metrics, arguing that we have no systematic way of understanding how and why software systems fail. One possible cause is feature creep. Where an engineer in a traditional discipline can say with total confidence, based on detailed data, "If we try to use a beam that long, it will break," a software developer who says, "Adding that feature will probably cause unintended consequences," can argue only from experience, with no hard data to back it up. He went on to say that, thirty years ago, it was observed that "the almost complete lack of any empirical basis to software engineering means that expert witnesses can violently disagree without perjuring themselves", an observation that is still relevant.

By concentrating on the issues facing the developers of safety-critical systems, this round-up has been unfair to the bulk of the seminar, which was largely positive. Papers I haven't drawn on described successes in developing tools and approaches that are moving us towards safer systems, and the reviews of the last 25 years showed that it is our deeper understanding of the problems of creating safe systems that has revealed the issues discussed above. Now that these issues have been identified, it is groups like the SCSC that will work on resolving them, and future SSS meetings will be the venues where those solutions are presented.
