Lessons from Fukushima

In August a group of experts on risk, safety engineering, and related matters looked at the Fukushima Daiichi nuclear power station disaster to see what broader lessons could be learned. Before we start reviewing some of the broader topics that arose in the workshop, please look at the exam paper below.

Examination: Safety and Systems

Your exam starts now. Please answer all the questions in this paper in a way that will satisfy any party with any interest, legitimate or otherwise, and that, in ten years, twenty years or fifty years, will not leave room for you to be blamed if your answer subsequently proves to be wrong. Remember, people’s lives and property will depend on your answer.

Section 1 – Designing a nuclear installation

1) Where do you put a multi-reactor nuclear generating plant?

a) Close to a sea known to have a significant tsunami risk?
b) In a known earthquake zone?
c) Both of the above?

2) Where do you put the reactors?

a) In a nice straight line and close together?
b) In an un-aesthetic scatter, but where an explosion in one won’t necessarily create a domino effect?

3) Where do you put the standby generators?

a) Under the reactor cores?
b) On top of the reactor cores?
c) What standby generators? Doesn’t the plant generate electricity? And isn’t it hooked into the national grid?

4) If you decide to put a wall around the plant, do you design it so that it is high enough to cope with

a) A five year tsunami?
b) A hundred year tsunami?
c) A thousand year tsunami?

Sub-question: Will a one-in-a-thousand-year incident happen

a) In a thousand years’ time?
b) In 500 years’ time, since the last one happened 500 years ago?
c) Possibly next year?

5) What are the procedures you follow to ensure that you are making the right decisions in specifying and designing the power station?

Section 2: Controlling a disaster

1) You are the senior officer in charge of trying to contain the consequences of a nuclear catastrophe. In an effort to keep the reactor cores from meltdown, you have resorted to spraying seawater directly onto the containment chambers. Your country’s government explicitly forbids you from continuing to do so. Do you obey? How do you justify your actions?

2) You are the spokesman for a nuclear generating company. You are about to meet the world’s press after a potentially catastrophic incident. Do you

a) Admit there is a major problem, that you don’t know the full extent and that you will provide regular updates as things become clearer?
b) Wrap up your announcement in understatement, comfort words, or litotes?

Note to examinees: be aware that your answer to this question, and to most of the others, is going to be very specific to the culture in which you have grown up and work. You may not be able to reconcile that with pleasing or informing commentators from other cultures.

Section 3: Avoiding a future disaster

After the incident, you are in charge of investigating what happened. Do you:

1) Try to find, through examining the chain of events, the specific cause of this incident and re-write the rules to stop the same thing happening again?

2) Or do you, in an open forum, look at the broader environment, sociological and political as well as technical and then, in dialogue with experts, try to identify what changes are needed to avoid a similar event happening? These may be sociological or political as well as organisational and technical.

Section 4: Regulatory agencies:

1) Do you have separate agencies for promoting nuclear energy and monitoring and regulating the nuclear industry?

2) Do you encourage a free flow of personnel between the nuclear industry, government, and the regulatory agency, to maintain a flow of information and experience and to ease interworking? Or do you keep all three elements at arm’s length from each other in order to maintain complete independence of the regulator and the industry it is regulating?

Please file your exam papers away carefully and revisit them in ten years’ time.
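For readers who want to check their answer to the sub-question in Section 1, here is a minimal sketch of the arithmetic. It assumes (and this is a simplifying assumption, not something the workshop stated) that each year is independent and carries the same 1-in-1000 chance of the event. The function name `prob_at_least_one` is invented for this illustration.

```python
def prob_at_least_one(return_period_years: float, horizon_years: int) -> float:
    """Chance of at least one event within the horizon.

    Assumes independent years, each with probability 1 / return_period_years.
    Real hazards may cluster, so treat this as a rough guide only.
    """
    if return_period_years < 1 or horizon_years < 0:
        raise ValueError("return period must be >= 1 and horizon >= 0")
    annual_probability = 1.0 / return_period_years
    # Probability of no event in any single year, compounded over the horizon.
    prob_no_event = (1.0 - annual_probability) ** horizon_years
    return 1.0 - prob_no_event


if __name__ == "__main__":
    # 40 years is used here as an illustrative plant lifetime (an assumption).
    for horizon in (1, 40, 500, 1000):
        chance = prob_at_least_one(1000, horizon)
        print(f"{horizon:>5} years: {chance:.1%}")
```

Under these assumptions the answer does not depend on when the last event happened: the chance next year is the same 0.1% whether the previous one was 500 years ago or last week. Over a full thousand years the chance of at least one event is roughly 63%, not 100%.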

What follows is a personal interpretation of two very full days and evenings of presentations and discussion. It should not be taken as implying there was or was not a consensus view on any matter discussed, nor indeed does it accurately represent any single participant’s views except my own, and, even then, my views may change.

What came through to me, and this was close to a consensus, was a general concern that any investigation into the Fukushima events will focus on identifying a cause or causes and how these can be alleviated in future projects – reflecting the generally accepted view that safety engineering is all about technology. The workshop felt that, particularly with something as large and complex as a nuclear power station, there are much larger societal and political issues that need to be examined. Without identifying and addressing these, finding causes and alleviating them for the future is almost a whack-a-mole approach: the next significant event will have a new set of causes. Understanding the political and societal issues could lead to a better approach to safety.

The normal cycle for developing a system regarded as potentially hazardous is to identify possible hazards, analyse the severity of a hazard and its likelihood (risk analysis), and assess how acceptable these risks are, perhaps through iterations, and possibly write a safety case. Within different industries, elements of these assessments are defined by a regulatory regime. The result of this analysis is part of the requirements specification used to develop the system. From this point onwards, you start using procedures and following standards set for your industry.
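The cycle described above can be sketched as a simple risk matrix. This is only an illustration: the severity and likelihood scales, the thresholds, the `Hazard` class, and the example hazards are all hypothetical and are not taken from any regulatory regime or from the Fukushima analysis.

```python
from dataclasses import dataclass

# Hypothetical ordinal scales; each industry's regime defines its own.
SEVERITY = {"minor": 1, "serious": 2, "major": 3, "catastrophic": 4}
LIKELIHOOD = {"remote": 1, "unlikely": 2, "possible": 3, "frequent": 4}


@dataclass(frozen=True)
class Hazard:
    name: str
    severity: str    # key into SEVERITY
    likelihood: str  # key into LIKELIHOOD


def risk_score(hazard: Hazard) -> int:
    """Risk analysis step: combine severity and likelihood into one score."""
    return SEVERITY[hazard.severity] * LIKELIHOOD[hazard.likelihood]


def assess(hazard: Hazard, tolerable: int = 4, intolerable: int = 9) -> str:
    """Acceptability step: compare the score with (assumed) thresholds."""
    score = risk_score(hazard)
    if score >= intolerable:
        return "intolerable"
    if score > tolerable:
        return "reduce further"
    return "broadly acceptable"


# Illustrative entries only; the ratings are invented for the example.
hazards = [
    Hazard("loss of off-site power", "major", "possible"),
    Hazard("flooding of standby generators", "catastrophic", "remote"),
]

for h in hazards:
    print(f"{h.name}: score {risk_score(h)}, {assess(h)}")
```

Note how much rests on judgement: rating a catastrophic hazard as "remote" is enough to move it into the acceptable band, and both the rating and the thresholds are chosen by people, not derived by the method.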

While there can be, and is, debate about how appropriate standards such as ISO 26262 or 803-B are in producing good quality systems, and even more detailed arguments about the effectiveness of different approaches to software, these are debates about building the walls. The hazard and risk analyses have already defined the foundations.

Even on a purely objective level, hazard and risk analyses are not simple, but can they really be objective, particularly when you are working in a regulated environment? Take the common phrase in British safety – ALARP – as low as is reasonably practicable. To say that the risk is ALARP means that it is demonstrable that reducing the risk any further would cost a great deal more than the benefit it would bring. This has been a legal concept since a court case in 1949, but it still requires judgement – the relevant government web site says, “there are many assumptions and uncertainties involved.” Other countries have very different approaches to the same issue, usually involving written standards.
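A minimal sketch of the ALARP comparison, to show where the judgement enters. The function name and the `disproportion_factor` parameter are assumptions made for this illustration; no fixed factor is implied by the text above, and in practice both the cost and the benefit figures are themselves estimates.

```python
def further_reduction_required(
    cost: float,
    benefit: float,
    disproportion_factor: float = 3.0,  # assumed value, purely illustrative
) -> bool:
    """Return True if a risk-reduction measure should still be applied.

    The measure can be rejected only when its cost exceeds the benefit by
    more than the chosen factor; otherwise the risk is not yet ALARP.
    """
    if cost < 0 or benefit < 0:
        raise ValueError("cost and benefit must be non-negative")
    if disproportion_factor < 1:
        raise ValueError("disproportion_factor must be at least 1")
    return cost <= disproportion_factor * benefit


if __name__ == "__main__":
    # Same measure, same estimates, different judgement about the factor.
    print(further_reduction_required(cost=5.0, benefit=1.0, disproportion_factor=3.0))
    print(further_reduction_required(cost=5.0, benefit=1.0, disproportion_factor=10.0))
```

The same measure is rejected with a factor of 3 and required with a factor of 10, which is the point: the arithmetic is trivial, and the outcome is decided by the estimates and the factor that someone has to choose.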

So our objective analysis of risks is already going to be strongly influenced by external societal factors, such as the legal system within a country. The political environment is another such factor: Germany used Fukushima to resume a programme to shut down all German nuclear power plants by 2022. This is despite Germany having no exposure to tsunamis, very limited exposure to earthquakes, and reactors that are generally of a different kind from those at Fukushima.

Within an organisation there will be a culture that shapes its approach to risk: even company politics may play a role.

These factors also influence how to cope with warnings or whistle-blowing. What happens when a project is in progress and a staff member realises that there are issues? (This is not a new issue – try reading Nevil Shute’s No Highway. Written sixty years ago, it is about an engineer who realises that an aircraft now in service will suffer metal fatigue and crash – and it was written before that actually happened to the de Havilland Comet, the first jet airliner. The writing and manners are perhaps dated, but the corporate reaction is completely contemporary.)

Within the development phase, following the procedures laid down by the standard is the path of least resistance, as well as being necessary to meet the certification for compliance that is needed to sell into many markets. But, again, it is a question of corporate culture, whether that is all that is done, or whether the developers think and question what they are doing and examine whether the procedures and standards alone are enough to provide the levels of safety needed.

And the same thing happens when the system, whatever it is, is in use. Operator error or pilot error cannot be the only reason for system failure. Recent discussions about the 2009 Air France crash have looked at causes of pilot error, whether insufficient training to address a specific type of incident or the loss of learned experience through increased reliance on automation. Depending on your standpoint, either or both of these causes may be irrelevant. The corporate and societal environment of the system of which the operator is a part has a significant role to play. At its most mundane, the operator taught to blindly follow procedures is the equivalent of a driver blindly following the instructions from a GPS system, without reading the road signs.

There were many other detailed threads within the workshop, and some of these will be looked at later this year. However, the impression I was left with was that merely creating new rules for procedures, from initial assessment to day-to-day operation, and not looking at the broader issues, is only going to prevent an identical set of failures. Looking at the wider context may make it possible to prevent entire classes of failure.

A selection of the presentations given at the Workshop is at: http://www.rvs.uni-bielefeld.de/Bieleschweig/eleventh/ The organiser of the Workshop, Peter Bernard Ladkin, is preparing a printed publication.

2 thoughts on “Lessons from Fukushima”

Dick Selwood says:

September 24, 2011 at 4:28 am

Would you pass the exam to plan/run a nuclear power station?

ericverhulst says:

September 24, 2011 at 7:01 am

Very interesting, and glad to learn that some people are starting to take a different look. Safety engineering, while in general fairly successful, still suffers from a few systemic issues:
1. Why is safety a separate chapter in the book of systems engineering? Safety is only one of the 4 properties that a trustworthy system must meet (safety, security, usability, privacy). Good systems engineering must always aim at developing trustworthy systems, hence there is no reason to put this in a separate chapter.
2. There is an issue with thinking about risks as probabilities. MTBF is a concept coming from production quality control. But as Fukushima has unfortunately demonstrated again, the law of Murphy is real. Improbable events can happen the next second, and when they do, they often happen in bursts.
3. Driven by the desire to reduce cost and liabilities, often the minimum SIL level is aimed for. If a safety risk is real, the default should be to design for SIL4 (fault tolerant) so that the fail-safe mode is a SIL3 mode (still full functionality, but red warning lights). A reduced functionality mode is not very certain to really be fail-safe in real circumstances. Think about the high idle (1000 rpm limit on engines) fail-safe mode of cars when e.g. the gas pedal positioning sensor detects a fault. How safe is that when parking? How safe is that when driving 200 km/h on the Autobahn?
Note, often I then get the reply that SIL4 is way too expensive. This is not true if the initial architecture was designed for it. Applying it later on is very expensive, and having to redesign is even more so.
4. Why are safety standards not public domain? To obtain one, one has to pay, and there are severe restrictions on even printing and reading the document. If society were really concerned with safety (TRUST), every engineering student should get free copies. Or better, take a community-driven, not committee-driven, approach, like Wikipedia, and we get readable and understandable standards.
The good news is that I hear more and more voices of a growing awareness that we can do better. The way to go is to formalise the whole domain, not with the intent to make it more complicated but cleaner. Systems Engineering can be described with only 15 concepts (meta-level). The rest is creating domain specific instances and refinement. The key to trustworthiness is a clean architecture and understanding the problem before one starts the design.
Eric Verhulst, Altreonic
