I had intended to write about automotive matters today, but instead my eye was caught by a link on The Risks Digest: “Software Failures Responsible for 24% Of All Medical Device Recalls.”
So I followed the link through to the source document, a report from the Office of Science and Engineering Laboratories (OSEL) at the FDA (the United States Food and Drug Administration). Within OSEL is the Division of Electrical and Software Engineering (DESE), whose remit is to look at electrical and electronic technology and software. The DESE exists alongside a number of other divisions, including an Electromagnetic and Wireless lab. (The whole of OSEL seems stuffed with some really interesting work, as well as an alphabet soup of abbreviations.)
Digging into the figures was interesting. The number of medical devices recalled is a tiny fraction of the overall FDA recalls. In the listing on the site on July 4th for the previous 60 days, there were recalls for 39 food products, 6 drugs (including, in one recall, Firminite, Extra Strength Instant Hot Rod, and Libidron – just let your mind boggle), 9 animal feeds, and 3 medical devices. Of the medical devices, 2 were recalled for mechanical issues, and the other was recalled for capacitor-related fires in oxygen concentrators. None, in this period, was for a software issue.
The number of medical device recalls due to software is therefore not huge. But the underlying trend is disturbing. Between 1992 and 1998, software accounted for less than 10% of medical device recalls. Since then, the overall trend has been upwards, with a dip in 2009 and 2010, followed by a leap in 2011. This is, of course, partly the result of software playing an increasing role in products. But it is also an indictment of the way in which software is developed.
These recalls are costly. In April 2011, Moog Medical recalled a whole range of ambulatory infusion pumps. These are small devices that deliver doses of a drug. To quote the FDA press release:
“… the device recall is due to a software anomaly which leads to software Error Code 45 (EC45), resulting in a shutdown of the pump. This failure may result in a delay or interruption of therapy, which could result in serious injury and/or death.”
The release went on to say:
“’Moog Medical is committed to the highest level of quality in our products,’ said Martin Berardi, President of Moog Medical Devices Group. ‘Our goal is to maximize patient safety and minimize the impact of this field action on our customers.’ In the first quarter of this year (2011), the Company took a reserve of $1 million to cover the cost of this recall.”
$1 million is a lot of folding money: the division had a 2011 turnover of only $142 million and barely broke even after years of losses. It is not clear from the company’s 10-K (financial report) exactly how much the final cost was; the glossier annual report didn’t put a figure on it but said that the recall also had an impact on sales of pumps. In the context of Moog as a whole, which has a turnover of $2.4 billion, $1 million is, of course, trivial.
Poor Moog Medical wasn’t out of the woods: in May 2012 they had to recall the same pumps, and others, because there was the possibility for the pump to run backwards. Instead of pushing drugs into the patient, the pump had the potential to pull blood from the patient. In this case there were 544,900 suspect sets “sold and distributed in the U.S. between December 2011 and May 2012.”
More widely known, and ultimately more damaging, were Cardiac Science Corporation’s problems with defibrillators (used for resuscitation of heart attack and other trauma victims). The 2009 release announcing the recall talked of issues with resistors, but announced that a software fix would be available. The company then recalled 24,000 defibrillators, at a cost of $18.5 million. Customers and shareholders lost confidence, and, in 2010, the company was sold.
Within the OSEL report is a case study where a software engineer from DESE went to help identify problems with quality monitoring by a device manufacturer. The report says,
“In medical devices that contain software, it can be extremely difficult to assess if a firm follows their processes for design controls, especially in the areas of validation, risk/hazard analysis, and design changes.”
The software engineer identified customer issues with equipment not reporting test results that were out of range. This
“… can directly lead to patient harm or death if inappropriate drug dosing (too little or too much) or clinical decisions are made based on incorrect information.”
There were 13 open complaints, of which 10 had not been correctly assessed for an average of 287 days (that is over nine months). She
“… pinpointed critical software files and identified several coding defects which directly caused many of these customer complaints. Some defects were basic violations of software coding practices, while others were new defects that were introduced during the correction of previous defects.”
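To make that failure mode concrete, here is a minimal sketch of the opposite behaviour: a check that always returns a flagged result and never silently drops an out-of-range reading. It is purely illustrative and has nothing to do with the manufacturer's actual code; the names `Flag`, `Result` and `classify`, and the limits used, are all hypothetical.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Flag(Enum):
    """Hypothetical status attached to every reported reading."""
    IN_RANGE = "in range"
    BELOW_RANGE = "below range"
    ABOVE_RANGE = "above range"


@dataclass(frozen=True)
class Result:
    value: float
    flag: Flag


def classify(value: float, low: float, high: float) -> Result:
    """Return the reading together with an explicit range flag.

    Every valid numeric reading produces a Result, so an out-of-range
    value is reported with a flag rather than being discarded.
    """
    if low > high:
        raise ValueError("low limit must not exceed high limit")
    if math.isnan(value):
        # A reading that is not a number is an error, not "in range".
        raise ValueError("reading is not a number")
    if value < low:
        return Result(value, Flag.BELOW_RANGE)
    if value > high:
        return Result(value, Flag.ABOVE_RANGE)
    return Result(value, Flag.IN_RANGE)


# Illustrative limits only; real limits would come from the device's requirements.
readings = [classify(v, low=4.0, high=6.0) for v in (3.2, 5.0, 7.9)]
flags = [r.flag for r in readings]
```

The point of the design is that the caller cannot obtain a value without also obtaining its flag, which is the kind of property a risk analysis would expect the code to guarantee.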
Once you start digging, you find other horror stories: more issues with poor software; systems deliberately made accessible, over the internet say, for software upgrades, but which have non-existent security, laying pacemakers or insulin pumps open to a virus attack; web sites that provide downloads for software upgrades that are open to virus infections, and more. A few minutes of Googling will throw up plenty of examples.
It would be very interesting to see whether, in the companies that are having issues with software, the software development process is governed by procedures, such as those laid down in IEC 62304:2006, Medical device software – Software life cycle processes. It would be equally interesting to see how far the procedures for risk assessment that this and other standards require have been actively followed, rather than merely paid lip service.
It would be interesting to know whether the software team thinks of itself as an engineering team or a group of creative artists. And whether management invests equivalent amounts of money in tools for the software development as they do for hardware development.
The FDA has not sat back and criticised: it has actively encouraged better practice. For example, since 2009, it has carried out significant work on software for controlling infusion pumps, and this is available to manufacturers. The FDA has also been active in promoting the use of static code analysis. Almost every provider of tools that include static code analysis routinely discusses the FDA and its activities in their presentations.
Yet people go on producing poor software. We are all used to poor software in consumer devices and in PCs. We shouldn’t be, but we are. However, poor software that imperils life is not something that anyone should accept. Management have got to take responsibility and believe in the risk analysis and other techniques that are needed under IEC 62304. They have to give the software developers tools and resources that make it easier to develop good software, that enforce standards, and that analyse code for correctness. They should even give them two screens if they want them!
And the software teams have to take their share of the responsibility. They have to recognise that they are not primarily creative artists, but are engineers. Engineering need not be uncreative – the best engineering is deeply creative and produces results that are stunning in their own right: think of a steam engine in full flight, or the viaduct at Millau in southern France.
While engineers in other disciplines are open minded, using an array of tools, software developers are conservative and stick to using only the tools they know.
The disciplined processes of developing software, using approaches like the V-model or the waterfall model, are well understood and accepted, yet projects go ahead without any procedures in place. And each new methodology, such as Agile programming, is introduced, not as an alternative tool, but as a religion with a manifesto, squabbling sects, and accusations of heresy. Yet, with appropriate controls, an Agile-like approach can help resolve the issues surrounding evolving requirements, even in safety-critical environments like medical device development.
Formal methods can create demonstrably safe software, yet there is a huge resistance to adopting them in day-to-day development.
Building models is an accepted part of almost any engineering project. For a long time, model-driven development with automatic code generation has produced good quality code faster than traditional manual development. It also makes it easier to support product variants efficiently and to cope with future changes. Yet the resistance to using modelling – not specific modelling tools, but modelling in general – shows no sign of diminishing, even though standards like ISO 26262 now accept that modelling is an appropriate method for producing software.
The same conservatism extends to programming languages: there is a wide range of languages because there is a wide range of problems that need solving, but programmers cling to the one language that they learned at university. Ada and SPARK produce demonstrably better results for safety-critical systems; yet C, with all its imperfections, holds sway. Subsets of C, like MISRA C, will reduce the number of potential issues, yet the adoption of MISRA C has been painfully slow.
Static code analysis, particularly when coupled with code reviews, can speed up software development, dramatically reduce bugs, and produce software that meets the requirements. (A code review can concentrate on seeing whether the software meets the requirements after a static code analysis has removed issues of detailed implementation.) Yet the use of static code analysis and code reviews is increasing very slowly.
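For readers who have never seen one, a static check inspects source code without running it. The toy sketch below uses Python's standard `ast` module to find bare `except:` clauses, a classic way of silently swallowing errors. It is an illustration of the idea only, nowhere near what a commercial analyser does; the function name `find_bare_excepts` and the sample snippet (including its `read_sensor` call) are hypothetical.

```python
import ast


def find_bare_excepts(source: str) -> list[int]:
    """Return the line numbers of bare 'except:' clauses in the source.

    The source is parsed into a syntax tree and never executed, which is
    what makes this a static rather than a dynamic check.
    """
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    )


# Hypothetical snippet under analysis: any failure quietly becomes a dose of 0.
SAMPLE = (
    "try:\n"
    "    dose = read_sensor()\n"
    "except:\n"
    "    dose = 0\n"
)

findings = find_bare_excepts(SAMPLE)
```

With mechanical findings like this dealt with by a tool, the human reviewers are free to ask the harder question of whether defaulting to a dose of zero is what the requirements actually call for.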
Several tools that started out as static code analysers can also provide management with detailed information on trends in software quality and on the differences in quality between different programmers. This is vital information, one would have thought, for effective management, but take-up of these features is even slower than that of the tools themselves.
Martyn Thomas has for years been pointing out that, while surgeons may bury their mistakes, engineers share them. This is both to seek answers to the problems and to help others avoid making the same mistake. While not all engineers are perfect, in no other engineering discipline are there the same religious wars as we see over programming languages, over development methodologies, over using tools, and over a range of other issues.
Now, this rant may seem unfair to those software people who are developing software in a controlled environment using the arrays of tools that are available. But these people are in the minority. Sure — as I said earlier, management needs to change its attitudes. But software guys really need to break out of their self-absorbed rut and actively embrace the new freedoms to be truly creative that software engineering approaches and tools can bring. Otherwise we will see software killing people.