
Cars, Coding, and Carelessness

Sloppy Coding Practices Led to a Fatal Crash

You’ve probably heard by now about the lawsuit against Toyota regarding its electronic engine controls. The jury found the automaker liable for fatally sloppy code, and, based on what the software forensics experts found, I’d have to agree.

This case is fundamentally different from the “unintended acceleration” fiasco that embroiled a certain German carmaker back in 1986. That scare was entirely bogus, fueled by an ill-considered “60 Minutes” exposé that aired in the days when Americans watched only three TV channels. Sales of the affected cars plummeted, and it took more than two decades for the company to recover. An engineering spokesman for the carmaker told reporters, “I’m not saying that we can’t find the problem with the cars. I’m saying there is no problem with the cars.” He was dead right: there was no problem with the cars. But the remark was viewed as arrogant hubris, and it just made the situation worse.

In reality, a few drivers had simply been pressing the wrong pedal, which is a surprisingly common mistake. It happens all the time, in all types of cars. Naturally, nobody wants to admit that they just ran over the family cat (or worse, their own child) through momentary stupidity, so they blame the equipment. “I didn’t run over Fluffy. The damn car did it!”

Back then, throttle controls were mechanical. There was a direct mechanical connection (usually a cable) from the gas pedal to the carburetors or fuel-injection system of the car. Unless gremlins got under the hood (no AMC jokes, please), there wasn’t much chance of that system going wrong.

Now cars’ throttles are mostly electronic, not mechanical, and the “drive by wire” system has come under new scrutiny. Unlike a basic steel cable, there are a whole lot of things that can go wrong between the sensor under the gas pedal and the actuator in the fuel injector. Any number of microcontrollers get their grubby mitts on that signal, or the connection itself could go bad. It’s just an embedded real-time system, after all, with all the pros and cons that that implies.
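
What sits between the pedal and the throttle plate is ordinary embedded code. To give a flavor of it, here’s a minimal sketch in C of the kind of plausibility check a drive-by-wire input stage typically performs. This is not Toyota’s code; the struct, function, and threshold are invented for illustration. Real electronic pedals carry two redundant position sensors precisely so the software can catch a single sensor fault:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dual-sensor pedal input. Drive-by-wire pedals typically
 * provide two independent position sensors so that a single fault can be
 * detected by cross-checking them. The threshold below is invented. */
#define SENSOR_DISAGREE_LIMIT 40u   /* max allowed difference, in ADC counts */

typedef struct {
    uint16_t sensor_a;   /* primary pedal-position sensor reading */
    uint16_t sensor_b;   /* redundant pedal-position sensor reading */
} pedal_input_t;

/* Returns true and writes a validated pedal position only when the two
 * sensors agree. On disagreement the caller should fall back to a
 * limp-home mode rather than trust either reading. */
bool pedal_position(const pedal_input_t *in, uint16_t *position)
{
    uint16_t diff = (in->sensor_a > in->sensor_b)
                        ? (uint16_t)(in->sensor_a - in->sensor_b)
                        : (uint16_t)(in->sensor_b - in->sensor_a);

    if (diff > SENSOR_DISAGREE_LIMIT) {
        return false;   /* implausible input: sensors disagree */
    }
    *position = (uint16_t)((in->sensor_a + in->sensor_b) / 2u);
    return true;
}
```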

Fast-forward to today. After a years-long legal battle involving an Oklahoma driver whose passenger was killed when their car suddenly accelerated on its own, a jury ruled in favor of the plaintiffs. In other words, the car was defective, and its maker, Toyota, was liable.

There was no smoking gun in this case; no dramatically buggy subroutine that caused the fatal crash. Instead, there was only supposition. What a careful examination of the car’s firmware showed was that it could have failed in the way described in the case, not necessarily that it did fail. That was enough to convince the jury, which penalized the carmaker to the tune of at least $3 million.

For embedded programmers, the case was both enlightening and cautionary. For years, experts pored over Toyota’s firmware, and what they found was not comforting. Legal cases often bring out dirty laundry, the things we casually accept every day but would rather leave covered or private. In a liability case, privacy is not an option. Every single bit (literally) of Toyota’s code was scrutinized, along with the team’s programming practices. And the final conclusion was: they got sloppy.

It’s not that Toyota’s code was bad, necessarily. It just wasn’t very good. The software team repeatedly hacked their way around safety standards and ignored their own in-house rules. Yes, there were bugs – there will always be bugs. But is that okay in a safety-critical device? It’s nice for novices to say that there should never be bugs in such an important system; that we should never ship a product like a car or a pacemaker until it’s proven to be 100% bug-free. But, in reality, that means the product will never ship. Is that really what we want? If it’s going to be my car or pacemaker, yes. If it’s the car or pacemaker I’m designing… maybe that’s too high a bar. But there is some minimum level of quality and reliability that we as customers have a right to expect.

Toyota’s developers used MISRA-C and the OSEK operating system, both good choices for a safety-critical real-time system. But then they ignored, sidestepped, or circumvented many of the very safety features those tools are designed to enforce. For example, the MISRA-C:1998 standard has 93 required coding rules and 34 advisory ones; Toyota’s in-house standard adopted only 11 of those rules, and the code still violated five of them. Oh, and they ignored error codes returned by the operating system. You can’t trust a smoke alarm if you remove the battery every time it beeps.
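
The error-code point deserves a closer look. OSEK services such as ActivateTask return a StatusType precisely so the application can react when something goes wrong; discarding that value is the software equivalent of unplugging the smoke alarm. Here’s a hedged sketch of the difference; the OS declarations are stubbed so the fragment stands alone, and the task ID and fault handler are invented:

```c
/* OSEK-style declarations, stubbed here so the sketch is self-contained.
 * In a real system these come from the generated OS headers. */
typedef unsigned char StatusType;
typedef unsigned char TaskType;
#define E_OK          ((StatusType)0)
#define TASK_THROTTLE ((TaskType)1)     /* invented task ID */

StatusType ActivateTask(TaskType id);   /* provided by the OS */
void enter_fail_safe(void);             /* invented fault handler */

/* Sloppy: the status code is silently discarded. If activation fails
 * (e.g. too many pending activations), the throttle task simply never
 * runs, and nothing in the system notices. */
void kick_throttle_task_sloppy(void)
{
    (void)ActivateTask(TASK_THROTTLE);
}

/* Careful: any failure is treated as a fault, not ignored. */
void kick_throttle_task_checked(void)
{
    if (ActivateTask(TASK_THROTTLE) != E_OK) {
        enter_fail_safe();   /* e.g. cut torque request, log, reset */
    }
}
```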

Stack overflows got close scrutiny, because they’re the cause of many a malfunctioning system. Contrary to the developers’ claims that less than half of the allocated stack space was being used, the code analysis showed it was closer to 94%. That’s not a grievous failure in and of itself, but the developers wrote recursive code in direct violation of MISRA-C rules, and recursion, of course, eats stack space. To make matters worse, the Renesas V850 microcontroller they used has no MMU, and thus no hardware mechanism to trap or contain stack overflows. 
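
To see why recursion is banned in this world: every nested call pushes another frame onto a fixed-size stack, so worst-case stack depth depends on run-time data and resists static analysis. A toy illustration (mine, not Toyota’s):

```c
#include <stddef.h>

/* Recursive sum: each call adds a stack frame (return address, saved
 * registers, locals), so stack use grows with n and the worst case is
 * hard to prove. This is exactly what the MISRA-C recursion ban exists
 * to prevent. */
long sum_recursive(const int *a, size_t n)
{
    if (n == 0) {
        return 0;
    }
    return a[0] + sum_recursive(a + 1, n - 1);
}

/* Iterative sum: one frame, constant stack use regardless of n, and the
 * worst-case stack depth can be computed statically. */
long sum_iterative(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}
```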

OSEK is common in automotive systems, almost a de facto standard. It’s portable, it’s widely available, and it’s designed to work on a variety of processors, including ones without an MMU. But because it’s a safety-critical software component, each OSEK implementation must be certified. How else can you tell a good and compliant OSEK implementation from a bad one? Toyota used a bad one. Or, at least, an uncertified one.

Structured-programming aficionados will cringe to learn that Toyota’s engine-control code had more than 11,000 global variables. Eleven thousand. Code analysis also revealed a rat’s nest of complex, untestable, and unmaintainable functions. On a cyclomatic-complexity scale, a rating of 10 is considered workable code, with 15 being the upper limit for some exceptional cases. Toyota’s code had dozens upon dozens of functions that rated higher than 50. Tellingly, the throttle-angle sensor function scored more than 100, making it completely and utterly untestable.
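
For readers who don’t measure this daily: cyclomatic complexity counts the linearly independent paths through a function, starting at 1 and adding one for each decision point (if, loop, case, and so on). Here’s an invented function annotated the way an analyzer would score it:

```c
/* Cyclomatic complexity = 1 (base) + 3 (one loop, two branches) = 4.
 * Four independent paths means at least four test cases for full branch
 * coverage. At a score above 100, exhaustive branch testing is
 * effectively impossible. */
int clamp_and_count(int *values, int n, int lo, int hi)
{
    int clamped = 0;
    for (int i = 0; i < n; i++) {       /* +1 */
        if (values[i] < lo) {           /* +1 */
            values[i] = lo;
            clamped++;
        } else if (values[i] > hi) {    /* +1 */
            values[i] = hi;
            clamped++;
        }
    }
    return clamped;
}
```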

Although the Toyota system technically had watchdog timers, they were fail-safes in name only, too trivially implemented to catch the failures that matter (a sketch of the distinction follows below). The list goes on and on, but it’s a familiar litany for anyone working in software development. We know better, we’re embarrassed by it, but we do it anyway. Right up until we get caught, and Toyota’s programmers got caught. And people died.
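
To make “in name only” concrete: a watchdog that gets kicked regardless of whether the application tasks are actually alive will never bite. Here’s a hedged sketch of the difference; the hardware-kick function and the task bookkeeping are invented, not taken from the Toyota code:

```c
#include <stdbool.h>
#include <stdint.h>

void hw_kick_watchdog(void);   /* stand-in for the hardware service register */

#define NUM_TASKS 3u
static volatile bool task_alive[NUM_TASKS];   /* each task sets its flag every cycle */

/* Fail-safe in name only: the timer ISR kicks the dog unconditionally.
 * Timer interrupts keep firing even if every application task has died,
 * so the watchdog never forces a reset. */
void timer_isr_sloppy(void)
{
    hw_kick_watchdog();
}

/* Better: the dog is kicked only when every monitored task has checked in
 * since the last tick, so a hung task leads to a hardware reset. */
void timer_isr_checked(void)
{
    bool all_alive = true;
    for (uint32_t i = 0; i < NUM_TASKS; i++) {
        if (!task_alive[i]) {
            all_alive = false;
        }
        task_alive[i] = false;   /* tasks must re-assert before the next tick */
    }
    if (all_alive) {
        hw_kick_watchdog();
    }
}
```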

All the basics were there. As far as the legal and code experts could determine, the engine-control system would have worked if more of the safety, reliability, and code-quality features had been observed. And, obviously, the car does work most of the time. It’s not noticeably faulty code. And that’s the problem: it appears to work, even after millions of hours of real-world testing. But those lurking bugs are always there, allowed to creep in through cavalier attitudes about code hygiene, software rules, standards, and testing. Other, more conscientious developers did the hard work of creating MISRA-C, OSEK, and good coding practices. All we have to do is actually follow the rules. 

7 thoughts on “Cars, Coding, and Carelessness”

  1. “I think you will find it is more complex than that”
    Interestingly, at least two of the Toyota accidents involved very senior drivers, and it could not be proved that they had not stamped on the wrong pedal.
    The code was sloppy, but the expert witness was not able to prove that it caused the accident, just that it might have done.
    Toyota appears to have taken an economic decision to pay up rather than appeal.
    Are courts the right venue for disentangling these events? I don’t think so, and tomorrow I will be discussing this, so watch this space.
