feature article

Ignore Those Pesky Bugs

Software is Complicated, But How Much of it is Useful?

“We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know.” – Donald Rumsfeld

Consider the humble ladder. It’s a hardware device that’s elegant in its simplicity. Two parallel side rails, with evenly spaced rungs in between. Everything you need; nothing you don’t. Ladders can be made of wood, metal, fiberglass, or other materials. There are long ones, short ones, portable ones, and permanent ones. Nobody really needs to be taught how to use a ladder, although there are some standard safety rules that might make your tenure at the top a bit more secure.

Now consider your computer’s word-processing software. Or the email application on your smartphone. Not so simple, are they? In fact, there are a lot of features in most programs that seem to be superfluous. Simple, elegant, just-the-basics apps seem to be the exception. They get good reviews, but people remark on them like they’re some sort of aberration. Maybe they are. “Creeping featurism” is a term that’s almost as old as engineering itself.

But that’s not my point. We don’t need to flog the poor software galley slaves chained to their oars below decks. (That’s what managers are for.) No, the question for today is, “How much of your code does real work, as opposed to just catching errors?”

First-time programmers learn to create “hello, world!” or something similar. They get a feel for what it’s like to use a programming language to describe what they want and turn it into compile-worthy code. At first, they work toward getting their early programs to obey their wishes, pure and simple. Bugs will creep in, sure, but that’s all part of the learning process.

But after the first few successful attempts, we start to learn the other side of programming: the part where you shore up the program to prevent it from failing in the real world. You start to build the safety net, the guard code. You’re no longer creating a program that does what you want. You’re creating additional code that prevents it from doing what you don’t want, and that’s a different process and a different mindset altogether.

What does the program do if the user accidentally types in a bogus date? What happens if it receives a malformed network packet? How does it behave if a pointer is out of bounds? These are all aspects of the “guard code” that we all have to include, even though it doesn’t add anything to the program and it doesn’t (usually) do any useful work. Oftentimes, it’s never called or executed at all. Guard code is there just to keep the program from tipping over in case something stupid happens.
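To make the ratio concrete, here’s a minimal Python sketch (the function and its validation rules are my own invention, not from any particular program) showing how a single line of “real work” ends up buried under guard code:

```python
from datetime import date

def parse_birthdate(text: str) -> date:
    """Parse a 'YYYY-MM-DD' birthdate. Nearly every line here is guard code."""
    # Guard: reject obviously malformed input before doing any real work.
    parts = text.strip().split("-")
    if len(parts) != 3:
        raise ValueError(f"expected YYYY-MM-DD, got {text!r}")
    try:
        year, month, day = (int(p) for p in parts)
    except ValueError:
        raise ValueError(f"non-numeric field in {text!r}") from None
    # Guard: date() itself validates ranges (month 1-12, day per month).
    result = date(year, month, day)
    # Guard: a birthdate in the future is a user flub, not a program error.
    if result > date.today():
        raise ValueError("birthdate cannot be in the future")
    return result  # the one line of actual work
```

On the happy path, most of those checks never fire; they exist only for the day somebody types something stupid into the date field.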

Even though ladders are pretty safe, they don’t have safety features, per se. There’s no airbag at the bottom that deploys if you fall. There are (usually) no outriggers to prevent tip-overs. Ladders don’t have built-in current sensors to prevent you from using an aluminum ladder on power lines. There are no accelerometers or klaxons to alert you to unstable working angles. I once saw a ladder with a built-in bubble level to help you eyeball the slope, but that’s about it.

Programming isn’t like that. We actually spend a lot of our time adding safety features to our software. It’s like training wheels on a bicycle, except that they never come off. The guard code is always there, ready to catch that malformed packet or that bogus date, even though it may never happen.

On top of all that, we also have to guard against malicious intent, not just dumb mistakes. What if someone deliberately tries to break our program by shrewdly exploiting some weakness in the input buffer? You’ve got to guard against attacks, not just bugs. 

And we have to add security features. It’s harder than ever to make software hacker-proof, because the hackers keep getting wilier and craftier. There are accidental bugs, and then there are malicious assaults, and we generally can’t catch them both with the same kind of guard code. You have to consciously look for, and trap, both types: the known unknowns and the unknown unknowns.
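The two kinds of guard code look different in practice. Here’s a toy Python sketch (the length limit and function name are assumptions for illustration) where some checks catch honest programming mistakes and others are aimed squarely at an attacker feeding us a hostile buffer:

```python
MAX_NAME_LEN = 64  # hypothetical limit for this sketch

def safe_copy_name(raw: bytes) -> str:
    # Bug guard: a wrong type here is an accidental error upstream.
    if not isinstance(raw, (bytes, bytearray)):
        raise TypeError("expected bytes")
    # Attack guard: cap the length BEFORE decoding, so an oversized
    # buffer can't exhaust memory or overflow a fixed-size field.
    if len(raw) > MAX_NAME_LEN:
        raise ValueError("input exceeds maximum length")
    # Attack guard: strict decoding rejects invalid byte sequences.
    text = raw.decode("utf-8", errors="strict")
    # Attack guard: reject embedded NULs and control characters that
    # could smuggle data past a downstream C API or log parser.
    if any(ord(c) < 0x20 for c in text):
        raise ValueError("control characters not allowed")
    return text
```

The type check would be redundant in a bug-free program; the length and control-character checks never will be, because the attacker chooses the input.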

So how much of today’s code falls into that “guard code” category, versus the amount that does the real work implementing the program’s putative purpose? Any guesses?

I would take a SWAG that guard code accounts for 75 percent of most modern programs. It’s got to be at least half. Looking at big chunks of open-source code, I see an awful lot of source code that’s there just to trap errors, mistakes, user flubs, and similar non-malicious bugs. It’s sometimes hard to see what a function is actually doing, buried under all that safety net.

I’ll bet that the guard code is also the source of most bugs, ironically. We put it in there to catch stupid errors, and then the bug-catcher itself malfunctions. I don’t have any objective data to back that up, but that’s been my own experience. The real core of the program works fine; it’s all that other stuff in there to prop it up that’s problematic.

When you’re working on a tall ladder, it’s good practice to have a spotter below you. Someone who will – maybe not catch you, exactly, but at least call 911 when you face-plant on the pavement. They’re your safety net.

If we could do something similar with coding, we might get much faster programs and fewer bugs, besides. Let the “real” program run on one processor (or one CPU core of a multicore processor), while the “guard code” runs alongside on a parallel processor. One does the real work; the other checks that nothing is going off the rails. One parses input while the other checks boundary conditions. One calculates results while the other checks the validity of the input parameters. If the sidekick detects an error, we abort the process or restart the function or ask for new data.

Easy to say but hard to do, of course. But imagine how efficient – and fast! – your software would be if you didn’t have to idiot-check every single parameter, input buffer, string, and checksum. Imagine programming the way it used to be, when your only concern was making the program do what you wanted, not second-guessing all the things that could go wrong. That’s what spotters are for. We need code-spotters. That, and multicore processors, can be our ladders. 

One thought on “Ignore Those Pesky Bugs”

  1. I think it boils down to the necessary checks to bring up a new system without data corruption and memory faulting, vs what’s necessary for safe operation after deployment.

    A sane programmer is still using lots of asserts at bring-up … to catch the stupid internal problems. And maybe even running those into initial releases with sane transparent logging and recovery.

    After that, trapping unexpected switch defaults and similar exits for other unexpected state values, even in production with transparent logging and recovery, is very low overhead.

    In most cases, the paranoid data checking belongs where data enters the system: data import and user interfaces.

    Plus a good program to regularly “lint/fsck” ALL your data for corruption is almost mandatory for any production sanity. Contrary to less clueful views, data does rot, and will become corrupted by a strange mix of hardware and software failures at some point. The best self-defense is to checksum/hash ALL critical data records/elements: once you know the data was correct when written, and the checksum/hash matches on read, it’s not necessary to sanity-check all the data fields again.
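The commenter’s checksum-on-write, verify-on-read scheme might look something like this minimal Python sketch (the record format and function names are my own, for illustration only):

```python
import hashlib
import json

def write_record(record: dict) -> bytes:
    """Serialize a record with an appended SHA-256 checksum."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest().encode("ascii")
    return payload + b"\n" + digest

def read_record(blob: bytes) -> dict:
    """Verify the checksum on read; if it matches, per-field
    sanity checks on this record can be skipped."""
    payload, _, digest = blob.rpartition(b"\n")
    if hashlib.sha256(payload).hexdigest().encode("ascii") != digest:
        raise ValueError("record corrupted: checksum mismatch")
    return json.loads(payload)
```

One checksum comparison replaces a whole battery of field-by-field guard code every time the record is read back.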

