
Ignore Those Pesky Bugs

Software is Complicated, But How Much of it is Useful?

“We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know.” – Donald Rumsfeld

Consider the humble ladder. It’s a hardware device that’s elegant in its simplicity. Two parallel side rails, with evenly spaced rungs in between. Everything you need; nothing you don’t. Ladders can be made of wood, metal, Fiberglas, or other materials. There are long ones, short ones, portable ones, and permanent ones. Nobody really needs to be taught how to use a ladder, although there are some standard safety rules that might make your tenure at the top a bit more secure. 

Now consider your computer’s word-processing software. Or the email application on your smartphone. Not so simple, are they? In fact, there are a lot of features in most programs that seem to be superfluous. Simple, elegant, just-the-basics apps seem to be the exception. They get good reviews, but people remark on them like they’re some sort of aberration. Maybe they are. “Creeping featurism” is a term that’s almost as old as engineering itself.

But that’s not my point. We don’t need to flog the poor software galley slaves chained to their oars below decks. (That’s what managers are for.) No, the question for today is, “How much of your code does real work, as opposed to just catching errors?”

First-time programmers just starting out learn to create “hello, world!” or something similar. They get the feel for what it’s like to use a programming language to describe what they want and make it into compile-worthy code. At first, they work toward getting their early programs to obey their wishes, pure and simple. Bugs will creep in, sure, but that’s all part of the learning process.

But after the first few successful attempts, we start to learn the other side of programming: the part where you shore up the program to prevent it from failing in the real world. You start to build the safety net, the guard code. You're no longer creating a program that does what you want. You're creating additional code that prevents it from doing what you don't want, and that's a different process and a different mindset altogether.

What does the program do if the user accidentally types in a bogus date? What happens if it receives a malformed network packet? How does it behave if a pointer is out of bounds? These are all aspects of the “guard code” that we all have to include, even though it doesn’t add anything to the program and it doesn’t (usually) do any useful work. Oftentimes, it’s never called or executed at all. Guard code is there just to keep the program from tipping over in case something stupid happens.
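To make the ratio concrete, here is a minimal sketch (the function name and date format are my own illustrative choices, not from the article) showing how a single line of "real work" ends up wrapped in guard code:

```python
from datetime import datetime

def parse_order_date(text: str) -> datetime:
    """Parse a user-entered date; most of this function is guard code."""
    # Guard code: reject obviously bogus input before it reaches the parser.
    if not isinstance(text, str):
        raise TypeError("date must be a string")
    text = text.strip()
    if not text:
        raise ValueError("date is empty")
    try:
        # The single line of "real work."
        return datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        # Guard code: turn a cryptic parser error into a useful message.
        raise ValueError(f"not a valid YYYY-MM-DD date: {text!r}")
```

One productive line, a half-dozen defensive ones, and on a clean input the guard code contributes nothing at all.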

Even though ladders are pretty safe, they don’t have safety features, per se. There’s no airbag at the bottom that deploys if you fall. There are (usually) no outriggers to prevent tip-overs. Ladders don’t have built-in current sensors to prevent you from using an aluminum ladder on power lines. There are no accelerometers or klaxons to alert you to unstable working angles. I once saw a ladder with a built-in bubble level to help you eyeball the slope, but that’s about it.

Programming isn’t like that. We actually spend a lot of our time adding safety features to our software. It’s like training wheels on a bicycle, except that they never come off. The guard code is always there, ready to catch that malformed packet or that bogus date, even though it may never happen.

On top of all that, we also have to guard against malicious intent, not just dumb mistakes. What if someone deliberately tries to break our program by shrewdly exploiting some weakness in the input buffer? You’ve got to guard against attacks, not just bugs. 

And we have to add security features. It’s harder than ever to make software hacker-proof, because the hackers keep getting wilier and craftier. There are accidental bugs, and then there are malicious assaults, and we generally can’t catch them both with the same kind of guard code. You have to consciously look for, and trap, both types: the known unknowns and the unknown unknowns.
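The two kinds of trap look different in code. As a hedged sketch (the field name and size limit are hypothetical), here is input validation that guards against both an honest mistake and a deliberately oversized payload aimed at a fixed-size field:

```python
MAX_NAME_LEN = 64  # hypothetical fixed-size field limit

def accept_username(raw: bytes) -> str:
    # Guard against accidents: empty or missing input.
    if not raw:
        raise ValueError("empty username")
    # Guard against attack: an oversized payload aimed at a fixed buffer.
    if len(raw) > MAX_NAME_LEN:
        raise ValueError("username exceeds field size")
    # Guard against attack: reject sneaky encodings outright.
    name = raw.decode("ascii", errors="strict")
    if not name.isprintable():
        raise ValueError("non-printable characters in username")
    return name
```

The length check is security thinking; the emptiness check is bug thinking. Neither catches the other's case.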

So how much of today’s code falls into that “guard code” category, versus the amount that does the real work implementing the program’s putative purpose? Any guesses?

I would take a SWAG that guard code accounts for 75 percent of most modern programs. It’s got to be at least half. Looking at big chunks of open-source code, I see an awful lot of source code that’s there just to trap errors, mistakes, user flubs, and similar non-malicious bugs. It’s sometimes hard to see what a function is actually doing, buried under all that safety net.

I’ll bet that the guard code is also the source of most bugs, ironically. We put it in there to catch stupid errors, and then the bug-catcher itself malfunctions. I don’t have any objective data to back that up, but that’s been my own experience. The real core of the program works fine; it’s all that other stuff in there to prop it up that’s problematic.

When you’re working on a tall ladder, it’s good practice to have a spotter below you. Someone who will – maybe not catch you, exactly, but at least call 911 when you face-plant on the pavement. They’re your safety net.

If we could do something similar with coding, we might get much faster programs and fewer bugs, besides. Let the “real” program run on one processor (or one CPU core of a multicore processor), while the “guard code” runs alongside on a parallel processor. One does the real work; the other checks that nothing is going off the rails. One parses input while the other checks boundary conditions. One calculates results while the other checks the validity of the input parameters. If the sidekick detects an error, we abort the process or restart the function or ask for new data.
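As a toy illustration of that spotter idea (the function names and the validity rule are invented for the example, and threads stand in for separate cores), the worker and the guard can be launched side by side, with the guard's verdict deciding whether the worker's result is trusted:

```python
from concurrent.futures import ThreadPoolExecutor

def real_work(values):
    # The "real" program: pure computation, no idiot-checking.
    return sum(v * v for v in values)

def spotter(values):
    # The guard code, running alongside: checks validity in parallel.
    return all(isinstance(v, (int, float)) and abs(v) < 1e6 for v in values)

def guarded_run(values):
    with ThreadPoolExecutor(max_workers=2) as pool:
        work = pool.submit(real_work, values)
        check = pool.submit(spotter, values)
        if not check.result():
            # A real system would kill or roll back the worker here.
            raise ValueError("spotter flagged bad input")
        return work.result()
```

The hard part, which this sketch glosses over, is stopping the worker before a flagged input does any damage; in practice that means transactions, rollbacks, or hardware support, not just a raised exception.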

Easy to say but hard to do, of course. But imagine how efficient – and fast! – your software would be if you didn’t have to idiot-check every single parameter, input buffer, string, and checksum. Imagine programming the way it used to be, when your only concern was making the program do what you wanted, not second-guessing all the things that could go wrong. That’s what spotters are for. We need code-spotters. That, and multicore processors, can be our ladders. 

One thought on “Ignore Those Pesky Bugs”

  1. I think it boils down to the necessary checks to bring up a new system without data corruption and memory faulting, vs what’s necessary for safe operation after deployment.

    A sane programmer is still using lots of asserts at bring-up … to catch the stupid internal problems. And maybe even running those into initial releases with sane transparent logging and recovery.

    After that, trapping unexpected switch defaults, and adding similar exits for other unexpected state values, is very low overhead even in production, given sane transparent logging and recovery.

    In most cases, the exhaustive data checking belongs where data enters the system … data import, and user interfaces.

    Plus a good program to regularly “lint/fsck” ALL your data for corruption is almost mandatory for any production sanity. Contrary to less clueful views, data does rot, and will become corrupted by a strange mix of both hardware failures and software failures, at some point. The best self-defense for this is checksumming/hashing ALL critical data records/elements … once you know the data was correct when written, and the checksum/hash matches on read, it’s not necessary to sanity-check all the data fields again.
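The commenter's write-time seal, read-time verify scheme can be sketched as follows (the record layout and helper names are mine, chosen for illustration):

```python
import hashlib
import json

def seal(record: dict) -> dict:
    """Attach a hash when the record is written."""
    payload = json.dumps(record, sort_keys=True).encode()
    return {"data": record, "sha256": hashlib.sha256(payload).hexdigest()}

def verify(sealed: dict) -> dict:
    """On read: if the hash matches, skip re-validating every field."""
    payload = json.dumps(sealed["data"], sort_keys=True).encode()
    if hashlib.sha256(payload).hexdigest() != sealed["sha256"]:
        raise ValueError("record corrupted: checksum mismatch")
    return sealed["data"]
```

One hash comparison on read replaces field-by-field sanity checks, which is exactly the overhead reduction the comment describes.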
