
Tales from the Debugging Crypt

Share Your Best and Worst Debugging Stories

“It is easier to write an incorrect program than to understand a correct one.” – programmers’ adage

My first real job was repairing hard disk drives. I’d open up the drives, clean the platters with an alcohol pad, and align the read/write heads using a screwdriver and an oscilloscope. The guy at the bench next to mine had the same job, but he smoked all day and liked to rest his cigarettes on the edge of the drive while he worked. He was careful to blow any ashes off the platters before closing up, though. 

After you’d put in your time at the repair bench, the company might promote you to a field service job if you were deemed worthy and safe to put in front of customers. Thus did I learn to diagnose and repair computers while their angry owners watched over my shoulder. We learned never to make the problem look too easy (“You forgot to plug it in”) or too hard (“These machines never work right”), but instead to strike an appropriate balance and to radiate confidence even if we had no clue what was wrong. 

Intermittent problems are always the worst. As if obeying Heisenberg’s uncertainty principle, the bugs customers reported would disappear the moment you showed up to observe them. Sometimes I’d even turn my back on the computer to fool it into thinking I was leaving. One machine failed reliably until we touched an oscilloscope probe to the faulty signal, so we simply tie-wrapped the probe in place and left it there. Problem solved!

Another machine failed sporadically about once a week, usually in the afternoon. One by one, all of us field technicians tried and failed to find the problem. Is it a buffer overflow? User error? Heat-related problem? Does the janitor accidentally unplug the machine when he’s cleaning? Are the circuit boards flexing? Is there an actual bug living inside? Did someone spill coffee on the fan? We looked for everything. 

After another frustrating day trying to get this cursed machine to fail for us, my fellow tech stepped outside and idly watched as the welding shop two doors down started up its huge arc welder, whereupon the computer promptly crashed. Yup, the massive spikes on the shared AC power lines were the culprit. The welders normally used gas torches and fired up the big electric arc welder only about once a week, usually in the afternoon. 

The takeaway from that adventure was that bugs in your machine might not originate in your machine. The problem might be environmental. The hardware and software might be perfectly solid in another environment (e.g., your development lab) but fail elsewhere for nonintuitive reasons. 

That lesson didn’t help me track down a software bug years later. I’d hacked together a simple program to read my PC’s real-time clock and toggle an LED on the motherboard. The program was so trivially simple that I wrote it by typing in the hexadecimal opcodes. (Real programmers don’t need compilers.) Naturally, it didn’t work the first time, so I dumped out the executable file to read the disassembled code. Which I must’ve also done wrong, because my simple little program was now padded out to 4KB. Must be a limitation of the operating system. No big deal. It’s not like I’m wasting a lot of memory or disk space. 

Once I’d solved my RTC bug, I spent some extra time looking at the superfluous stuff padding the end of it. I was expecting random data or uninitialized RAM, but this looked like real code. Was I accidentally overwriting something or reusing RAM that already had code in it? I wonder what this extraneous code does… 

I could tell that it read from the real-time clock (just like my little hack), but it also accessed the screen buffer (which I didn’t do) and the filesystem. It did some arithmetic on the date, and it also had some hard-coded numbers that it used for comparison. It seemed to be looking for a particular date. Oddly, the code never jumped or branched outside its little 4KB of space, as I would have expected a random fragment of another program to do. It appeared to be complete, not just a piece of something else. And it seemed awkwardly written, like it was meant to be hard to follow. Almost deliberately obfuscated. Sort of like… 

A virus. My PC was harboring a virus that would lie dormant until a particular date, then write random data to my hard disk. There was no telling how long it had been there; I’d never suspected a thing. It replicated by attaching itself to every program that ran, adding about 4KB of code to the end of each executable file. If my hack hadn’t been so small, I probably wouldn’t have noticed it. 

Once again, the problem came from outside the system. 

The other lesson from this was an old one. Just because you’ve found a bug doesn’t mean you’ve found the bug. Always keep looking. Plenty of software experts have said that all programs, no matter how well constructed, have latent bugs that will never be found. That it’s genuinely impossible to write perfect code. The best you can hope for is to squash the ones that will manifest themselves in real use, and hope the remainder go undiscovered. Which raises the Zen-like question, if a bug never appears in real usage, is it still a bug? 

Hardware and software bugs are wily things, and they test our powers of observation, logic, and creativity. Let’s hear about your best (or worst) bug-hunting expeditions in the comments below. 

2 thoughts on “Tales from the Debugging Crypt”

  1. Years ago I was chasing an intermittent reset on a new circuit board that occurred only once every couple of days. Instrumenting the board revealed a noise burst on the power line that showed up only every 2 or 3 days. I set up a scope probe on the AC power (with proper safety barriers and labelling), and the noise would eventually show up after a few days. In fact, you could probe any of the building metal in the lab with a scope and see it. It turned out that a power generator in the building was malfunctioning. To investigate ways to make the design more immune to the noise, we needed to reproduce it on demand, so we used an old hand-held power drill that generated plenty of electrical noise. In fact, all we had to do was plug it in, and it would put enough noise on the AC power line to reset the board. We got lucky: a very simple digital filter on the reset in firmware (ignore any active-low reset pulse shorter than 4 clock periods) made the board immune to the noise, so it kept working through the noise burst. Regards, Grady Muldrow
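The commenter doesn’t say how that reset filter was written, so what follows is only a rough sketch of one way such a glitch filter might look in C. It assumes the firmware samples the raw active-low reset line once per clock period (for example, from a timer interrupt) and honors the reset only after the line has stayed low for at least 4 consecutive samples. Only the 4-clock threshold comes from the comment; the names `reset_filter_t`, `reset_filter_sample`, `count_resets`, and `RESET_FILTER_DEMO` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Threshold from the comment: low pulses shorter than 4 clock periods are ignored.
 * All names below are hypothetical, for illustration only. */
#define RESET_MIN_LOW_SAMPLES 4u

typedef struct {
    uint8_t low_count; /* consecutive samples seen with the line low */
    bool asserted;     /* filtered reset state seen by the rest of the firmware */
} reset_filter_t;

/* Call once per clock period with the raw pin level
 * (line_high == false means the active-low reset is being driven).
 * Returns true only after the line has stayed low for at least
 * RESET_MIN_LOW_SAMPLES consecutive samples. */
bool reset_filter_sample(reset_filter_t *f, bool line_high)
{
    assert(f != NULL);
    if (line_high) {
        /* Any high sample ends the pulse: start counting from zero again. */
        f->low_count = 0;
        f->asserted = false;
    } else if (f->low_count < RESET_MIN_LOW_SAMPLES) {
        /* Count up, saturating at the threshold so a held reset can't overflow. */
        f->low_count++;
        if (f->low_count >= RESET_MIN_LOW_SAMPLES) {
            f->asserted = true;
        }
    }
    return f->asserted;
}

/* Feed a recorded sequence of raw samples through a fresh filter and
 * count how many times the filtered reset goes from inactive to active. */
unsigned count_resets(const bool *samples, size_t n)
{
    reset_filter_t f = {0, false};
    unsigned resets = 0;
    bool prev = false;
    for (size_t i = 0; i < n; i++) {
        bool now = reset_filter_sample(&f, samples[i]);
        if (now && !prev) {
            resets++;
        }
        prev = now;
    }
    return resets;
}

#ifdef RESET_FILTER_DEMO
int main(void)
{
    /* true = line high (idle), false = line low (reset driven) */
    const bool glitch[] = {true, false, false, false, true, true};       /* 3-sample burst */
    const bool real[] = {true, false, false, false, false, false, true}; /* 5-sample pulse */
    printf("glitch: %u reset(s)\n", count_resets(glitch, sizeof glitch / sizeof glitch[0]));
    printf("real:   %u reset(s)\n", count_resets(real, sizeof real / sizeof real[0]));
    return 0;
}
#endif
```

Compiled with `-DRESET_FILTER_DEMO`, the demo reports 0 resets for the 3-sample noise burst, which never reaches the threshold, and 1 reset for the 5-sample pulse. Counting consecutive samples rather than using an analog RC filter keeps the fix entirely in firmware, which fits the commenter’s description; the catch is that any genuine reset must now be held for at least 4 clocks.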
