feature article
Subscribe Now

The Old Man and the C Debugger

Fixing Bugs is Easier if You Know Where Not to Look

“Eliminate all other factors, and the one which remains must be the truth.” – Sherlock Holmes, The Sign of Four

Three engineers are sitting in a rowboat. None of them has any nautical experience or navigational instruments.

As hope for rescue fades, and with no other ships in sight, they pass the time discussing their careers (as you do). The oldest one is a veteran hardware/software developer with decades of experience. The middle one is in the prime of his career, and the youngest has just left college to start his first job.

“Back in my day, we didn’t have these fancy logic analyzers, simulators, or EDA tools,” says the grizzled veteran engineer. “We debugged computers with a voltmeter, a hair dryer, and a piece of wire. Some days we didn’t even have the wire.”

“Well, things are better now,” replies the man in the middle. “With an oscilloscope and an emulator, I can track down most any problem.”

“You guys are both crazy,” scoffs the youngster. “You can’t possibly debug a modern multiprocessor, networked, RTOS-enabled system without modern simulation tools, profilers, multichannel logic analyzers, and a team of experts. And when are we getting out of this boat, anyway?”

The oldster shakes his head sagely. “Kid, you have no idea how to diagnose and debug a system. You’re too eager to throw your shiny tools at the problem before you even know what you’re looking at.”

“The first rule of debugging,” he continues, warming to the topic, “is knowing where not to look. It’s all about saving time. There are an infinite number of things that can fail, both in hardware and in software. Your only hope of fixing a machine is to eliminate 99.99% of those alternatives, early on. Otherwise, you’re just wasting your employer’s time. Lemme give you some lessons before it’s too late for all of us.”

  • Kick the box. Seriously. If physical jostling either starts or stops the failure, you know it’s mechanically related. Look for a loose connector, a wobbly ribbon cable, or a cracked component, especially one with a heat sink glued to it.
  • Turn it upside down. Gravity typically doesn’t affect software problems, but it will highlight loose hardware, bad sockets, flexing PC boards, stray screws, or bits of solder and other conductive materials rattling around inside the enclosure.
  • Hit it with freeze spray. Semiconductors run fast or slow, depending on temperature (that’s why supercomputers and gaming rigs are liquid cooled). A well-aimed shot of freeze spray can speed up a microprocessor, DRAM, or buffer just enough to highlight a tight timing window. It’ll also weed out marginal components prone to failure.  
  • Try a heat gun. Or a hair dryer, in a pinch. You can’t target the heat as finely as the freeze spray, but it’ll still highlight temperature-related problems. Don’t forget that PC boards, connectors, and other devices expand slightly with the heat, so your problem might be a mechanical fit, not marginal timing.
  • Look for bent or loose pins. Pin-and-socket connectors are pretty good at protecting pins, but they can’t fix stupid. Sometimes a pin gets bent, and when you force the plug into the socket, the pin gets bent over perfectly flat so it’s hard to see. Look closely and mentally count each pin to convince yourself that it’s actually straight and aligned.
  • Is Pin #1 where you think it is, or are you plugging in the cable backwards? Are the pins arranged in odd/even rows, or around the circumference like an IC? Is the PCB silkscreen upside down or mirrored?
  • Just because you’ve found a bug doesn’t mean you’ve found the bug. After long hours poring over the source code, you finally have an a-ha! moment and discover the missing semicolon that’s been causing all the trouble. At last! You fix the typo, recompile, download, and pat yourself on the back for solving the problem. Except you haven’t. You found a long-buried bug, sure, but not the bug that’s causing the problem. Every program has bugs – plural – and just because you’ve found one doesn’t mean you’ve found them all, or even the right one. Keep going until you can credibly explain why this bug caused that symptom.
  • Check your watch. Does the problem crop up at the same time of day, or does it instead appear after a certain amount of running time? Test the system at different times of day to separate the two issues.
  • Think outside the box (literally). The problem might be environmental and not inside your system. Is it nearby RF radiation, bad AC power, air filtration, temperature, or contaminants? Remember, hardware bugs got their name when an insect crawled inside a relay.
  • Spy on the neighbors. A computer failed every afternoon at the customer’s site but worked fine back in the lab. The culprit was a nearby welding shop that started its high-voltage arc welder at around 2:00 PM every day, sending horrendous spikes through the building’s wiring.
  • The old off-by-one error. It’s easy to misjudge the size of a buffer, counter, or iterator by a small amount. Add or subtract 1 from pointers, parameters, and passed values and see what happens.
  • What else is running? A lot of systems are connected in some way – Ethernet, USB, Wi-Fi, shared memory, semaphores, whatever – with some other process or system with its own autonomy. The root of your problem might be in another system.
  • Caches are evil. On-chip caches speed up our processors, but they also obfuscate the activity inside. They behave entirely nonintuitively and are difficult or impossible to probe from the outside. Cached code and/or data can drastically affect performance – that’s kinda the point – but in unpredictable and nondeterministic ways. Try preloading the cache, freezing its contents, or disabling it during debug.
  • Use your MMU. High-end processors have hardware MMUs that can prevent programs or processes from stepping on each other. Configuring the MMU can be a chore in its own right, but, once it’s done, it’s a low-impact way to be sure your code isn’t straying where it shouldn’t.  
  • Schrödinger’s Bug. Does touching an oscilloscope probe or logic analyzer clip to your hardware make the problem disappear? All probes have measurable capacitance and inductance. If touching a probe to a pin cures a problem, you’ve got marginal signal integrity.
  • Explain it to a six-year-old. Talking through a problem, out loud and in simple terms, can trigger interesting insights as your brain struggles to find new ways to paraphrase the problem. Give it a try.
  • Don’t assume. The hardest trick of all is to discard assumptions, hearsay, and hunches. People are unreliable witnesses. When someone says, “It fails every time I press this button,” don’t assume it’s button-related. Or that it happens “every time.” Or even that it’s a problem. Start over with your own testing.


After hearing all of this, the youngest engineer decided to step out of the rowboat and back onto the floor of the sporting goods store. It was getting late and the employees were eager for our three engineers to finish “testing” their new boat and go home. What? You thought the boat was adrift on the water? Why would you assume that?

One thought on “The Old Man and the C Debugger”

Leave a Reply

featured blogs
Oct 19, 2020
We'€™re proud to see that many expert verification teams exploit the powers of UVM vr_ad, in implementing intricate verification environments in e . The vr_ad is an open source package, part of UVM- e... [[ Click on the title to access the full blog on the Cadence Communit...
Oct 16, 2020
Another event popular in the tech event circuit is PCI-SIG® DevCon. While DevCon events are usually in-person around the globe, this year, like so many others events, PCI-SIG DevCon is going virtual. PCI-SIG DevCons are members-driven events that provide an opportunity to le...
Oct 16, 2020
If you said '€œYes'€ to two of the items in the title of this blog -- specifically the last two -- then read on......
Oct 16, 2020
[From the last episode: We put together many of the ideas we'€™ve been describing to show the basics of how in-memory compute works.] I'€™m going to take a sec for some commentary before we continue with the last few steps of in-memory compute. The whole point of this web...

featured video

Demo: Inuitive NU4000 SoC with ARC EV Processor Running SLAM and CNN

Sponsored by Synopsys

Autonomous vehicles, robotics, augmented and virtual reality all require simultaneous localization and mapping (SLAM) to build a map of the surroundings. Combining SLAM with a neural network engine adds intelligence, allowing the system to identify objects and make decisions. In this demo, Synopsys ARC EV processor’s vision engine (VPU) accelerates KudanSLAM algorithms by up to 40% while running object detection on its CNN engine.

Click here for more information about DesignWare ARC EV Processors for Embedded Vision

featured paper

Fundamentals of Precision ADC Noise Analysis

Sponsored by Texas Instruments

Build your knowledge of noise performance with high-resolution delta-sigma ADCs. This e-book covers types of ADC noise, how other components contribute noise to the system, and how these noise sources interact with each other.

Click here to download the whitepaper

Featured Chalk Talk

Microchip PIC-IoT WG Development Board

Sponsored by Mouser Electronics and Microchip

In getting your IoT design to market, you need to consider scalability into manufacturing, ease of use, cloud connectivity, security, and a host of other critical issues. In this episode of Chalk Talk, Amelia Dalton sits down with Jule Ann Baker of Microchip to chat about these issues, and how the Microchip PIC-IoT WG development board can help you overcome them.

Click here for more information about Microchip Technology PIC-IoT WG Development Board (AC164164)