feature article
Subscribe Now

Bug-B-Gone!

ARM Programmers Get a New Extermination Tool

What do Netflix movies, embedded programmers, and Raymond Chandler mysteries all have in common?

More on that later. For now, let’s look at a new debugging tool available to ARM users (and they are legion) that helps to find and swat bugs. Just this week, ARM announced that its premium Designer Studio v5 (DS-5) software suite will include a new feature designed by Undo Software. It’s a pretty sweet tool, and one that seems to have a good track record of uncovering hard-to-find bugs and saving programmers time.

Called “application rewind,” it’s a kind of time machine for debugging. We all know about traditional debuggers: you set breakpoints, you single-step, you monitor variables, you follow trace paths. That’s all well and good, but debuggers by their nature capture problems after they’ve happened. You’re then left to do a post-mortem analysis of why and where the problem started. Much of the skill of debugging lies in correctly guessing where the problem might lurk and instrumenting the code accordingly. Guess right, and you locate the source of the bug early on. Guess wrong, and you waste days (or weeks, etc.) exploring dead ends.

What all of these methods have in common is that, in the words of Brian W. Kernighan, you have to “…reason back from the state of the crashed program to determine what could have caused this. Debugging involves backwards reasoning, like solving murder mysteries. Something impossible occurred, and the only solid information is that it really did occur. So we must think backwards from the result to discover the reasons.” [The Practice of Programming, Addison-Wesley, 1999, ISBN 978-0201615869.]

Sound about right? Well, Undo Software thinks it’s tilted the odds in programmers’ favor just a bit. Rather than rely on the old gumshoe method of solving software mysteries, it’s recommending more of a surveillance state; a kind of CCTV for suspect programs. Its debugger allows you to “rewind” code and run it backwards, following it as far back in time as you like, until you discover the root cause of the bug.

It’s a concept that’s been tried before (sometimes successfully), but it’s never before been officially endorsed by ARM. Undo has licensed the debugger, called UndoDB, to ARM for inclusion in the latter’s official DS-5 development suite. Thus, UndoDB will become the de facto debug technology for ARM users who shell out the money for DS-5. For programming teams that don’t use DS-5, Undo is happy to sell them a standalone, retail version of the software.

So how exactly do you run software backwards? By running it forwards, obviously. The details are complicated and understandably secret, but the basic premise is that UndoDB occasionally saves the state of the machine at strategic points in the program flow. When you request a “rewind,” UndoDB reloads the snapshot nearest to the point of interest and then reruns the code from that point, but backwards. It’s essentially like an emulator or instruction-set simulator, but one that “undoes” the results of each instruction instead of emulating their normal behavior.

The magic lies in knowing which CPU operations are deterministic and which aren’t. Adding two numbers together is deterministic, so it’s easy to emulate an ADD instruction running either forwards or backwards while accurately reproducing all of its side effects (such as updating status flags). Reading from shared memory, on the other hand, is non-deterministic: the results depend entirely on what was in memory at the time, so UndoDB takes a virtual snapshot of the system state at that moment. Volatile I/O devices, network packets, and timers are other examples of unpredictable events that require a snapshot.

By cleverly taking snapshots only when they’re absolutely necessary and emulating the rest, UndoDB is able to minimize its impact on system performance while still maintaining 100% fidelity. Earlier attempts at “time machine” debuggers have used the brute-force approach of storing nearly everything, all the time. While that’s reliable, it’s also painfully slow. Undo Software uses the example of the gnu debugger’s “process record” feature, which allows backward tracing but can slow execution by more than 200,000 times. A compression algorithm that takes 10 seconds to run in real time takes 22 days with process recording turned on. Not an ideal debugging tool.

UndoDB is not without its side effects. It also slows execution, but not nearly as much as gdb. In the same comparison, the algorithm took about 38 seconds to run, a roughly 4x slowdown. Clearly, UndoDB incurs some overhead, but it saves three weeks over gdb.

The concept behind UndoDB is a lot like that for MPEG video compression. With MPEG, certain “key frames” of the video stream are compressed, while others are interpolated. That saves a lot of space versus just compressing each frame of video, but it requires a lot of processing power to reconstitute the missing frames in real time. Similarly, UndoDB stores the system state only at certain “key events” in the program flow when it can’t accurately predict what will happen (e.g., a read from an Ethernet buffer). All other program actions are simply reconstructed using known, deterministic information. A beneficial side effect is that the recreated code can be run backwards just as easily as forwards, yielding a useful debug tool.

The tool requires only two things to run: an ARM processor and Linux or a Linux-like operating system. (Actually, that first part isn’t true; UndoDB is available for x86 processors, too.) It’s been tested on Android, Red Hat, and SUSE; other embedded Linux spinoffs like LynxOS may work as well, but they haven’t been tested yet. Specifically, no special compilers, plug-ins, or libraries are needed. UndoDB can work on already-compiled binary code.

UndoDB is no panacea. Anything that slows execution by 4x is going to cause problems when debugging real-time code. And it’s invasive; it requires memory, time, and system resources to run. It’s not going to be the ideal tool for everybody. And it works on Linux systems only (which pretty much excludes real-time projects anyway). But if you’re already a paid-up DS-5 user, it’s a free upgrade.

And how often do you get free time? 

One thought on “Bug-B-Gone!”

  1. I duuno. For me this is a bit like auto-correct in Word. In the end, you don’t know how to spell because Word is doing everything for you.
    This “undo” thing lets you write code that is effectively above your intellectual skills. I don’t think that’t a good idea in the long run.

    In my days all we had was printf and we liked it! Now get off my lawn…

Leave a Reply

featured blogs
Apr 18, 2021
https://youtu.be/afv9_fRCrq8 Made at Target Oakridge (camera Ziyue Zhang) Monday: "Targeting" the Open Compute Project Tuesday: NUMECA, Computational Fluid Dynamics...and the America's... [[ Click on the title to access the full blog on the Cadence Community s...
Apr 16, 2021
Spring is in the air and summer is just around the corner. It is time to get out the Old Farmers Almanac and check on the planting schedule as you plan out your garden.  If you are unfamiliar with a Farmers Almanac, it is a publication containing weather forecasts, plantin...
Apr 15, 2021
Explore the history of FPGA prototyping in the SoC design/verification process and learn about HAPS-100, a new prototyping system for complex AI & HPC SoCs. The post Scaling FPGA-Based Prototyping to Meet Verification Demands of Complex SoCs appeared first on From Silic...
Apr 14, 2021
By Simon Favre If you're not using critical area analysis and design for manufacturing to… The post DFM: Still a really good thing to do! appeared first on Design with Calibre....

featured video

The Verification World We Know is About to be Revolutionized

Sponsored by Cadence Design Systems

Designs and software are growing in complexity. With verification, you need the right tool at the right time. Cadence® Palladium® Z2 emulation and Protium™ X2 prototyping dynamic duo address challenges of advanced applications from mobile to consumer and hyperscale computing. With a seamlessly integrated flow, unified debug, common interfaces, and testbench content across the systems, the dynamic duo offers rapid design migration and testing from emulation to prototyping. See them in action.

Click here for more information

featured paper

Understanding Functional Safety FIT Base Failure Rate Estimates per IEC 62380 and SN 29500

Sponsored by Texas Instruments

Functional safety standards such as IEC 61508 and ISO 26262 require semiconductor device manufacturers to address both systematic and random hardware failures. Base failure rates (BFR) quantify the intrinsic reliability of the semiconductor component while operating under normal environmental conditions. Download our white paper which focuses on two widely accepted techniques to estimate the BFR for semiconductor components; estimates per IEC Technical Report 62380 and SN 29500 respectively.

Click here to download the whitepaper

Featured Chalk Talk

Bluetooth Overview

Sponsored by Mouser Electronics and Silicon Labs

Bluetooth has come a long way in recent years, and adding the latest Bluetooth features to your next design is easier than ever. It’s time to ditch the cables and go wireless. In this episode of Chalk Talk, Amelia Dalton chats with Mark Beecham of Silicon labs about the latest Bluetooth capabilities including lower power, higher bandwidth, mesh, and more, as well as solutions that will make adding Bluetooth to your next design a snap.

Click here for more information about Silicon Labs EFR32BG Blue Gecko Wireless SoCs