The Debug Time Machine

Undo Software Offers Take-No-Prisoners Approach to Squashing Bugs

“Is this going to be a stand-up fight, sir, or another bug hunt?” – PFC Hudson, Aliens

If you’re debugging code, which tool would you rather have: your favorite debugger or a time machine? A good debugger is vital, sure, but a time machine… You could rewind the clock to before the point you inserted the bug and then… just… not do that.

We may be a few years (millennia?) away from working time machines, but Undo Software has something close. It’s a time machine for debugging that can rewind, fast-forward, or pause programs so that you can see exactly where things started to go south. It’s like TiVo for code.

Some of us may remember the old print advertisement with the anguished programmer yelling, “If I could find it, I could fix it!” I can’t point to any statistics, but I suspect that’s true of most bugs. We spend most of our time finding the problem; fixing it afterwards is often straightforward. Tracking down bugs is the hard part. And Undo aims to make that much more manageable.

Undo characterizes its Live Recorder debug tool as a “software flight recorder,” a black box that recreates the environment at the time of failure. And it can be an elaborate environment, too. Live Recorder can oversee massive multiprocessor server systems with hypervisors, virtualization, terabytes of RAM, shared memory, concurrent threads, and all the other trimmings. Or, it can debug your laptop PC. Either way, Live Recorder produces a data dump that allows your gdb debugger to step forwards and backwards through every single transaction, branch, memory access, and update, all down to the CPU register level. No detail is left out, even for massive multiprocessor systems.

The obvious question is… how does that all work? Surely Live Recorder isn’t really instrumenting every data bus, processor pinout, and memory array? That would produce massive avalanches of data, and where would you store it all?

No, the company is a bit more clever – actually, a lot more clever – about what and how it records. Microprocessors are deterministic, generally speaking. If the CPU executes the instruction at address 0x1000 then it will probably fetch and execute the instruction at 0x1004 right after that. Lots of things might disturb that tidy flow – an interrupt, a system fault, an external hardware event, etc. – but barring those kinds of external events, it’s pretty easy to predict what the CPU will do. That means there’s no need to record the fact that it did the normal and expected thing. Same goes for many other routine procedures. In short, Live Recorder keeps track of only the exceptional events; the expected ones are assumed.
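To make the idea concrete, here is a minimal Python sketch of the general record-and-replay technique. It is not Undo's implementation. The toy "machine," the `step` rule, and the names `record`, `replay`, and `noisy_source` are all invented for illustration. The machine advances deterministically on most ticks, and only the rare ticks that consume an outside value get written to the log.

```python
import random

MOD = 1_000_003  # arbitrary modulus for the toy state machine (assumption)


def step(state, external=None):
    """Advance the toy machine by one tick."""
    if external is None:
        # The normal, predictable case: nothing needs to be logged.
        return (state * 31 + 7) % MOD
    # The exceptional case: the result depends on the outside world.
    return (state * 31 + external) % MOD


def record(ticks, source):
    """Run 'live' and log only the ticks where an external event arrived."""
    state, log = 0, {}
    for tick in range(ticks):
        event = source(tick)  # None on most ticks
        if event is not None:
            log[tick] = event
        state = step(state, event)
    return state, log


def replay(ticks, log):
    """Re-create the run from the sparse log alone."""
    state = 0
    for tick in range(ticks):
        state = step(state, log.get(tick))
    return state


def noisy_source(rng):
    """Hypothetical stand-in for interrupts and I/O: fires on ~1% of ticks."""
    def source(tick):
        return rng.randrange(1000) if rng.random() < 0.01 else None
    return source


if __name__ == "__main__":
    final, log = record(100_000, noisy_source(random.Random()))
    assert replay(100_000, log) == final
    print(f"{len(log)} logged events reproduce all 100000 ticks")
```

The log holds only a small fraction of the ticks, yet the replay lands on exactly the same final state, which is the whole trick.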

That cuts way down on the amount of data Live Recorder monitors and records. The magic comes in recreating the entire flow of operations – backwards. Given any arbitrary point in its data dump, Live Recorder can faithfully recreate everything the processor(s) did to get there, and everything that happened afterwards. It’s a game of connect-the-dots raised by several orders of complexity.
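One common way to get "backwards" out of a machine that only runs forwards is to restore an earlier snapshot and re-execute up to the point just before where you were. The sketch below illustrates that general approach with a toy machine; whether Live Recorder works this way internally is an assumption, and `Replayer`, `SNAPSHOT_EVERY`, and the `step` rule are hypothetical names made up for the example.

```python
MOD = 1_000_003        # arbitrary modulus for the toy machine (assumption)
SNAPSHOT_EVERY = 1000  # how often to save state while replaying (assumption)


def step(state, external=None):
    """One tick of the toy machine; `external` is a logged outside event."""
    addend = 7 if external is None else external
    return (state * 31 + addend) % MOD


class Replayer:
    """Moves to any point in a recorded run, including earlier ones."""

    def __init__(self, log, total_ticks):
        self.log = log            # sparse {tick: external value} recording
        self.total = total_ticks
        self.snapshots = {0: 0}   # ticks executed so far -> state at that point
        self.tick = 0
        self.state = 0

    def goto(self, target):
        """Return the state after exactly `target` ticks have executed."""
        if not 0 <= target <= self.total:
            raise ValueError("target is outside the recording")
        if target < self.tick:
            # Cannot un-execute, so rewind to the nearest earlier snapshot.
            base = max(t for t in self.snapshots if t <= target)
            self.tick, self.state = base, self.snapshots[base]
        while self.tick < target:
            if self.tick % SNAPSHOT_EVERY == 0:
                self.snapshots[self.tick] = self.state
            self.state = step(self.state, self.log.get(self.tick))
            self.tick += 1
        return self.state

    def reverse_step(self):
        """Step 'backwards' by replaying forwards to one tick earlier."""
        return self.goto(self.tick - 1)


if __name__ == "__main__":
    r = Replayer({3: 42, 1500: 7}, total_ticks=3000)
    at_2999 = r.goto(2999)
    r.goto(3000)
    assert r.reverse_step() == at_2999
```

Snapshot spacing is the design knob here: closer snapshots make reverse steps cheaper but cost more memory.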

Even with the reduced overhead, Live Recorder still takes its toll. Undo executives are shy about quantifying the performance hit, but there certainly is one. The company compares Live Recorder to a Java Virtual Machine (JVM): there’s some overhead, but the benefit (in security and reliability) is worth the cost (in performance).

There’s a sister product called Live Recorder for Production (LRP) that can be used in shipping hardware in the field. It’s a lighter, lower-overhead version of Live Recorder that you can include with your production code but leave lying dormant. Then, if (when?) a customer calls to complain that their system crashed, you can enable LRP remotely and use it to send yourself a post-mortem data dump. Instead of playing “guess my bug” with an angry customer, you can get started debugging at your own site.

Both Live Recorder and LRP can be enabled and disabled on the fly. There’s no need to monitor everything all the time. Sketchy applications can turn on Live Recorder, while fully debugged applications can leave it turned off.

Currently, Undo supports only the x86 architecture, with no urgent plans to branch out beyond that. The company’s products aren’t cheap, and its customer list reflects that. SAP, Cadence, IBM, and other suppliers of big iron and/or big software are all Undo users. One “big networking company” says Live Recorder found in 90 minutes a bug it took its own staff five years to locate.

“Fixing really tough bugs often requires luck, genius, or lack of sleep,” says Undo’s CEO Barry Morris. Live Recorder aims to undo some of that. Almost like having a time machine.

One thought on “The Debug Time Machine”

  1. I am feeling a bit of déjà vu. It is almost like I have been travelling in a time machine…
    This is great technology that addresses a real need. However, isn’t it exactly what Green Hills have been doing for many years? I feel that a wheel has been reinvented.
