Undo Your Mistakes at the Customer’s Site

“Programming will be the last job on the planet.” – Jaan Tallinn, co-creator of Skype and Kazaa

Even hardware is software these days. With a hardware description language (HDL), you can “compile” gates, boards, and even entire systems. Not everybody does… but you could. Software determines product functionality at least as much as the hardware does, and programming is a much easier skill to learn (and to teach) than hardware engineering. We’re becoming an industry of coders.

Yet one of the big downsides to programming is that you generally spend more time debugging your code than you do creating it. That’s the consistent conclusion of repeated surveys of professional programmers from around the world: they spend more time debugging than on any other part of the job.

Or maybe that’s actually a good thing in disguise. Hardware engineers generally spend less time (as a percentage) on debugging than their coding colleagues do, but that’s likely because hardware bugs are so expensive and time-consuming to fix. You can’t just iteratively debug a new 10-million-gate SoC by fixing little things here and there every half hour. It’s got to be pretty much perfect the first time. Consequently, hardware folks spend a lot of time up front simulating and verifying their design before they push the big red button. Programmers? Not so much. We’ll fix it as we go along. Or maybe sometime after that. We’ll see.

The result is a lot of iffy code that gets shipped out to the customer, with a vague promise of a fix, an update, or an upgrade somewhere down the road. Why hold up revenue when we can patch bugs in the field? It’s the end of the quarter; let’s make our number and worry about the update later.

This all assumes that you already know where the bugs are, however. It’s all very nice to push planned updates to your customers. But what about real bugs – the unexpected kind? What happens when tech support starts getting calls and emails about some failure you’ve never heard of and never anticipated? Now you’ve got a real problem. You’ve got systems installed in the field that are failing in ways you can’t duplicate in the lab. It’s too late to call the faulty systems back. And all the units in-house seem to work just fine. What’s going on at the customer’s site that’s making these things fail?

Who you gonna call?

Undo Software, that’s who. Or at least, they’d like you to give them a chance. Undo is a very small (12-person) company in Cambridge, UK that has been peddling a debug tool for a few years. It’s not a standalone debugger itself, but an enhancement for several popular debuggers that allows you to “rewind” program execution, one instruction at a time. It’s essentially a time machine for code. You can stop execution, back it up, roll it forward again, and see every pointer, register, and machine instruction along the way. It’s an obsessive amount of detail, but sometimes that’s exactly what you want.

The tool works by logging every processor activity and memory reference and squirreling it away in a circular buffer that you set aside in your system memory. There is some processor overhead: “You can’t record everything a processor does, instruction by instruction, without incurring some overhead. But we’re clever about minimizing it,” says company cofounder Greg Law. The debug code itself occupies “a few megabytes” of code space. The size of the buffer is up to you, but a few more megabytes is a good starting point. The tool runs only on Linux or Android systems, either ARM- or x86-based.
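To make the circular-buffer idea concrete, here is a toy sketch in Python. It is not Undo's implementation, and it records only variable writes rather than processor activity; the names `TraceRecorder`, `record`, `rewind`, and `run_demo` are invented for illustration. It shows the trade-off the buffer size controls: once the buffer is full, the oldest events are overwritten, so you can rewind only as far back as the buffer reaches.

```python
from collections import deque


class TraceRecorder:
    """Toy recorder (hypothetical): keeps the most recent writes in a circular buffer."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full.
        self.events = deque(maxlen=capacity)

    def record(self, step, name, old, new):
        """Log one write: which variable changed, and its old and new values."""
        self.events.append((step, name, old, new))

    def rewind(self, state, steps):
        """Undo up to `steps` recorded writes, newest first. Returns how many were undone."""
        undone = 0
        while undone < steps and self.events:
            _, name, old, _ = self.events.pop()
            state[name] = old  # restore the value from before the write
            undone += 1
        return undone


def run_demo():
    state = {"x": 0}
    rec = TraceRecorder(capacity=3)  # room for only the last three writes
    for step in range(1, 6):
        old = state["x"]
        state["x"] = old + step
        rec.record(step, "x", old, state["x"])
    after_run = state["x"]
    # Ask to rewind five steps; only three are still in the buffer.
    undone = rec.rewind(state, 5)
    return after_run, undone, state["x"]
```

In this sketch, asking to rewind five steps undoes only three, because the first two writes have already been overwritten. A bigger buffer buys a longer history at the cost of more memory.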

That’s the old product. The new product, just announced this month, allows you to collapse both time and space. In other words, it lets you rewind code on a remote system – say at your customer’s site. Up until now, you could use Undo’s magic time machine only on a local system sitting in front of you. Now you can operate it remotely, too.

The remote capability gives developers a handy new tool for cracking open buggy systems that are out of reach. Simply include Undo’s special “Live Recorder” library in your shipping code, and leave it dormant until the bug day comes. Then, activate it remotely and voilà! Spooky action at a distance.

The remote version logs all the same data, and gives you all the same detailed visibility, as the local version. It’s essentially a headless version of the earlier product, with the addition of up/downloading capability and some intrinsic privacy. Why privacy features on a debugger? Two reasons. One, you don’t want just any old programmer examining your detailed trace logs. Given the level of detail it records, UndoDB would be an ideal tool for reverse-engineering code. It would also be a handy cracking tool, were you so inclined.

The second reason has more to do with commerce. Undo licenses its software on an annual basis: buy one license and deploy as many copies as you like within a single project/product. No royalties; no need to count systems. That’s very convenient, but it also makes it hard for Undo to get paid. To thwart unscrupulous reuse of its code, Undo serializes each copy of Live Recorder, and the linked library will communicate only with its “twin” back at your desk. That allows you to talk to your remote system, but it keeps other people (and even other Undo Live Recorder users) out.
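The article doesn't say how the “twin” pairing actually works, so the following is only a guess at the general shape of such a scheme, not Undo's protocol. It assumes each licensed copy carries a secret key derived from its serial number, and that the library accepts only requests carrying a valid HMAC tag made with that key. The function names and the example keys are hypothetical.

```python
import hashlib
import hmac


def make_tag(serial_key: bytes, message: bytes) -> bytes:
    """Sign a request with the per-copy key (hypothetical scheme)."""
    return hmac.new(serial_key, message, hashlib.sha256).digest()


def accept(serial_key: bytes, message: bytes, tag: bytes) -> bool:
    """Library side: accept the request only if the tag matches this copy's key."""
    expected = make_tag(serial_key, message)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, tag)


# Example: the desk-side tool and the deployed library share one key.
my_key = b"serial-0001-secret"      # assumed: issued with your license
other_key = b"serial-0002-secret"   # assumed: some other customer's copy
request = b"start-recording"

paired = accept(my_key, request, make_tag(my_key, request))
stranger = accept(my_key, request, make_tag(other_key, request))
```

Under these assumptions, `paired` is true and `stranger` is false: another Live Recorder user, holding a different key, can't talk to your deployed systems.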

So now you can feel slightly less guilty about debugging code after it’s shipped to the customer.