“There has never been an unexpectedly short debugging period in the history of computers.” – Steven Levy
Debugging your code or hardware is rarely seen as the glamorous part of the job. As engineers, we like to make stuff – to design. Any debugging activity is a tacit admission that our designs aren’t perfect; that we didn’t get it right the first time.
But debugging, and the debugging mindset, are valuable career skills – and not just in engineering. On the contrary, good debugging skills are probably most important in management. (Should you choose to go that route.)
Successful debugging requires both attention to detail and a grasp of the big picture. It’s one thing to spot a missing semicolon in your code and pat yourself on the back for catching the bug. It’s another thing entirely to intuit that malformed TCP/IP packets are causing your system to crash once every seventeen days. Both of those skills together make you a good debugger, and a good candidate for Mahogany Row.
Big companies are necessarily broken up into departments, divisions, or groups. There’s Engineering, Marketing, Finance, Sales, and so forth. That’s what we do. We’re compartmentalized on purpose.
“When we create organizations, we’re doing it to give people focus,” says Mark Okerstrom, the CEO of travel site Expedia. “We’re essentially giving them a license to be myopic. We’re saying: This is your problem. Define your mission and create your strategy and align your resources to solve that problem. And you have the divine right to ignore all of the other stuff that doesn’t align with that.”
The boundaries we draw around our workgroups are important because they define what we do and what we don’t do. They’re the out-of-bounds markers on our professional field of play. Org charts determine budget and manpower, and the responsibilities we have – and those we don’t have. They make it easy for us to say, “Not my job.”
Okerstrom knows a thing or two about big organizations and debugging the problems that go with them. In his book Upstream, author Dan Heath talks about Expedia’s customer-service nightmare. The site handled millions of online transactions, but a whopping 58% of their customers followed up their order with a phone call. To field the massive volume of calls, Expedia maintained a global network of customer-service call centers. Naturally, the company wanted to minimize the costs of those call centers, while still keeping their customers happy. They tried everything to keep the calls short and increase productivity.
That was the wrong solution entirely.
What most customers wanted was simply a copy of their itinerary. Which is strange, because Expedia was already emailing a copy to every customer. Customers already had their itinerary. Why were so many of them – 20 million customers per year – taking the time to phone up and ask for something they should already have received?
Reducing call-center costs to zero wouldn’t have fixed the problem. In fact, there’s nothing the Customer Service department could have done to stop the flood of apparently pointless calls because it wasn’t within their bailiwick to fix it. It genuinely wasn’t their problem. Nobody within that department had the visibility, or the responsibility, or even any incentive to engineer a way around it. Indeed, as Heath points out, Expedia’s Customer Service department made the problem worse because that’s what they were incentivized to do.
In any organization, what gets measured gets done. What one group wants may not be what another group wants, so we pass the buck and make things worse. A myopic view of a company problem is no different than a narrow view of debugging. If you look at only a few lines of code, or one page of a schematic, you’ll likely miss the root cause of the failure. If your only goal is to prove that the bug isn’t in your part of the design and it must be somebody else’s fault, you won’t get far in debugging. Or in your career.
Expedia worked its way out of its confusing problem by ignoring its own organizational boundaries and looking backwards in time to find the root cause, in something similar to the Five Whys technique. Why were so many millions of customers requesting information they had already received? For one, many of the automatically generated confirmation emails were landing in recipients’ spam folders, so Expedia reworded them to avoid spam filters. For another, customers were mistyping their own email address, so the messages bounced (or went to someone else). Or the email just got misplaced and the customer was calling to request a second copy.
These were all easy fixes. To reduce typos, Expedia now makes you type your email address twice (as do most websites). This seemingly trivial change was a hard sell because those working in Expedia’s sales department said they’d lose customers. Typing your email twice is a nuisance, and a small but significant portion of customers will give up. Too bad, said the CEO. A little pain at the front end will save a lot of pain downstream. In coding terms, bounds checking your inputs can save a lot of problems and redundant checking down the road.
Expedia also added a self-serve option on its website so customers can request a copy of their itinerary. That lopped off several million more pointless calls each year. From a 58% call-in rate in 2012 it’s now down to about 15%.
“When you spend years responding to problems, you can sometimes overlook the fact that you could be preventing them,” says Heath.
This all seems so obvious in hindsight, but that’s exactly the point. You almost always recognize the source of a bug after you’ve found it. Picking off the mismatched parentheses and loose wires is easy. The real skill is in identifying the bigger systemic problems that might, strictly speaking, lie outside your area of responsibility.
In medical terms, we might compare this to prevention versus treatment. Managers talk about being proactive instead of reactive. Heath calls it upstream vs. downstream thinking. I think it’s just good debugging.