Nailing Jell-O to a Wall

Oddly, the engineering director didn’t seem as impressed as he should have been. Perhaps he couldn’t see how well the software project was going already? Only days into the project and it looked like it was already 80% complete or so. The next six months would be a cakewalk. The team obviously would finish all the required functionality as well as Marketing’s “nice to have” list. The team leader started wondering what extra goodies the engineers would be able to slip in during their spare time.

A month later, things were still looking great. Not quite as great as before, of course. The original UI had to be scrapped because it wasn’t implemented on the right framework. Also, most of the feature implementations had been just stubs initially, and now coding full and robust implementations of them was taking considerably more time. That was all expected, though. With a month to go before functionality freeze and alpha start, the project was well on track. There was really nothing new visible in a demo, so the engineering director was left out of the loop at this juncture. The manager decided to wait until next month to bring him in again – at the alpha milestone.

With two days to go before alpha, the manager was feeling the frenzied pace. Marketing had come in with a couple more feature requests, and some of the less important features had been tabled while the important bugs were fixed. The team had been bringing a lot of new capabilities online, and a number of serious bugs had crept in that had to be fixed before more features could be added. The manager rationalized that it was better to get the system more stable before alpha and postpone the final few features until after functionality freeze. He reasoned that the alpha customers would need to be able to do some kind of useful work with the software. It wouldn’t help much to have each and every feature coded if most of the important ones just crashed.

Evidently, alpha test had sneaked up on Marketing, so they didn’t have any actual customers lined up for the event. Instead, they compromised by bringing in a few applications engineers for a “preview and bash” session. The manager had gone into the lab the day before and installed the alpha version of the software on all the machines. It was a nightmare, because the real installation packages weren’t written yet. Directories had to be manually created and libraries hand-loaded into place. The install was done at midnight and the software was still not working perfectly on all the machines.

After the bloodbath of the alpha bash, the development team rolled up their sleeves and got down to business. The applications engineers hated the organization of the user interface. Most of them hit major crashes and bugs, and none had successfully completed the exercise they brought with them. They each had envisioned different paths through the system that didn’t match what the development team had implemented. Over 100 bugs and feature requests had been filed, and major re-work was in order. In other words, it was a pretty typical alpha test.

The next two months were made up of long hours, time away from family, pizzas slipped under the office door, and considerable stress among the developers. QA had kicked into high gear, and the number of open bug reports had skyrocketed. The AE manager had complained to the engineering director that the alpha bash was a waste of his people’s time and that the project was seriously off track. Vacations were being cancelled, features were being scrapped, and the team was instructed to spend the time just “stabilizing the current system.”

Finally, and somewhat ironically, almost four months after the first “demo,” the system looked almost identical to what the team had shown the director back then. Of course, under the hood, there was a wealth of difference. Most of the features were now fully implemented and working, and the UI was much more intuitive. The database was now live and the error handlers were mapped to actual messages. The persistent file storage was now working so user data could be saved and restored successfully – most of the time, anyway. On the calendar, the project was supposed to be at code freeze and beta test start. Marketing, however, wanted a four-week delay. They didn’t want to expose any customers to the system in its current state. A “guru” technical marketing engineer was even appointed to provide oversight to the development team’s efforts. Marketing no longer trusted engineering to develop autonomously.

The two months that had been planned for beta were quickly eaten up by a seemingly endless cycle of testing and debugging. Instead of getting shorter, the bug list was growing, and the engineering team felt that many of the so-called bugs were “enhancement requests” disguised as bug reports. Tensions between marketing and engineering grew. The sales channel was angry because the product was now delayed. Company executives were considering replacing the manager, but they didn’t want to further de-stabilize the project.

Twelve months after the beginning of the “six-month” development project, the first production version of the software was shipped. It had only a meager subset of the originally intended functionality and bore little resemblance to the fantasy prototype that had been demonstrated almost an entire year earlier. The development team was demoralized and decimated – losing several key members during the process.

If this sounds like a typical software development project in your company, you are not alone. Countless talented and well-intentioned software development teams repeatedly fall victim to the subtle but serious software engineering traps that snagged our heroes. Fortunately, if you watch for the signs, you can steer clear of these colossal sneaker waves and keep your team safe on the beach instead of watching helplessly as they wash into the sea of endless debug cycles.

First, if you were to graph it, software follows one of the world’s strangest curves of “apparent completion” versus “actual completion.” What does this mean? Often, when software appears to be 90% complete, it is only 10% complete in reality. For applications with a graphical user interface (GUI), it takes almost no time to prototype the GUI and create something that looks dangerously similar to the final product. This demo typically sets unreasonable expectations in management and even sometimes lulls the development team themselves into a false sense of security.

Second, despite the best intentions in the world, software specifications almost never define a usage model that works well. The only reliable way to capture and refine the user experience with software is to iteratively expose realistic users to the system and make refinements in the process based on their feedback. Of course, those users have to understand that they are acting as part of the development team and that they are refining software that is far from production ready. Setting proper expectations among alpha testers is key.

Third, most teams are woefully naïve in the classification of their bug reports. Typically, a single rating scale is used that goes from something like “Critical” (this bug must be fixed right now) to “Low” (this bug will never be fixed). There is a tendency in such a system for severe inflation of the classifications as stress mounts on the team. In some cases, teams even define new, higher levels of criticality on the fly – “Super-Critical” and “Mega-Super-Critical” come to mind.

Of course, the goal of a bug tracking and classification system is to keep the development team working on the most important issues first, and to track and document the known issues needing to be addressed. The problem is that, with a new and complex piece of software, the list quickly outgrows the complexity-handling capability of the rating and tracking system.

Realistically, there are several distinct axes on which problems should be rated. The first is the likelihood of a customer encountering the bug. Some problems may occur every time a user fires up the software. Others may happen only in extremely rare corner cases or in theoretical cases that may never occur at all in practice. A second is the impact of that problem on a user of the system. Some problems may be only cosmetic – a few pixels that aren’t aesthetically pleasing on the screen. Others may be catastrophic – errors that could cause (in the case of safety-critical systems) loss of life or property damage, or, in more conventional applications, loss of critical data. A third axis is the amount of engineering effort required to fix the bug. There is generally no correlation between this assessment and the other two. However, the effort required is certainly a big factor in scheduling the work in engineering. A fourth factor to consider in assessing bug reports is the likelihood that the fix will destabilize some other aspect of the system.

The actual “importance” of addressing a particular bug may be something like the product of these four ratings. Obviously, the most important bugs to fix are those that happen frequently and carry severe consequences. The decision to go ahead with a fix, however, also must consider the resources and risk involved in implementing it. Educating an entire product team (even those marketing folks) on these factors can be extremely valuable in keeping a project on track, maintaining reasonable expectations across marketing, engineering, and management, and avoiding irrational abuse of a single-score bug tracking system.
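To make the four-axis idea concrete, here is a minimal Python sketch of how a team might record and sort bug reports. Everything in it is an illustrative assumption rather than a prescribed method: the `BugReport` class, the 1-to-5 scales, and the `triage` ordering are hypothetical. It ranks bugs by customer-facing importance (likelihood times impact) and uses the cost of the fix (effort times risk) only as a tie-breaker, which is one possible reading of “something like the product of these ratings.”

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BugReport:
    """One bug rated on four independent axes (hypothetical 1-5 scales)."""
    title: str
    likelihood: int  # 1 = rare corner case, 5 = hit every time the software runs
    impact: int      # 1 = cosmetic, 5 = data loss or safety hazard
    effort: int      # 1 = trivial fix, 5 = major rework
    risk: int        # 1 = isolated change, 5 = likely to destabilize other areas

    def __post_init__(self):
        # Reject ratings outside the assumed 1-5 scale.
        for name in ("likelihood", "impact", "effort", "risk"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def customer_importance(self) -> int:
        # How often users hit the bug, multiplied by how badly it hurts them.
        return self.likelihood * self.impact

    @property
    def fix_cost(self) -> int:
        # Engineering effort multiplied by the risk of breaking something else.
        return self.effort * self.risk


def triage(bugs):
    """Sort by customer importance (highest first); cheaper, safer fixes win ties."""
    return sorted(bugs, key=lambda b: (-b.customer_importance, b.fix_cost))


if __name__ == "__main__":
    backlog = [
        BugReport("Misaligned toolbar icon", likelihood=5, impact=1, effort=1, risk=1),
        BugReport("Save corrupts project file", likelihood=3, impact=5, effort=4, risk=4),
        BugReport("Crash on rare import path", likelihood=1, impact=4, effort=2, risk=2),
    ]
    for bug in triage(backlog):
        print(bug.customer_importance, bug.fix_cost, bug.title)
```

Keeping the four ratings as separate fields, rather than collapsing them into a single “Critical” to “Low” label, lets marketing argue about likelihood and impact while engineering owns effort and risk, and makes inflation of any one number easier to spot.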

All of these techniques really are aimed at maintaining a realistic and shared understanding among everyone associated with the project on how much work has been done, how much remains, and what the final product will do when completed. The difficulty in accurately estimating software product development effort and realistically tracking progress has caused the downfall of many intelligent, capable, and talented software teams. Keeping expectations about those issues in line can often be more important for the careers of those involved than actual productivity.
