Using Generative AI for Refactoring and Debugging Code Cuts Debugging Time in Half!

I am currently wearing my patent-pending puzzled and perplexed expression. This manly mien was handed down to me by my father when he determined he had no further use for it. It was the aspect he assumed when my mother requested him to undertake some chore around the house. Oftentimes, he managed to look so befuddled and bewildered that she ended up performing the task herself. I’ve not yet achieved this pinnacle of prowess and proficiency, but I’m assiduously working on honing my skills in this department.

The reason for my current condition of confusion is that, just as my fingers touched down on my keyboard to compose this column (both I and my dexterous digits were looking forward to discovering what would come next), an extremely loud “crack” occurred somewhere to my left.

“What did this sound like?” you ask. “What does cheese taste like?” I respond (which reminds me of the English writer and philosopher G. K. Chesterton, who famously noted that “Poets have been mysteriously silent on the subject of cheese,” and you simply cannot argue with logic like that). Oh well, if you insist, it was similar to the sound you might get if you dipped an object formed from heated glass into a container of cold water. That’s the sort of “crack” we are talking about.

The funny thing is that the same sound struck about a week ago. However, no matter how hard I’ve looked, I’ve failed to determine the cause. All of my flashy (as in flashing light-emitting diode-based) objets d’art continue to flash furiously, my Nixie tube creations continue to nix, and my vacuum tube artifacts do what they do best, which is to look spectacular. This is a poser and no mistake. It’s like having the sword of Damocles hanging over my head while waiting for the other shoe to drop (I never metaphor I didn’t like).

But we digress… I recently heard from my chums at KiteRocket.com who said they wanted to introduce me to a cool company with a cunning technology. You can only imagine my surprise to discover the name of this entity, causing me to exclaim, “My friend in the next office is called Bob. One of the guys with whom I gather weekly to watch Dr Who is called Bob. My carpenter chum is called Bob. Two of my customers are called Bob. And now you want to introduce me to a company called Metabob that has a tool called Metabob?” (And don’t start me talking about the 1991 black comedy movie What About Bob?)

The upshot of all this is that I had an awesome chat with Massimiliano Genta (he also goes by “Max”) and Avinash Gopal, who are the CEO and CTO, respectively, at Metabob.

I’ve said it before and I’ll say it again, I’m a simple man (this is the point where many people, including my wife, usually chime in to agree with me, but I’ve not finished yet) who likes a simple story. I specifically like things I can wrap my brain around, as opposed to convoluted presentations about tortuous technologies that leave me gasping, “Say, what?”

In the case of Metabob, the story is as simple as it gets, which in no way detracts from the fact that the underlying technology is mind-bogglingly complex and sophisticated. In a crunchy nutshell: (a) a lot of software developers are now using generative artificial intelligence (AI) tools to generate code, (b) a lot of this code has bugs, and (c) Metabob uses a combination of graph-attention neural networks and generative AI that can detect bugs and help you fix them.

Now I come to think about it, I could have boiled this entire column down to the final sentence of the preceding paragraph, but where would be the fun in that (not least that you wouldn’t be presently pondering the topic of G. K. Chesterton in the context of cheese)?

Let’s flesh things out just a little, as follows. On average, software developers spend about 20% of their time thinking about the code they are going to write, 30% of their time actually writing the code they’ve just been thinking about, and the remaining 50% of their time debugging the code they’ve just written. It’s safe to say that this debugging is tedious, time-consuming, and costly, especially when you realize that the cost of the average developer to companies in the USA is about $160/hour. Furthermore, bugs that slip through the cracks and end up in production code cost an average of $10,000 to fix (eeek!).
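To put rough numbers on that split, here is a back-of-the-envelope sketch in Python. The 20/30/50 split and the $160/hour figure come from the paragraph above; the 2,000-hour working year (50 weeks of 40 hours) is purely my own assumption, as are all the names in the code.

```python
# Back-of-the-envelope cost of each activity per developer per year.
# The split and the hourly cost are the figures quoted in the column.
HOURLY_COST_USD = 160
TIME_SPLIT_PERCENT = {"thinking": 20, "writing": 30, "debugging": 50}

# Assumption (not from the column): 50 working weeks of 40 hours each.
HOURS_PER_YEAR = 2_000


def annual_cost_by_activity(hours_per_year=HOURS_PER_YEAR,
                            hourly_cost=HOURLY_COST_USD):
    """Return the yearly cost in USD attributed to each activity."""
    return {
        activity: percent * hours_per_year * hourly_cost // 100
        for activity, percent in TIME_SPLIT_PERCENT.items()
    }


costs = annual_cost_by_activity()
for activity, cost in costs.items():
    print(f"{activity:>9}: ${cost:,}")
```

Under those assumptions, half of each developer's yearly cost goes on debugging, which is why even a modest reduction in debugging time is worth chasing.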

The use of AI for code generation is exploding, as are the bugs that come with it. I really hadn’t thought about this before. If you had asked me, I would have assumed that AI-generated code was “correct by construction,” but I have no idea why I would have made such a foolish assumption. The problem is that these AI models are largely trained on open-source code that contains a lot of bugs, so, not surprisingly, these bugs find their way into the code generated by the AI. This is all so obvious when you think about it.

Microsoft’s Visual Studio Code (a.k.a. VS Code) is a source code editor that also acts as an IDE. In a 2022 Developer Survey by Stack Overflow, approximately 75% of the 71,010 respondents said they use VS Code, which ranked as being the most popular developer environment tool. A key aspect of VS Code’s success is its extensible nature, which allows it to support a wealth of plug-in extensions that provide access to tools from third-party vendors.

Copilot is a cloud-based artificial intelligence tool developed by GitHub and OpenAI (the creator of ChatGPT) to assist users of VS Code. Copilot is currently by far the most widely used code generation tool. Its proponents like to waffle on about how Copilot is behind around 60% of newly developed code and how developers using Copilot are spending 55% less time writing new code. What they fail to mention is that 40% of the code generated by Copilot contains bugs and security vulnerabilities, which means the same developers are going to spend more time debugging the code they took less time to capture (this is similar in concept to the folks who say, “Whatever doesn’t kill you makes you stronger,” while neglecting to note that it almost killed you).
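How much extra debugging would it take to wipe out that saving? The sketch below works out the break-even point using only figures already quoted in this column: the 30% writing and 50% debugging shares, and the 55% writing-time reduction claimed for Copilot. Applying the 55% to the whole of the writing share is my own simplifying assumption, and the function name is made up for illustration.

```python
# Shares of a developer's time, in percent, as quoted earlier in the column.
WRITING_SHARE = 30
DEBUGGING_SHARE = 50

# Claimed reduction in time spent writing new code, in percent.
WRITING_TIME_SAVED = 55


def break_even_debugging_increase(writing_share, debugging_share, saved_pct):
    """Percent growth in debugging time that cancels the writing-time saving.

    Assumes the saving applies to the whole writing share (a simplification).
    """
    saved_points = writing_share * saved_pct / 100  # points of total time saved
    return 100 * saved_points / debugging_share


threshold = break_even_debugging_increase(
    WRITING_SHARE, DEBUGGING_SHARE, WRITING_TIME_SAVED
)
print(f"Debugging may grow by up to {threshold:.0f}% before the saving is gone")
```

In other words, on these numbers the writing-time saving evaporates once debugging time grows by about a third. Whether the extra bugs actually push it that far is something the quoted statistics alone cannot tell us.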

AI code generation is exploding… as is the number of bugs that come with it (Source: Metabob)

In the context of computer programming, a “smell” is any characteristic in the source code of a program that indicates the possibility of a deeper problem. Determining what is and is not a code smell is subjective and varies by language, developer, and development methodology. Metabob works similarly to traditional static code analysis tools, utilizing existing rule sets to detect known and labeled problems and code smells.
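For anyone who has never peeked inside a rule-based checker, here is a minimal sketch of the general idea using Python’s standard ast module. To be clear, this is not Metabob’s rule set or anything like its implementation; the two rules (a mutable default argument and a bare except clause), the sample code, and the function name are all my own illustrative choices.

```python
import ast

# Hypothetical sample code containing two well-known Python smells.
SAMPLE = '''
def add_item(item, bucket=[]):
    bucket.append(item)
    return bucket

def load(path):
    try:
        return open(path).read()
    except:
        return None
'''


def find_smells(source):
    """Return sorted (line, message) pairs for two hard-coded rules."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Rule 1: a list, dict, or set literal used as a default value.
            defaults = list(node.args.defaults)
            defaults += [d for d in node.args.kw_defaults if d is not None]
            for default in defaults:
                if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                    findings.append(
                        (default.lineno,
                         f"mutable default argument in {node.name}()")
                    )
        elif isinstance(node, ast.ExceptHandler) and node.type is None:
            # Rule 2: an except clause that names no exception type.
            findings.append((node.lineno, "bare except clause"))
    return sorted(findings)


for line, message in find_smells(SAMPLE):
    print(f"line {line}: {message}")
```

Each rule is a fixed pattern matched against the syntax tree, which is exactly why this style of checker can only find problems someone has already thought to write a rule for.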

On top of these rule sets, Metabob also utilizes a unique machine learning (ML) model to detect complex problems that remain undetected when using predetermined rules. This model has been trained on millions of bug fixes performed by veteran developers, allowing it to learn to recognize the root causes of many logical and context-based issues. What we are talking about here is more than 150 categories, each containing hundreds of individual types of bugs.

More bugs than you can swing a stick at (Source: Metabob)

Like Copilot, Metabob is available as a VS Code extension. As illustrated below, Metabob’s user interface is similar in concept to that of Copilot in that it detects things, makes suggestions, and allows you to apply or deny its recommendations. Unfortunately, I now (a tad unfairly) view Copilot as the tool that injects bugs into your code while Metabob is the tool that finds Copilot’s bugs (and human-generated bugs too, of course).

Detecting and fixing bugs with Metabob (Source: Metabob)

I don’t know about you, but I’m in the mood for a short video showing Metabob in action. How about we all watch this, and then we can regroup here in a moment?

As always, of course, “The proof of the pudding is in the eating.” According to the guys and gals at Metabob (the company), Metabob (the tool) dramatically outperforms traditional static code analysis tools like SonarQube and regular linters. Feast your orbs on the numbers below.

Feast your orbs on these numbers (Source: Metabob)

I have to say that these are jolly impressive, as are the accompanying statistics, such as a 34% increase in developer productivity and a 53% higher detection rate of critical errors. Of course, as that philosopher of our times, Homer Simpson, famously noted, “People can come up with statistics to prove anything.”

I have to say that being introduced to the chaps and chapesses at Metabob has opened my eyes. It seems that every day these days I’m exposed to some new application for AI that had never even struck me. As just one example, I only recently heard about a company that was combining AI and brainwave monitoring in earbuds to… but I’m afraid we will have to leave that to my next column. In the meantime, do you have any thoughts you’d care to share with respect to anything you’ve read in this column (including G. K. Chesterton’s strange obsession with cheese)?