Shortening the Rope

Once upon a time, a man was given a rope and was told that it would be useful for many things. That most anything could be done with that rope. And the man tried it out, found some things easy to do – tying a bow, for example – and some things hard – intricate cat’s cradle, for example. He found that he could tie large crab traps together on such a rope and run them out to sea and retrieve them later. But he also learned that having his foot in a coil as the pots were put out could be deadly. He found that heavy items could be hoisted by tying a loop at the end and running the rope over a branch, or, better yet, a pulley. He also found that putting his head through that loop was not a good idea. He decided to name this rope, and he called it “C.”

Then he received a new kind of rope — not three-dimensional, but six-dimensional. This rope could do anything the old rope could do and much more. It was more difficult to comprehend, and the implications of what could be done were not always obvious. And observed behaviors in the three or four standard dimensions might hide unexpected and unobserved behaviors in the fifth or sixth dimension. But it gave him great power to do great things, far beyond what was practical with C, even if he didn’t always know exactly what he was doing. And he named this “C++.”

Of course, C has been the default mainstream programming language for years, on desktops and in embedded. You can do anything in C, which means that one of the easiest things to do is crash the system. The much more elaborate C++ has made huge inroads in the desktop and server arenas, but less so in the embedded realm. First off, C++, when plumbing the full potential of all of the arcane features, can make you feel like you’re on a trip through the looking glass while upside down, spinning, and on acid.* From a more practical standpoint, C++ can have too large a code footprint in memory, can use too much heap memory, and some of its constructs can generate unpredictable results; formal correctness can be hard to prove.

One early attempt to rein in the broad reach of C++ was the Embedded C++, or EC++, effort. This standard defined a necessary and sufficient subset of the full C++ language for use in embedded, with the intention that dedicated EC++ compilers could be created to generate code that would be more favorable for the embedded environment. Specifically, exceptions, namespaces, templates, multiple/virtual inheritance, runtime type identification through the typeid feature, “new style” casts, and the mutable type qualifier were eliminated. While this sounds like a potentially useful exercise, it doesn’t appear that there has been much uptake: the latest “update” on the EC++ official website was in 2002. (One item from 2002 still has “NEW” next to it – the latest meeting in Curacao. Sweet! Maybe they just decided to chuck it all and stay there.)

Meanwhile, there have been two rather more focused efforts to establish better programming practices for more specific embedded markets. The older of these was undertaken 20 years ago by the Motor Industry Software Reliability Association (MISRA) and is referred to as MISRA C. This established a series of rules and recommendations for the use of C in automobiles, following the general intuition that a null dereference at 70 mph could hose up someone’s day. The first edition was completed in 1998 and had 93 rules and 34 recommendations. A revision in 2004 changed that to 121 rules and 21 recommendations, with some new rules being added, some existing rules being clarified, and some old rules being eliminated as unworkable.
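
To make the null-dereference worry concrete, here is a minimal sketch in C of the defensive style that guidelines of this kind tend to encourage. It does not quote any MISRA rule; the wheel_sample_t type and the read_speed function are hypothetical names invented for the illustration. The idea is simply that every pointer is checked before it is used, and that the function leaves through a single return.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sensor record, invented for this illustration. */
typedef struct {
    uint16_t speed_kph;
    bool     valid;
} wheel_sample_t;

/* Copies the speed out of a sample without ever dereferencing a null
 * pointer. Returns true and writes *out_kph on success; returns false
 * (and leaves *out_kph alone) if either pointer is null or the sample
 * is marked invalid. */
bool read_speed(const wheel_sample_t *sample, uint16_t *out_kph)
{
    bool ok = false;

    if ((sample != NULL) && (out_kph != NULL)) {
        if (sample->valid) {
            *out_kph = sample->speed_kph;
            ok = true;
        }
    }

    return ok;
}
```

A caller is expected to test the return value rather than assume the read succeeded; what to do on failure (hold the last good value, raise a fault) is a system-level decision that this sketch leaves open.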

While this effort was originally driven with cars in mind, it was eventually taken up by other industries where safety is a concern, in particular rail, medical, and nuclear. On the heels of this, C++ has been given the same treatment, with interest from an even wider set of industries. The first round of MISRA C++ has been recently announced. The frustrating thing about it is that there is effectively no free information about it whatsoever. Even the overview that motivates and summarizes MISRA C++ costs money; it’s almost like having to pay to see an advertisement for a product. And it raises the question: if this writer pays for the overview and then summarizes the information here, has some intellectual property law been violated? Most of the information available free of charge consists of the snarkfest that ensued over that valuable contribution to computer science: deciding which lists were appropriate for MISRA C++ posts.

A separate effort, funded by the US Department of Defense and coordinated by the Software Engineering Institute at Carnegie Mellon, aims to promulgate programming that is less vulnerable to meddling by those who oughtn’t be meddling. This has resulted in the CERT C and CERT C++ secure coding standards, which identify ways to eliminate (or minimize) vulnerabilities in code. These are also structured as rules and recommendations. They can be readily accessed on the web, and their numbers are large; there’s a clear attempt to organize and categorize them. All have been published in a form that solicits comments from viewers; rules have been adjusted as a result of input, a process that continues. Cleanup and full standardization are expected to be complete this summer.
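
As a flavor of what a secure-coding rule asks for, consider signed integer overflow, which is undefined behavior in C and a classic opening for mischief. The sketch below is one common way to guard an addition; it is an illustration in the spirit of such rules, not text taken from the CERT standard, and the checked_add name is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Adds a and b only when the sum is guaranteed to fit in an int.
 * The range test happens BEFORE the addition, so the overflowing
 * operation is never actually performed.
 * Returns true and stores the sum in *result on success; returns false
 * and leaves *result untouched on overflow or a null result pointer. */
bool checked_add(int a, int b, int *result)
{
    if (result == NULL) {
        return false;
    }
    if (((b > 0) && (a > (INT_MAX - b))) ||
        ((b < 0) && (a < (INT_MIN - b)))) {
        return false;
    }
    *result = a + b;
    return true;
}
```

The important design choice is the ordering: testing the sum after the fact would be too late, because by then the undefined behavior has already occurred.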

Neither the MISRA nor the CERT standards have specific official compliance certification. The expected approach is that commercial tools vendors will make available tools that analyze code and report back on how well or poorly a piece of code adheres to the standards. Such tools have recently been announced by LDRA for both CERT C and MISRA C++; indeed, the LDRA CERT C tool is used as a reference implementation on the CERT C website. The current version appears to fully cover 78 “standards” (using the CERT terminology) and partially cover 7 standards, with 86 standards not implemented and 30 standards unimplementable based on the way they are written. But these numbers actually include both rules and recommendations, and LDRA’s focus has been on rules only; in fact, one of the criteria separating rules from recommendations is the ability to automate checking of the rule. LDRA expects to have full coverage of all rules by late summer. A CERT C++ checker is also planned for the future.

With safe and/or secure coding practices identified, an obvious question is why everyone wouldn’t want to have their code checked out, whether or not the code is being used in a particularly vulnerable application. And the answer to that question leads us straight to the bane of all static analysis tools: false positives. The means available for enforcing these rules often entail assumptions about coding intent, and those assumptions may or may not be right for a given line of code. As a result, many of the reported “defects” (to use the industry term) may in fact not be defective. After a static analysis tool is run, the next step is triage, with the coder poring over the reported problems and deciding which of them are legitimate. Tools vendors try hard to minimize false positives – it can be such an issue that they sometimes compete on having better (that is, lower) false positive rates – but in the end, they never completely go away. As a result, use of such tools does exact a high enough cost on users that these tools aren’t generally employed outside their target application areas.
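
A small, contrived C example shows how a false positive can arise. Both functions below are hypothetical and behave identically. In the first, the variable is always assigned, because the masked value can only be 0 or 1, but a checker that does not track value ranges may still report a possible use of an uninitialized variable. Whether a particular tool actually flags it is an assumption here, not a claim about any specific product.

```c
#include <stdint.h>

/* Version A: correct, but a plausible false positive. (channel & 1u)
 * can only be 0 or 1, so gain is always assigned before it is read.
 * A tool that does not reason about that range sees a switch with no
 * default and may warn that gain could be uninitialized. */
uint8_t gain_for_channel_a(uint32_t channel)
{
    uint8_t gain;

    switch (channel & 1u) {
    case 0u:
        gain = 10u;
        break;
    case 1u:
        gain = 20u;
        break;
    }

    return gain;
}

/* Version B: same behavior, rewritten so the initialization is obvious
 * to both the tool and the next human reader. */
uint8_t gain_for_channel_b(uint32_t channel)
{
    uint8_t gain = 10u;

    if ((channel & 1u) != 0u) {
        gain = 20u;
    }

    return gain;
}
```

During triage, the coder has to either convince himself that the report on version A is spurious and suppress it, or rewrite the code along the lines of version B. Either way it costs time, which is exactly the overhead that keeps these tools out of less critical projects.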

Of course, there are other programming languages used more or less extensively within these application areas. Ada is famous as a language favored by defense, but the pool of engineers for it is so small that more mainstream languages are being used – presumably part of the DoD’s motivation for funding secure coding initiatives for other languages. Java is in the ascendant but, unlike C and C++, is an interpreted language, so the virtual machine can catch a lot of potentially unsafe behavior as it executes the program in real time. Additionally, Java, along with some of the vulnerabilities it contains, largely shows up in user interfaces, which are less prevalent (or less extensive) in embedded applications, reducing the urgency. On the other end of the maturity scale, Fortran may also be found in these applications – particularly in defense. But Fortran routines tend to implement specific units of functionality that ultimately end up being wrapped in C or C++ or Ada for interfacing with the rest of the system. So if the wrapping code is secure, then the small Fortran islands will remain isolated from malicious mischief.

So that pretty much leaves C and C++ as the primary areas of focus. Which is no real surprise, since those languages – and C in particular – are so famous for being responsible for so many unexpected headaches.

*The author makes no representations with respect to any specific personal experiences that would validate the appropriateness of this simile.