Secure Software

We’ve invested a lot of ink in these pages on security. But security can mean a lot of different things. Usually it refers to the protection of networks and data, but it can also deal with the protection of intellectual property, whether simply to hide family design jewels or as yet another aspect of protecting networks and data. However, in the latter case, most of our focus has been on making hardware harder to hack and reverse engineer. What about software?

Probably the biggest “protection” of software is compilation: turning it into object code. Object code is a pain to read for a human. Unfortunately, it’s not hard to decompile for a computer. Yeah, the resulting code won’t be as easy to read as the original source, but it’s way better than trying to hack object code. (Of course, anything above the level of object – byte code or just-in-time-compiled code – will be that much easier to digest for someone who shouldn’t be digesting it.)

So, really, code of any sort is, for practical purposes, in the clear, with minor annoyances for anyone with a serious interest in figuring out what’s going on. And why might they have that serious interest? Well, industrial espionage is one reason. But another motivation would be to study the code for weaknesses so that they can figure out how to bypass protections or even modify the code for less-than-benevolent purposes.

So hiding code – or, at least, hiding critical bits of code – is a natural complement to securing hardware. But, short of encrypting object code and equipping every computing platform everywhere with a decryption option for code that’s going to be run (and ensuring that it’s impossible to access the code after decryption), how is one to do this?

This is something Irdeto has been working on for nigh onto 20 years now, they say. Like most security solutions, there isn’t just one thing that you do and – voilà – you’re secure. There are many tools available, and what you do and how much of it you do depends on what you’re protecting and how much development and execution time you have available to spend on it. And you can apply techniques to protect data as well as code.

Unreadable Code that Won’t Get You Fired

Irdeto’s approach focuses on the source code, not the object code. Intent is available (or at least evident) at this level, giving them more that they can do. The obvious way to make source harder to reverse engineer is what’s referred to as obfuscation. This involves replacing obvious names by inscrutable names in order to obscure semantic intent, along with perhaps some other minor steps that make the code harder to follow.

But, while this can be one of the tools available, really, it’s not such a big deal – especially if your potential attacker isn’t going to have access to the original source code. Such obfuscation more or less disappears in true object code (while it might provide some assistance with intermediate-level code).

Another technique that does affect object code is what Irdeto calls spaghettification. No, this isn’t done by passing the code through a black hole. It’s done by committing all of the crimes that would get you a failing grade in coding class or get you fired as a coder: taking perfectly readable code and mangling it so that it becomes incredibly difficult to follow. Why do directly that which needs to be done, rather than taking a leisurely meander through various meaningless branches and gotos that will obscure and misdirect a reader?

Of particular interest here is the dispatch table. All those functions that you lovingly named so that you know exactly what they do when they’re called? Bah! That’s for newbies! Pack your code with testosterone by replacing those wimpy calls with jumps into a table that sends the code off to that very same function. Except that, this time, it’s way harder to trace and figure out why the jump is being made at all. You could even introduce dummy functions that do nothing but waste an attacker’s time tracing down useless routes. (Irdeto didn’t specifically say that they do this, but, if they don’t, they could…)

These particular techniques are handled by Irdeto’s transcoder, which is a source-to-source tool. The idea is that you work on a readable version of the source code and then, at build time, transform it for the poor schlub that has to reverse engineer the resulting object code.

They claim that these techniques typically have performance impact in the range of 8-10%. That’s a tunable number: you can dial up the level of modification (including no modification at all) for various parts of the code, so it’s not necessarily all the same. But you want to do that judiciously so that it’s not obvious which parts of the code are protected and which aren’t. This tuning is done by pragmas in the code, reference files, and other means. (Needless to say, not particularly transparent about how the details…)

Diversity: It’s a Thing

Those transformations (the ones we mentioned and the ones we don’t know about) are fine for a piece of code. But why not mix things up with some variety? Various techniques can be applied to code and to data using what Irdeto calls diversity.

For example, when you do your next revision of the software, make unchanged parts of the code look very different from the prior version. That way no one can take the old and new versions and diff them to see what changed.
You could transform the code of a given version differently for different geographic areas. Even though the code ultimately works the same in each version, the object code will look different.
You could create “micro-chunks” of code that update constantly so that they’re always being replaced by something that works identically but looks completely different. This appears to be a work in progress.
You could protect individual modules, with each having several versions. Then, at build time, you could make a bunch of different build versions by mixing and matching the different module versions. And you thought combinatorics was there in math class just to annoy you…
It’s even possible to create a separate version of code for each individual chip that will run it. This takes more work, since it has to be done at manufacturing time. First, each chip has to be identifiable, and, second, you have to maintain a database so that you know what went where.

Note that iOS and Android don’t allow diversity. Irdeto says that that could change eventually, but I don’t know if this is just a hope or if they’re aware of something specific that they can’t tell me.

Your Own Personal Spaghetti

On that last bullet above, the chip identifiability part of it is being handled by working with Mapper, a company that does e-beam lithography. The idea is that you have some array of lines and then cut out most of them using e-beam, leaving only the ones you want. Given enough lines, you can give each chip an individual signature.

I asked whether physically unclonable functions (PUFs) could be used for this instead of having to add a silicon step. They said not, because (they claim) PUFs aren’t long-term stable. That surprised me, given the amount of work that’s been done on PUFs.

While there are multiple ways of implementing a PUF (including one we looked at more recently), the most common way of doing it is to segregate a portion of SRAM (or have a dedicated SRAM) and use the power-up state as the signature. One company we’ve looked at before, Intrinsic ID, uses this approach, so I thought I’d check with them on this reliability question.

I won’t go into detail here – that will be for a future discussion – but Intrinsic ID says that, yes, reliability is a real question, but that they have an answer that literally reverses the increase in signature noise that can come with aging – even reducing it below the noise in fresh silicon. That said, Irdeto isn’t currently using that approach.

The Flow

So… if you’re a coder, how do you do this? As mentioned, it’s a build-time thing, with up-front planning and with a possible additional manufacturing step.

First you need to know what areas are most important for protection. This is where you do the planning.
You develop and test your code in the clear.
Then, at build time, you run the tools to create code chaos.
You then need to retest.
If you’re personalizing per-chip, then you must give each chip an identity, load the code, and store the transaction. This is very similar to key provisioning.

About that retesting thing: the usual way to prove that something transformed works the same as the original is through equivalence checking, which leverages formal analysis. That doesn’t work in these cases (or, at least, that’s not what they do). Instead, you remember that huge regression suite you used to test your original clear-text code? You repeat the testing on the transformed code.

That can create quite a lot of testing when you start applying diversity, since there can be a lot of different versions of the same thing. If there is literally a random seed that tailors a unique version for each individual chip on which it will run, then testing of each version isn’t practical. So, presumably, the personalization transformations will need to be on a level that can be guaranteed without a full reg test run.

For obvious reasons, there was only so much that Irdeto could tell me. There were questions I would ask that would elicit quick discussions between the various folks at the table to see what they could say. So I presume that I have but scratched the surface of this thing. You’ll need to check in with them if you’re interested in learning more.

More info:

Irdeto