Keeping Your IoT Secrets Secret

[Editor’s note: this is the sixth in a series on Internet-of-Things security. You can find the introductory piece here and the prior piece here.]

We’ve looked at how Internet of Things (IoT) machines can greet each other and exchange secret passwords to prove that they’re legit. We’ve looked at how to scramble data so that it becomes nothing more than so much gibberish unless you’re entitled to read it. And we’ve looked at the, well, key piece of data that enables all of this: the key.

We’ve seen different ways to generate a key; we’ve seen different ways to use keys. But there’s one thing we haven’t looked at: how to protect the key. After all, if someone can find the key, then suddenly all the secrets aren’t so secret anymore.

Let’s take a really simplistic situation. You want to encrypt some data, and you need a key for that. So you generate a key that you’ve decided to use forever. Which means it needs to survive a power-down. So you save it on your hard drive. Or on the non-volatile memory (NVM) on your embedded system.

When you need it, you simply go retrieve it and do the calculations to encrypt data using the processor on your embedded system. Easy, right?

Yeah, easy for everyone – including the guy trying to get the key. The key is stored in plain text (unencrypted); it’s moving about on the main system bus; and it’s being utilized in an unprotected computing space visible to anyone with a half-decent debugger. That key is easily snooped, and, because you wanted to use it forever, you’re stuck.

So many things wrong with that scenario – where to start? It’s so bad as to be a bad example. No one would be that reckless, right? It’s the more subtle scenarios, where you think you’re protecting the key, that can trip you up. Things you may not think of – like how do you obtain the key in the first place?

Preparing a Safe Space

Before we get a key, let’s do like any good couple expecting a newborn would do: make sure we have a place to put it first.

Obviously, storing a key in NVM or a hard drive unencrypted is really, really dumb. OK, so, encrypt it, right? Only… you need another key for that. OK, store that elsewhere – encrypted. Um… only now you need yet a third key for that – which needs to go somewhere. And you can see that this is a losing game. Situations like this do exist, where the key chain eventually terminates with a (hopefully) well-hidden secret.

Sound foolhardy? Well, if you think about all the lengths to which folks will go to intercept a key, then any storage of the key anywhere needs to be encrypted, triggering this unending encrypt-the-key chain. Which means there’s probably a secret somewhere to get that series of decryptions going. It’s like buying that big bag of animal feed and trying to find that one string that, when you pull, will undo the knot that binds the bag shut. Not so easy. (But satisfying when you do…)

The other thing about storing even an encrypted key out in the open is that someone will be able to find it. They may not know the key – yet – but if they know where to look for it, then they can spend lots of time trying to crack it.

Although… why bother with that effort? If someone can snoop the bus, then they can watch as you retrieve the key when you need it, and then they get to watch as you decrypt it and stash it in working memory for use. Bingo!

So we’ve got two problems here: we’re storing the key in a place where it’s accessible (if temporarily unreadable), and we’re then working with that key out in the open. That kind of transparency isn’t helpful. But how to avoid it?

We have to store the key and we have to work with it. Those are inescapable. But there’s nothing that says that we have to store and work with the key using the same resources that we use to store and work with other, less-sensitive data. This is the perfect candidate for offloading: dedicate a piece of hardware to do this. (If you must do it in software, then some obfuscation may help to hide the secrets being computed…)

Crypto-engines have existed for a long time. Encryption is compute-intensive, so having the algorithm cast into hardware can be a performance-saver. But we’re talking about more than that here. We’re talking about a walled castle for the key. A kind of Vatican where important stuff goes on inside, external clues to which are limited to puffs of smoke.

This type of device exists in a number of different guises, with different names (to which we’ll return in a moment). But the idea is this: the key is stored (we’ll come back to how it gets there) and all computations involving the key are done there as well. That means that our delicate key never need face the full sunlight.
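To make the idea concrete, here is a minimal software sketch of that contract: requests go in, results come out, and the key stays put. The class name ToyKeyVault and its methods are made up for illustration, and HMAC-SHA256 stands in for whatever algorithms a real chip implements, simply because it ships with Python. Software can only mimic the interface; the actual protection comes from the hardware.

```python
import hashlib
import hmac
import secrets


class ToyKeyVault:
    """Hypothetical stand-in for a key-holding chip: no method returns the key."""

    def __init__(self) -> None:
        # The key is created inside the object and never handed to callers.
        # Python name mangling is NOT real protection; a chip enforces this
        # physically by having no read path for the key at all.
        self.__key = secrets.token_bytes(32)

    def tag(self, message: bytes) -> bytes:
        """Return an HMAC-SHA256 tag computed with the hidden key."""
        return hmac.new(self.__key, message, hashlib.sha256).digest()

    def verify(self, message: bytes, tag: bytes) -> bool:
        """Check a tag in constant time without exposing the key."""
        return hmac.compare_digest(self.tag(message), tag)


vault = ToyKeyVault()
reading = b"temperature=21.5"
tag = vault.tag(reading)
print(vault.verify(reading, tag))             # True: untouched message
print(vault.verify(b"temperature=99", tag))   # False: altered message
```

The rest of the system only ever sees messages and tags, which is the property the hardware versions are built to guarantee.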

Because the key is never needed by the rest of the system, it never shows up on the bus, so it can’t be snooped. This type of device has no memory interface, so the key remains out of electronic reach. Which means you have to get desperate to get at it. For instance, if the key is stored in some sort of NVM, then why not delaminate the top of the package, carefully, and power the device up so that you can probe to detect the floating gate states?

OK, so you encrypt the key when storing so that, if someone does that, then they have to figure out how to decrypt the key. But that also means you need an internal way to decrypt the key, and here we go again with the key chain thing. But… it helps.

In case you think such an attack desperate, you have no idea. There’s a whole class of so-called side-channel attacks. They’re called that because the attack doesn’t come via any of the ports or pathways through which the device communicates with others. Instead, you might measure the current in the power line. Or put an antenna around the device to detect faint signals that give away what’s going on inside.

Yeah, people do that, and they can learn things. Amazing. So physical protections against this are becoming a thing. For instance, when doing the chip metallization, they create a labyrinth of metal that covers the entire chip (except for the narrow lanes between the labyrinth paths). This forms a shield that blocks those EM signals that the dude with the antenna is snooping.

It also makes it very hard to see the chip itself, which makes it harder to examine the NVM for traces of the key. Then again, someone could give up on the side-channel and delaminate that metal to restore the view of the NVM, right? Not with many of the modern chips: that shield isn’t just passive metal. It’s active, and if it gets disturbed – by someone trying to remove it – then the chip erases or scrambles or otherwise turns the key into mush. It’s a crypto-scorched-earth policy. Or the chip’s cyanide capsule.
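The scorched-earth behavior can be sketched in a few lines of software. This is only an illustration under assumptions: the class TamperResponsiveKey and its on_tamper hook are hypothetical names, and in a real chip the trigger is the shield circuitry itself rather than a method call.

```python
import hashlib
import hmac
import secrets


class TamperResponsiveKey:
    """Hypothetical model of a key store that destroys its key when disturbed."""

    def __init__(self) -> None:
        # A bytearray is mutable, so the key bytes can be overwritten in place.
        self._key = bytearray(secrets.token_bytes(32))
        self._zeroized = False

    def on_tamper(self) -> None:
        """Hypothetical hook, called when the active shield detects a disturbance."""
        for i in range(len(self._key)):
            self._key[i] = 0
        self._zeroized = True

    def tag(self, message: bytes) -> bytes:
        """Use the key internally; refuse once the key has been destroyed."""
        if self._zeroized:
            raise RuntimeError("key destroyed after tamper event")
        return hmac.new(bytes(self._key), message, hashlib.sha256).digest()


chip = TamperResponsiveKey()
chip.tag(b"hello")   # works while the shield is intact
chip.on_tamper()     # someone starts peeling back the metal
# Any further chip.tag(...) call now raises RuntimeError.
```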

As to measuring power signals to infer operation and the key, chips are being designed with specific protections against that – you might see things like “DPA (differential power analysis) countermeasures” on the datasheet. The idea here is that you effectively dither the power signal internally to scramble any useful information that it might have been carrying. Of course, this has to be done without blowing up the power budget…

How did you get in there?

OK, so let’s review: we’ve got a place to store our key, and we’ve encrypted it and covered it in metal and added tamper protections to thwart the many attacks that might be launched to uncover the key. It’s also got a computation platform that can execute any of the various algorithms – encryption, key exchange, signing, etc. – that will involve the key. The key need never – in fact, can never – escape this chip. (It should go without saying that the chip should have no extra debug ports that provide access to the internal bus… that would be really dumb.)

These chips also typically have space to store other secrets: certificates and such. They have an interface through which secrets can be loaded but cannot be read. Once inside, they’re used only internally for various crypto-tasks.

That leaves one big question: how does the key get in there in the first place? This may sound like a trivial issue, but it’s far from it. If you’re a major manufacturer designing and building your own system, you might think that you can simply load the key during manufacturing and test. What’s so hard about that?

Except that most companies don’t build their own stuff. Most high-volume manufacturing is done by contract houses in far-off lands. Using such a simplistic approach would mean giving these total strangers (yeah, I know they’re “partners”… ahem…) all of the keys to your systems. After all that work protecting the key in the chip, you might as well just post the key on the internet.

There are a couple of classic problems with this scenario. And it may sound uncharitable to manufacturers, but this stuff has happened. One scenario is so-called over-building. A legit company like Apple orders 10,000 units built; the manufacturer builds 11,000 and sells the extra thousand out the back door for a nifty little extra profit.

This could easily happen if the approach to loading keys were simply to generate a random number during test and load it in as the key. So that’s not a good way to do it. This is going to sound like a nightmare to manage, but the way around this is to maintain a database of keys. As you’re manufacturing, you make note that the chip with so-and-so serial number got such-and-such key.

This makes over-building harder, since, if the database has 11,000 entries when only 10,000 systems were ordered, it’s pretty easy to tell that there’s a problem. The database is then shared with the company that designed the system for future use, when systems using the key are plugged in and attempt to join the network. If extra keys and devices were built without logging them, then their keys won’t be in the database, and those devices won’t be allowed onto the network. Note, however, that this database now has a giant target painted on it: if you can hack the database, then you’ve won the IoT.
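A rough sketch of those two checks follows. Everything named here is an assumption for illustration: the serial numbers, the key identifiers (which might be public keys or fingerprints in practice), and the function names. A real deployment would also need to protect the log itself, for the reason just given.

```python
def has_overbuild(provisioning_log: dict[str, str], units_ordered: int) -> bool:
    """Flag a log that records more provisioned devices than were ordered."""
    return len(provisioning_log) > units_ordered


def may_join(provisioning_log: dict[str, str], serial: str, key_id: str) -> bool:
    """Admit a device only if its serial is logged with the matching key identifier."""
    return provisioning_log.get(serial) == key_id


# Hypothetical log shared by the manufacturer: serial number -> key identifier.
provisioning_log = {
    "SN0001": "key-id-a",
    "SN0002": "key-id-b",
    "SN0003": "key-id-c",
}

print(has_overbuild(provisioning_log, units_ordered=3))    # False: counts match
print(may_join(provisioning_log, "SN0002", "key-id-b"))    # True: logged device
print(may_join(provisioning_log, "SN9999", "key-id-z"))    # False: never logged
```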

There’s one other problem that this doesn’t solve: that of so-called cloning. If you over-build by 1,000 devices using random keys, the database may thwart you eventually. But what if you take legit devices with legit keys and then clone them? Two devices will have the exact same key. Now, if devices are serialized, then the database will tell which device ID is the legitimate one. But that just pushes the problem around, since something has to load the serial number as well – why not clone it too?

And besides, while we’re at it, why not attempt to snoop the tester or whatever computer is downloading the keys? Even if you can’t overbuild, it’s still useful to have a key to a legit system. You can get the serial numbers that way too.

This threat is partly addressed by using a special piece of equipment on the test or manufacturing floor for loading any secrets. It’s called a “hardware security module,” or HSM. It’s specially designed to store, protect, and manage secret information – and to generate keys and such for loading onto new devices. HSMs are pretty expensive – four-digit prices or so – so they’re not great for new startups on a budget trying to get their prototypes working, but they are essential for high-volume manufacturing floors. (Some companies, like Atmel, have low-volume kits to cover the low end of the market.)

The HSM addresses the threat of snooping the tester. And, if the key-loading process is sophisticated enough, then it also helps with cloning. In other words, if the only practical way of loading a key is via the HSM, then the HSM will prevent the same key being loaded twice. If it’s impossible to read the key after loading, then you can’t take one device and manually load its key onto another – because you can’t read the key.
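As a sketch of that bookkeeping, the toy class below hands out one fresh key per serial number and refuses to provision the same serial twice. The name ToyHSM and its interface are invented for illustration and say nothing about how any commercial HSM actually works; the returned key stands for the value pushed through the device’s write-only load interface.

```python
import secrets


class ToyHSM:
    """Hypothetical provisioning helper: one fresh key per serial, never reused."""

    def __init__(self) -> None:
        self._issued: dict[str, bytes] = {}  # serial number -> key already loaded

    def provision(self, serial: str) -> bytes:
        """Generate a key for a new device and record that it has been loaded."""
        if serial in self._issued:
            raise ValueError(f"{serial} has already been provisioned")
        key = secrets.token_bytes(32)
        # A repeat from a good random source is astronomically unlikely,
        # but checking makes the "never the same key twice" rule explicit.
        while key in self._issued.values():
            key = secrets.token_bytes(32)
        self._issued[serial] = key
        return key


hsm = ToyHSM()
key_1 = hsm.provision("SN0001")
key_2 = hsm.provision("SN0002")
print(key_1 != key_2)  # True: every device gets its own key
# hsm.provision("SN0001") would now raise ValueError.
```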

Even so, this whole HSM/database thing is cumbersome. It would be nice if there were an easier way that was still secure. And, in fact, there is – and it’s even more secure. In order to describe it, let’s briefly define a couple of watershed moments in the lifecycle of a security chip. First, there’s manufacturing and testing – obvious. Next is “provisioning” – that’s when keys and other secrets are loaded, so the device is poised for joining a network, but it hasn’t yet. Finally comes “commissioning” or “on-boarding”: this is when a system with that security chip attempts to join a network and register with a service. It’s when that database comes full-circle.

With an HSM, provisioning happens along with testing (although it could be made a separate step). The big step away from using the HSM to load the key? Have the chip generate its own key.

If you happened to be around a few years ago, we looked at one way in which this could be done: the physically unclonable function, or PUF. This takes advantage of various sources of randomness (like SRAM power-up state) to generate a random number. Great care must be taken on such chips to ensure a good source of randomness – also referred to as high entropy. If ¾ of the chips ended up with the same key because entropy was poor, that wouldn’t work so well.
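One way to put a number on “that wouldn’t work so well” is min-entropy, which is the negative base-2 logarithm of the probability of the most likely outcome. The sketch below estimates it from a batch of observed keys; the four sample keys are made up, and a real evaluation would need far more samples and proper statistical tests than this.

```python
import math
from collections import Counter


def min_entropy_bits(keys: list[bytes]) -> float:
    """Estimate min-entropy of observed keys as -log2(share of the most common key)."""
    counts = Counter(keys)
    p_max = max(counts.values()) / len(keys)
    return -math.log2(p_max)


# Four hypothetical chips: three produced the same 128-bit key.
bad_batch = [b"\x00" * 16] * 3 + [b"\x01" * 16]
print(round(min_entropy_bits(bad_batch), 3))  # 0.415
```

So a key that is nominally 128 bits long would be worth well under one bit to an attacker who knows about the flaw, because guessing the popular key succeeds three times out of four.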

When provisioning, such a chip is given a command to generate a key. It uses whatever entropy source it’s been designed with to generate a key-of-all-keys. That key is the most secret of secrets, the holiest of holies. Very often, that key will itself never be used in any actual transaction. Instead, it will be used as a seed for generating session (or ephemeral) keys for single-session use, so if anyone gets the key during an actual transaction, then they’ve gotten exactly one key from exactly one session on exactly one machine. Not particularly useful once the session is over or the power is off. And some chips have measures to ensure that a specific session key is never generated twice.
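Here is a minimal sketch of that seed-to-session-key idea. It is an assumption-laden illustration, not how any particular chip does it: HMAC-SHA256 is used as the derivation function because it ships with Python, the class name SessionKeyDeriver is invented, and the ever-increasing counter is just one plausible way to keep a session key from being generated twice.

```python
import hashlib
import hmac
import secrets


class SessionKeyDeriver:
    """Hypothetical model: a hidden root key seeds one-off session keys."""

    def __init__(self) -> None:
        self.__root = secrets.token_bytes(32)  # generated inside, never returned
        self._counter = 0                      # a chip would keep this in NVM

    def next_session_key(self, context: bytes = b"") -> bytes:
        """Derive a fresh key; the counter guarantees the input never repeats."""
        self._counter += 1
        info = self._counter.to_bytes(8, "big") + context
        return hmac.new(self.__root, info, hashlib.sha256).digest()


chip = SessionKeyDeriver()
first = chip.next_session_key(b"metering-service")
second = chip.next_session_key(b"metering-service")
print(first != second)  # True: same service, different session, different key
```

Someone who captures one session key learns nothing practical about the root or about other sessions, provided the derivation function is one-way.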

But the root key underlying it all? No one can ever see it. It’s not queryable, it’s not viewable (especially with physical protections like the metal shield and DPA countermeasures), and there’s no path by which it could be coaxed out into the light. It lives its life in the total blackness of a polar winter. (Minus the aurora.)

During commissioning, there’s no need to check a database. It’s impossible (or nearly so) to clone such a system. And you don’t need an HSM to load it. (Although you may still need one to load certificates or other less-holy-but-still-holy secrets.)

By any other name

OK, so we’ve looked at how we can not only protect a key, but even generate a key as needed. So where does one go to get a magical device like this?

There are several different incarnations of such a device, with variations on exactly what capabilities they have. The biggest distinction is whether they create their own keys: many do not, relying on the HSM approach; others do.

Some of these have names, and you’d never really know that they’re the same thing because they grew up in totally different vertical markets with different requirements. At the very least you have:

TPM: trusted platform module; we’ve seen this before. It grew out of the computer market. They can be purchased from companies like Infineon, Atmel, Broadcom (integrated in their controllers), and Nuvoton.
Secure element (SE): this is a small chip designed for the smart card market. It must be very inexpensive and very small in order to reside in that tiny form factor while securing financial transactions. The architecture has been standardized by GlobalPlatform (who also standardizes the trusted execution environment, or TEE). It typically runs on a way-cut-down Java version called Java Card. They’re available from companies like Gemalto and NXP.
HSM: yup, the device used to provision security devices is itself a security device, intended to hold and manage secrets. This is a larger form-factor, higher-cost implementation. It’s available from companies like Safenet and Thales.

There are other chips that perform this function but don’t fall into one of the above categories. There’s the Kerkey from ST, intended for smart meters. And the IoT in general demands a low-cost, low-power solution for edge nodes, so these are becoming more available. One example is Atmel’s ATECC508A. It’s not technically an SE or an HSM or a TPM, but it generates its own key and then performs elliptic curve cryptography functions in an opaque fashion, far from prying eyes.

Meanwhile, other bits of IP exist, like Microsemi’s FPGA cores with strong DPA countermeasures.

Capabilities you’ll see touted in such devices are:

High entropy, including high-quality true random number generators (TRNGs)
Anti-tamper features
Side-channel protections
Serialization
Single-use session keys
Space for multiple keys or other artifacts (which lets you use different keys for different services, further diminishing the utility of snatching a single key)
Absence of any physical back doors (like debug ports)

Finally, our focus has been on asymmetric public/private keys. If your system is going to rely on symmetric keys, where both sides share the same secret, then you must load keys explicitly to ensure that they match. Not only are the stakes higher if the key is found out, but many of the above key protection features (like auto-generation) are no longer available.

[Editor’s note: the next and final article in the series, on secure design processes, can be found here.]

More info:

Microsemi Security

Nuvoton TPM

NXP SE

Safenet HSM