Security: Hard and Soft

“We will bankrupt ourselves in the vain search for absolute security.” – Dwight D. Eisenhower

I hate writing about security. I hate it because I wish it were unnecessary. There was a time when engineering meant making a product that did what you wanted it to do. Now it means spending a bunch of time making sure it doesn’t do what other people want and you don’t. This sucks.

Most of the problem with implementing security features is guessing where the vulnerabilities are. How do you fix a bug you’ve never even thought of, much less identified? At least “real” hardware bugs are unintentional. Security hacks are both deliberate and malicious. Someone is trying to break your stuff.

That’s why products like Tortuga Logic’s Sentinel are so handy. It automates much of the tedious task of identifying where the security holes in your hardware lie. Sentinel doesn’t fix anything; it’s purely a diagnostic tool. But, for many of us, that’s the most important part.

Sentinel comes in two parts. It’s a scripting/design language, and it’s an execution model. You describe how you think the security features of your new SoC are supposed to work, and then the Sentinel model humiliates you in front of your colleagues. Sound like fun? Hey, it’s better than finding out in the newspapers, after you’ve shipped a thousand units.

Sentinel doesn’t look for software bugs. It’s purely a chip-level RTL diagnostic tool. It currently works with the Synopsys flow, with Cadence and Mentor versions coming later. You license Sentinel, much like any EDA tool, and use it to diagnose potential security holes in your chip design.

What kinds of holes? Sentinel performs what’s called an “information flow analysis,” looking for how and where data is transported across your chip. Can this CPU core talk to that DSP core, or to that JTAG interface over there? Fine, but let’s make sure that those transactions happen only under the right circumstances. Most chip-level internal buses (AMBA, Sonics, etc.) understand the concept of privilege levels, but Sentinel makes sure that those privileges are enforced all the time, not just during routine operation. Boot-up, configuration, and testing are three specific periods where privileges are often relaxed – to the detriment of the product’s security.

Sentinel is sort of like reverse RTL, in the sense that it describes what a circuit should not do, rather than how it should function. For example, you can tell Sentinel that an encryption key stored somewhere on the chip should not, under any circumstances, ever flow to the output pins of the chip. Once Sentinel knows that this should never happen, it can rapidly look for cases where it might.
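To make the idea concrete, here is a minimal sketch of a structural flow check in Python. It is not Sentinel’s language or its algorithm, just a breadth-first search over a made-up connectivity graph; the signal names (key_reg, debug_bus, jtag_tdo, and so on) are hypothetical. It asks one question: is there any path from the secret to an output pin?

```python
from collections import deque

def find_leak(edges, secret, outputs):
    """Return one path from `secret` to any output pin, or None if no path exists."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    parent = {secret: None}          # also serves as the visited set
    queue = deque([secret])
    while queue:
        node = queue.popleft()
        if node in outputs:
            # Walk the parent links back to the secret to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical on-chip connectivity: (driver, receiver) pairs.
EDGES = [
    ("key_reg", "aes_core"),
    ("aes_core", "debug_bus"),   # a forgotten debug tap
    ("debug_bus", "jtag_tdo"),
    ("cpu", "uart_tx"),
]
OUTPUT_PINS = {"jtag_tdo", "uart_tx"}

print(find_leak(EDGES, "key_reg", OUTPUT_PINS))
```

A purely structural check like this is far cruder than real information-flow analysis, which also has to consider the conditions under which data moves, but it shows the shape of a “should never happen” rule.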

This kind of analysis is particularly useful when you’re integrating third-party IP – which almost everyone does. You’re either not able to examine the licensed RTL, or, if you are, it’s too inscrutable to be useful. Either way, you’re trusting the IP vendor to supply you with secure circuitry, as well as a secure method of using it. Sentinel treats everyone’s RTL equally, so flaws in licensed IP are highlighted just as enthusiastically as your own.

Security is no fun. Trying to find potential security holes is even less fun. Staring at schematics or reading RTL line by line, hoping you’ll suddenly discover a potential security flaw, is the least fun of all, and incomplete and error-prone to boot. Sentinel may not make engineering any more fun, but it should take some of the un-fun out of the job.

Meanwhile, over in software land, a team of university researchers (this time working without any penguin assistance) claims to have discovered a glaring security hole in just about every processor in use today. Oh, joy.

First, the problem: A common security feature is called address-space layout randomization (ASLR). With ASLR, you move programs and data around in memory, never loading or executing them in the same place twice. This helps to prevent a whole category of hacks that rely on known addresses. Any CPU chip with an MMU (memory-management unit) can make ASLR easy to implement. You just configure the MMU to scatter sections of the program around to pseudo-random addresses in memory, and the resulting tangle becomes much harder to crack. The MMU makes this all-but-invisible to the programmer, and there’s virtually no overhead or performance penalty.
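As a rough illustration, the sketch below picks a random page-aligned base address for each part of a program. It is a toy, not how any real operating system does it: the 4 KB page size, the 28 bits of entropy, the segment names, and the function name are all assumptions, and it doesn’t even check that segments don’t overlap.

```python
import random

PAGE_SIZE = 4096  # bytes; a common page size, assumed here

def randomized_layout(segments, rng, entropy_bits=28):
    """Pick a random page-aligned base address for each named segment."""
    return {name: rng.randrange(1 << entropy_bits) * PAGE_SIZE for name in segments}

SEGMENTS = ["code", "heap", "stack"]

# Two "runs" of the same program, each with its own source of randomness.
first_run = randomized_layout(SEGMENTS, random.Random(1))
second_run = randomized_layout(SEGMENTS, random.Random(2))

# An attacker who hard-coded an address observed during the first run
# will almost certainly miss on the second.
guess = first_run["code"]
print(hex(guess), hex(second_run["code"]), guess == second_run["code"])
```

The seeded generators are only there to make the example repeatable; a real loader would draw from an unpredictable source.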

But.

There may be a problem. Nearly all MMUs use page tables, which are memory-resident structures that describe how virtual addresses should be translated into physical addresses. In short, they’re big lookup tables that programmers get to create at boot-up time. Because an MMU’s logical-to-physical address translation is in the critical performance path, those page tables are usually cached. Without a page-table cache, the MMU would have to look up the page table data from memory every time the chip reads or writes to RAM – a huge waste of time. So, the page tables get cached. So far, so good.
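The lookup-plus-cache arrangement can be sketched in a few lines of Python. This is a deliberately simplified model with a single-level page table and an unbounded cache; real MMUs use multi-level tables and small, fixed-size caches, and the class and attribute names here are invented for the example.

```python
PAGE_SIZE = 4096  # bytes per page, assumed

class ToyMMU:
    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.cache = {}               # cached translations
        self.walks = 0                # how many slow page-table lookups we did

    def translate(self, vaddr):
        """Translate a virtual address to a physical address."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.cache:
            # Slow path: consult the memory-resident page table.
            self.walks += 1
            self.cache[vpn] = self.page_table[vpn]
        return self.cache[vpn] * PAGE_SIZE + offset

mmu = ToyMMU({0x10: 0x2A})
print(hex(mmu.translate(0x10123)))  # first access walks the page table
print(hex(mmu.translate(0x10456)))  # same page, served from the cache
print(mmu.walks)
```

Only the first access to a page pays for the table lookup, which is the whole point of caching the translations.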

But.

Anything that’s cached can also be pushed out of the cache. That’s how caches work. Which means that page-table data shares cache space with everything else, and gets evicted and reloaded like everything else. Which means an attacker can watch for the footprints it leaves. Which means the bad guys can work out where our code and data really live, which means they might not be as scrambled as we thought. Oops.

Worse, the five researchers, from Vrije Universiteit Amsterdam, have demonstrated the weakness of ASLR using nothing more elaborate than JavaScript. That’s right: no low-level assembly code was required, no application-specific hack, no spooky RF equipment for side-channel monitoring, and no foreknowledge of the program(s) they were deconstructing. In just over two minutes, their JavaScript hack reconstructed the actual address map of the application code and its data. It even works on different processor architectures, specifically x86 and ARM.

As the researchers point out, MMU page tables are sensitive data. Or at least, sensitive when they’re used to implement data- or code-scrambling techniques like ASLR. Page tables should never be exposed to the outside world, and, in most cases, they aren’t. Any decent programmer knows how to protect the page tables (and other MMU structures) by putting them in a privileged address space that only highly secure code can access. And, if the processor allows it, they’ll often disable read accesses to that space entirely, making the page tables invisible to everyone but the operating system kernel.

So why doesn’t that solve the problem? The MMU has its own internal cache (called a TLB, for translation lookaside buffer) for page-table entries, separate from the normal L1, L2, and (optionally) L3 caches that the processor uses for code and data. But, and this is the key, the TLB is small. When a translation isn’t in the TLB, the MMU has to fetch the page-table entries from memory, and those fetches pass through the same unified L3 cache as ordinary code and data. That’s by design. If the MMU needs to fetch those page-table entries again soon, they can be supplied by the L3 cache, rather than by a slow read from physical memory. That’s what caches are for.

And this presents a vulnerability. If you can tell which cache lines the MMU touched while fetching page-table entries, you can work out where code and data actually sit in the address space, opening up avenues for conventional software-based attacks like buffer overflow or code injection. The trick is forcing the MMU to reload those entries in a way that you can analyze.
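Here is a rough sketch of why page-table traffic in a shared cache gives the game away. It assumes the usual x86-64 arrangement with 4 KB pages: four levels of page tables, 9 address bits per level, 8-byte entries, and 64-byte cache lines. The function names are made up. The position of each entry the MMU reads is dictated by the virtual address, so knowing which cache line was touched at each level narrows down the address itself.

```python
LEVELS = 4           # page-table levels (x86-64 with 4 KB pages, assumed)
BITS_PER_LEVEL = 9   # each level indexes 512 entries
PAGE_SHIFT = 12      # 4 KB pages
PTE_SIZE = 8         # bytes per page-table entry
CACHE_LINE = 64      # bytes per cache line

def walk_indices(vaddr):
    """Page-table index used at each level, top level first."""
    mask = (1 << BITS_PER_LEVEL) - 1
    return [
        (vaddr >> (PAGE_SHIFT + BITS_PER_LEVEL * level)) & mask
        for level in reversed(range(LEVELS))
    ]

def touched_cache_lines(vaddr):
    """Which of the 64 cache lines in each page-table page the walk reads."""
    return [index * PTE_SIZE // CACHE_LINE for index in walk_indices(vaddr)]

# An example address assembled from four table indices plus a page offset.
example = (255 << 39) | (72 << 30) | (417 << 21) | (103 << 12) | 0x2A8

print(walk_indices(example))
print(touched_cache_lines(example))
```

Each touched line pins down six of the nine index bits at that level, which is the kind of leak a cache-timing measurement can pick up.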

This turns out to be difficult, and the research paper details the techniques they used. First, they had to come up with JavaScript that could tell the difference between cache hits and cache misses. Ideally, you could time the memory accesses; a quick response means a cache hit, while a slower access time means a miss and a read from physical memory. Unfortunately for them (fortunately for us), most operating systems and browsers disable fine-grained timers to prevent exactly this kind of attack. So, the researchers had to create their own software timer.
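A toy illustration of why timer resolution matters so much here. The latencies are invented numbers in abstract “ticks,” and this is not the researchers’ timer, just a model of what a blunted clock does to the measurement compared with a counter that ticks fast enough.

```python
HIT_TICKS = 4      # assumed cost of a cache hit, in abstract ticks
MISS_TICKS = 200   # assumed cost of a miss that goes out to memory

def coarse_clock(elapsed_ticks, resolution=1000):
    """A deliberately blunted timer: rounds down to a multiple of `resolution`."""
    return (elapsed_ticks // resolution) * resolution

def classify(measured_ticks, threshold=100):
    """Call anything faster than the threshold a hit."""
    return "hit" if measured_ticks < threshold else "miss"

# With the blunted timer, hits and misses both read as 0 ticks.
blunted = [classify(coarse_clock(t)) for t in (HIT_TICKS, MISS_TICKS)]

# With a counter that ticks once per unit, they separate cleanly.
fine = [classify(t) for t in (HIT_TICKS, MISS_TICKS)]

print(blunted)
print(fine)
```

Once the clock is coarser than the gap between a hit and a miss, the two become indistinguishable, which is why an attacker needs a finer-grained counter of their own.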

Second, they had to force the caches to empty themselves, thus exposing their contents. This, too, turned out to be complicated, but they eventually created various cache-busting data sets specific to the various MMUs being tested.

Finally, they had to derandomize the caches so they’d be in a known state. Only then could they probe the caches to see what caused a cache hit or a miss.
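The evict-then-probe idea can be modeled with a tiny simulated cache. Everything here is an assumption made for illustration: four sets, two ways, least-recently-used replacement, and invented class and function names. Real caches are far larger and their replacement policies are murkier, which is part of why the researchers found this step complicated.

```python
from collections import OrderedDict

NUM_SETS = 4   # toy geometry, assumed
WAYS = 2       # lines per set

class ToySetAssociativeCache:
    def __init__(self):
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def access(self, line):
        """Touch a cache line; return True on a hit, False on a miss."""
        entries = self.sets[line % NUM_SETS]
        hit = line in entries
        if hit:
            entries.move_to_end(line)        # mark as most recently used
        else:
            if len(entries) >= WAYS:
                entries.popitem(last=False)  # evict the least recently used
            entries[line] = True
        return hit

def eviction_set(target, base=1000):
    """WAYS lines that share the target's cache set (assumes target < base)."""
    start = base - (base % NUM_SETS) + (target % NUM_SETS)
    return [start + i * NUM_SETS for i in range(WAYS)]

cache = ToySetAssociativeCache()
cache.access(6)                    # victim data gets cached
print(cache.access(6))             # probe: hit
for line in eviction_set(6):       # attacker fills the same set
    cache.access(line)
print(cache.access(6))             # probe again: miss, the line was evicted
```

After the eviction set runs, the attacker knows exactly what is in that cache set, so the next hit or miss on the target carries information.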

In all, the group tested 11 different ARM and x86 processors and two browsers (Firefox and Chrome), running on both Windows 10 and Linux operating systems. Everything toppled over, usually in about two minutes. As they state in their research paper, “We did not find an architecture on which… the attack was not possible.” Ugh.

Is there any hope of mitigating this new attack vector? Not much, they conclude. You could disable browser or OS timers completely, but that’s drastic and could probably be sidestepped anyway. You could also design a CPU with separate caches just for the MMU, but that’s also a major job and not a quick fix.

This job gets more complicated all the time.