Security: Hard and Soft

Next Up: A Tortoise That Checks Your Chip for Flaws

“We will bankrupt ourselves in the vain search for absolute security.” – Dwight D. Eisenhower

I hate writing about security. I hate it because I wish it were unnecessary. There was a time when engineering meant making a product that did what you wanted it to do. Now it also means spending a bunch of time making sure it doesn’t do what other people want and you don’t. This sucks.

Most of the problem with implementing security features is guessing where the vulnerabilities are. How do you fix a bug you’ve never even thought of, much less identified? At least “real” hardware bugs are unintentional. Security hacks are both deliberate and malicious. Someone is trying to break your stuff.

That’s why products like Tortuga Logic’s Sentinel are so handy. Sentinel automates much of the tedious task of identifying where the security holes in your hardware lie. It doesn’t fix anything; it’s purely a diagnostic tool. But, for many of us, that’s the most important part.

Sentinel comes in two parts. It’s a scripting/design language, and it’s an execution model. You describe how you think the security features of your new SoC are supposed to work, and then the Sentinel model humiliates you in front of your colleagues. Sound like fun? Hey, it’s better than finding out in the newspapers, after you’ve shipped a thousand units.

Sentinel doesn’t look for software bugs. It’s purely a chip-level RTL diagnostic tool. It currently works with the Synopsys flow, with Cadence and Mentor versions coming later. You license Sentinel, much like any EDA tool, and use it to diagnose potential security holes in your chip design.

What kinds of holes? Sentinel performs what’s called an “information flow analysis,” looking for how and where data is transported across your chip. Can this CPU core talk to that DSP core, or to that JTAG interface over there? Fine, but let’s make sure that those transactions happen only under the right circumstances. Most chip-level internal buses (AMBA, Sonics, etc.) understand the concept of privilege levels, but Sentinel makes sure that those privileges are enforced all the time, not just during routine operation. Boot-up, configuration, and testing are three specific periods where privileges are often relaxed – to the detriment of the product’s security.

Sentinel is sort of like reverse RTL, in the sense that it describes what a circuit should not do, rather than how it should function. For example, you can tell Sentinel that an encryption key stored somewhere on the chip should not, under any circumstances, ever flow to the output pins of the chip. Once Sentinel knows that this should never happen, it can rapidly look for cases where it might.
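To make the "should never flow" idea concrete, here is a minimal sketch in Python. It is not Sentinel's language or algorithm, which the article doesn't describe; it simply treats a design as a graph of signals and searches for any path from a secret to an output pin. The netlist and every signal name are invented for illustration, and the search looks only at structural connectivity, ignoring the conditions (boot, test mode, privilege level) that a real analysis would have to consider.

```python
from collections import deque

def find_leak_path(edges, source, sinks, blocked=frozenset()):
    """Breadth-first search for a path along which data could flow
    from `source` to any signal in `sinks`.

    edges:   dict mapping a signal name to the signals it drives.
    blocked: signals assumed to stop the flow (e.g. a gate known to be closed).
    Returns the path as a list of names, or None if no path exists.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in sinks:
            return path
        for nxt in edges.get(node, ()):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical toy netlist; all names are made up for this example.
netlist = {
    "aes_key_reg": ["aes_core"],
    "aes_core": ["cipher_out", "debug_mux"],
    "debug_mux": ["jtag_tdo"],
    "cipher_out": ["bus_if"],
}
output_pins = {"jtag_tdo"}

# The rule "the key must never reach an output pin" is violated via the debug mux.
leak = find_leak_path(netlist, "aes_key_reg", output_pins)

# If the debug mux is assumed to be disabled, no path remains.
safe = find_leak_path(netlist, "aes_key_reg", output_pins, blocked={"debug_mux"})
```

In this toy design the key can reach the JTAG output through a debug multiplexer, which is the sort of test-mode back door the article says is easy to overlook.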

This kind of analysis is particularly useful when you’re integrating third-party IP – which almost everyone does. You’re either not able to examine the licensed RTL, or, if you are, it’s too inscrutable to be useful. Either way, you’re trusting the IP vendor to supply you with secure circuitry, as well as a secure method of using it. Sentinel treats everyone’s RTL equally, so flaws in licensed IP are highlighted just as enthusiastically as your own.

Security is no fun. Trying to find potential security holes is even less fun. Staring at schematics or reading RTL line-by-line hoping you’ll suddenly discover a potential security flaw is the least fun of all, and incomplete and error-prone to boot. Sentinel may not make engineering any more fun, but it should take some of the un-fun out of the job.

Meanwhile, over in software land, a team of university researchers (this time working without any penguin assistance) claims to have discovered a glaring security hole in just about every processor in use today. Oh, joy.

First, the problem: A common security feature is called address-space layout randomization (ASLR). With ASLR, you move programs and data around in memory, never loading or executing them in the same place twice. This helps to prevent a whole category of hacks that rely on known addresses. Any CPU chip with an MMU (memory-management unit) can make ASLR easy to implement. You just configure the MMU to scatter sections of the program around to pseudo-random addresses in memory, and the resulting tangle becomes much harder to crack. The MMU makes this all-but-invisible to the programmer, and there’s virtually no overhead or performance penalty.

But.

There may be a problem. Nearly all MMUs use page tables, which are memory-resident structures that describe how virtual addresses should be translated into physical addresses. In short, they’re big lookup tables that programmers get to create at boot-up time. Because an MMU’s logical-to-physical address translation is in the critical performance path, those page tables are usually cached. Without a page-table cache, the MMU would have to look up the page table data from memory every time the chip reads or writes to RAM – a huge waste of time. So, the page tables get cached. So far, so good.
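As a rough illustration of that lookup-plus-cache arrangement, the Python sketch below models a toy MMU. The 4 KB page size, the two-entry cache, and the class and field names are all assumptions made for the example, not details of any particular processor.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages

class ToyMMU:
    def __init__(self, page_table, cache_slots=2):
        self.page_table = page_table   # virtual page number -> physical frame number
        self.cache = {}                # recently used translations, oldest first
        self.cache_slots = cache_slots
        self.table_walks = 0           # counts slow lookups in the memory-resident table

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.cache:
            # Hit: remove the entry so it is re-inserted below as most recently used.
            frame = self.cache.pop(vpn)
        else:
            # Miss: consult the page table in memory (the slow path).
            self.table_walks += 1
            frame = self.page_table[vpn]
            if len(self.cache) >= self.cache_slots:
                oldest = next(iter(self.cache))
                del self.cache[oldest]  # evict the least recently used translation
        self.cache[vpn] = frame
        return frame * PAGE_SIZE + offset

mmu = ToyMMU({0: 7, 1: 3, 2: 9})
first = mmu.translate(0x1234)    # virtual page 1, offset 0x234 -> frame 3
second = mmu.translate(0x1238)   # same page, served from the cache
```

Only the first access pays for a walk of the page table; the second reuses the cached translation, which is the whole point of caching them.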

But.

Anything that’s cached can also be pushed out of the cache. That’s how caches work. Which means that evicted page-table data gets pushed out to memory occasionally. Which means you can read it. Which means the bad guys can determine what the logical-to-physical translation really is, which means our code and data might not be as scrambled as we thought. Oops.

Worse, the five researchers, from Vrije Universiteit Amsterdam, have demonstrated the weakness of ASLR using nothing more elaborate than JavaScript. That’s right: no low-level assembly code was required, no application-specific hack, no spooky RF equipment for side-channel monitoring, and no foreknowledge of the program(s) they were deconstructing. In just over two minutes, their JavaScript hack reconstructed the actual address map of the application code and its data. It even works on different processor architectures, specifically x86 and ARM.

As the researchers point out, MMU page tables are sensitive data. Or at least, sensitive when they’re used to implement data- or code-scrambling techniques like ASLR. Page tables should never be exposed to the outside world, and, in most cases, they aren’t. Any decent programmer knows how to protect the page tables (and other MMU structures) by putting them in a privileged address space that only highly secure code can access. And, if the processor allows it, they’ll often disable read accesses to that space entirely, making the page tables invisible to everyone but the operating system kernel.

So why doesn’t that solve the problem? The MMU has its own internal cache (called a TLB, for translation lookaside buffer) for page-table entries, separate from the normal L1, L2, and (optionally) L3 caches that the processor uses for code and data. But, and this is the key, the page tables themselves live in ordinary memory, so whenever the TLB misses, the MMU’s reads of page-table entries pass through the same unified L3 cache as everything else. That’s by design. If the MMU needs to fetch those page-table entries again soon, they can be supplied by the L3 cache, rather than by a slow read from physical memory. That’s what caches are for.

And this presents a vulnerability. If you can force the page-table entries out into the open, you can examine them and reverse-engineer their logical-to-physical address mapping, opening up avenues for conventional software-based attacks like buffer overflow or code injection. The trick is forcing the TLB to flush its contents in a way that you can analyze.

This turns out to be difficult, and the research paper details the techniques they used. One challenge was to come up with JavaScript that could tell the difference between cache hits and cache misses. Ideally, you could time the memory accesses; a quick response means a cache hit, while a slower access time means a miss and a read from physical memory. Unfortunately for them (fortunately for us), most operating systems and browsers disable fine-grained timers specifically to prevent exactly this kind of attack. So, the researchers had to create their own software timer.
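The hit-versus-miss test boils down to comparing an elapsed time against a threshold. The Python sketch below shows only that idea, using a simulated cache whose `clock` counter stands in for the attacker's homemade timer. The latencies, the threshold, and all names are assumptions chosen for illustration; the article doesn't describe how the researchers' actual timer works.

```python
HIT_TICKS, MISS_TICKS = 1, 50   # assumed latencies, in arbitrary ticks

class ToyCache:
    def __init__(self):
        self.lines = set()
        self.clock = 0           # stand-in for a homemade software timer

    def access(self, addr):
        # A cached address costs little; an uncached one costs a trip to memory.
        self.clock += HIT_TICKS if addr in self.lines else MISS_TICKS
        self.lines.add(addr)

def classify(cache, addr, threshold=10):
    """Time one access and guess whether it was a hit or a miss."""
    start = cache.clock
    cache.access(addr)
    elapsed = cache.clock - start
    return "hit" if elapsed < threshold else "miss"

cache = ToyCache()
results = [classify(cache, a) for a in (0x40, 0x40, 0x80, 0x40)]
```

On real hardware the two latencies are far closer together and much noisier, which is why a coarse timer is enough to spoil the measurement.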

Second, they had to force the caches to empty themselves, thus exposing their contents. This, too, turned out to be complicated, but they eventually created various cache-busting data sets specific to the various MMUs being tested.
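One common way to push a particular line out of a cache is an "eviction set": enough addresses that land in the same cache set as the target to crowd it out. The Python sketch below demonstrates this on a tiny simulated set-associative cache with least-recently-used replacement. The geometry (64-byte lines, 4 sets, 2 ways), the replacement policy, and the names are assumptions for the example; real caches are much larger, and the researchers' data sets were tuned to the specific hardware under test.

```python
from collections import OrderedDict

LINE_SIZE, NUM_SETS, WAYS = 64, 4, 2   # assumed tiny cache geometry

class SetAssociativeCache:
    def __init__(self):
        # One ordered dict per set; the first key is the least recently used line.
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    @staticmethod
    def set_index(addr):
        return (addr // LINE_SIZE) % NUM_SETS

    def access(self, addr):
        """Return True on a hit. On a miss, load the line, evicting the LRU line if the set is full."""
        line_number = addr // LINE_SIZE
        lines = self.sets[self.set_index(addr)]
        if line_number in lines:
            lines.move_to_end(line_number)
            return True
        if len(lines) >= WAYS:
            lines.popitem(last=False)
        lines[line_number] = True
        return False

def build_eviction_set(target, candidates):
    """Pick WAYS addresses that map to the same cache set as the target."""
    idx = SetAssociativeCache.set_index(target)
    same_set = [a for a in candidates
                if a != target and SetAssociativeCache.set_index(a) == idx]
    return same_set[:WAYS]

cache = SetAssociativeCache()
target = 0x100
cache.access(target)                      # bring the target line into the cache

eviction_set = build_eviction_set(target, range(0, 0x1000, LINE_SIZE))
for addr in eviction_set:
    cache.access(addr)                    # crowd the target out of its set

target_still_cached = cache.access(target)
```

After touching the eviction set, the next access to the target misses, so its contents had to be fetched again from further out in the memory hierarchy.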

Finally, they had to derandomize the caches so they’d be in a known state. Only then could they probe the caches to see what caused a cache hit or a miss.

In all, the group tested 11 different ARM and x86 processors and two browsers (Firefox and Chrome), running on both Windows 10 and Linux operating systems. Everything toppled over, usually in about two minutes. As they state in their research paper, “We did not find an architecture on which… the attack was not possible.” Ugh.

Is there any hope of mitigating this new attack vector? Not much, they conclude. You could disable browser or OS timers completely, but that’s drastic and could probably be sidestepped anyway. You could also design a CPU with separate caches just for the MMU, but that’s also a major job and not a quick fix.

This job gets more complicated all the time.
