
Security: Hard and Soft

Next Up: A Tortoise That Checks Your Chip for Flaws

“We will bankrupt ourselves in the vain search for absolute security.” – Dwight D. Eisenhower

I hate writing about security. I hate it because I wish it were unnecessary. There was a time when engineering meant making a product that did what you wanted it to do. Now it means spending a bunch of time making sure it doesn’t do the things that other people want but you don’t. This sucks.

Most of the problem with implementing security features is guessing where the vulnerabilities are. How do you fix a bug you’ve never even thought of, much less identified? At least “real” hardware bugs are unintentional. Security hacks are both deliberate and malicious. Someone is trying to break your stuff.

That’s why products like Tortuga Logic’s Sentinel are so handy. It automates much of the tedious task of identifying where the security holes in your hardware lie. Sentinel doesn’t fix anything; it’s purely a diagnostic tool. But, for many of us, that’s the most important part.

Sentinel comes in two parts: a scripting/design language and an execution model. You describe how you think the security features of your new SoC are supposed to work, and then the Sentinel model humiliates you in front of your colleagues. Sound like fun? Hey, it’s better than finding out in the newspapers, after you’ve shipped a thousand units.

Sentinel doesn’t look for software bugs. It’s purely a chip-level RTL diagnostic tool. It currently works with the Synopsys flow, with Cadence and Mentor versions coming later. You license Sentinel, much like any EDA tool, and use it to diagnose potential security holes in your chip design.

What kinds of holes? Sentinel performs what’s called an “information flow analysis,” looking for how and where data is transported across your chip. Can this CPU core talk to that DSP core, or to that JTAG interface over there? Fine, but let’s make sure that those transactions happen only under the right circumstances. Most chip-level internal buses (AMBA, Sonics, etc.) understand the concept of privilege levels, but Sentinel makes sure that those privileges are enforced all the time, not just during routine operation. Boot-up, configuration, and testing are three specific periods where privileges are often relaxed – to the detriment of the product’s security.
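The boot-time relaxation described above is easy to picture as a toy access-check model. This sketch is purely illustrative: the region names, privilege levels, and the `boot_mode` escape hatch are all hypothetical, not Sentinel semantics or any real bus protocol (real interconnects encode privilege in bus signals, e.g. AXI’s AxPROT bits).

```python
# Toy model of bus-level privilege enforcement. Hypothetical names:
# each slave region demands a minimum privilege level from the master.
REQUIRED = {"aes_keys": "secure", "dram": "user", "jtag": "debug"}
RANK = {"user": 0, "debug": 1, "secure": 2}

def allowed(master_priv, region, boot_mode=False):
    """Is this access permitted? Boot mode skips the check entirely --
    exactly the kind of relaxation that turns into a security hole."""
    if boot_mode:
        return True
    return RANK[master_priv] >= RANK[REQUIRED[region]]

print(allowed("user", "aes_keys"))                  # blocked in normal operation
print(allowed("user", "aes_keys", boot_mode=True))  # allowed during boot: the hole
```

The point of a tool like Sentinel is to notice that the second call succeeds at all, not just to verify the first one fails.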

Sentinel is sort of like reverse RTL, in the sense that it describes what a circuit should not do, rather than how it should function. For example, you can tell Sentinel that an encryption key stored somewhere on the chip should not, under any circumstances, ever flow to the output pins of the chip. Once Sentinel knows that this should never happen, it can rapidly look for cases where it might.
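Sentinel’s actual property language is proprietary, but the underlying idea, checking whether tainted data can ever reach a forbidden sink, amounts to a reachability search over the design’s dataflow. Here is a minimal sketch using a toy netlist graph; every signal name is made up for illustration.

```python
from collections import deque

# Toy netlist: each signal maps to the signals it drives (its fan-out).
# All names are hypothetical; a real design has thousands of nets.
netlist = {
    "aes_key":     ["aes_core"],
    "aes_core":    ["cipher_out", "debug_bus"],
    "cipher_out":  ["output_pins"],
    "debug_bus":   ["jtag_tdo"],
    "jtag_tdo":    [],
    "output_pins": [],
}

def flows_to(src, sink, edges):
    """Breadth-first search: can data starting at `src` reach `sink`?"""
    seen, work = {src}, deque([src])
    while work:
        for nxt in edges.get(work.popleft(), []):
            if nxt == sink:
                return True
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return False

# The "should never happen" property: key material must not reach the pins.
print(flows_to("aes_key", "output_pins", netlist))  # True -> property violated
```

A real information-flow tool conditions this search on control state (privilege level, operating mode), which is what makes the analysis hard, and useful.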

This kind of analysis is particularly useful when you’re integrating third-party IP – which almost everyone does. You’re either not able to examine the licensed RTL, or, if you are, it’s too inscrutable to be useful. Either way, you’re trusting the IP vendor to supply you with secure circuitry, as well as a secure method of using it. Sentinel treats everyone’s RTL equally, so flaws in licensed IP are highlighted just as enthusiastically as your own.

Security is no fun. Trying to find potential security holes is even less fun. Staring at schematics or reading RTL line-by-line hoping you’ll suddenly discover a potential security flaw is the least fun of all, and incomplete and error-prone to boot. Sentinel may not make engineering any more fun, but it should take some of the un-fun out of the job.

Meanwhile, over in software land, a team of university researchers (this time working without any penguin assistance) claims to have discovered a glaring security hole in just about every processor in use today. Oh, joy.

First, the problem: A common security feature is called address-space layout randomization (ASLR). With ASLR, you move programs and data around in memory, never loading or executing them in the same place twice. This helps to prevent a whole category of hacks that rely on known addresses. Any CPU chip with an MMU (memory-management unit) makes ASLR easy to implement. You just configure the MMU to scatter sections of the program around to pseudo-random addresses in memory, and the resulting tangle becomes much harder to crack. The MMU makes this all-but-invisible to the programmer, and there’s virtually no overhead or performance penalty.
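The scattering itself is simple to sketch. This toy version just hands each section a random page-aligned base address; real ASLR implementations also worry about collisions, alignment constraints, and how many bits of entropy the address space actually allows.

```python
import random

PAGE = 0x1000  # 4 KiB pages: randomized bases must stay page-aligned

def randomize_layout(sections, lo=0x0001_0000, hi=0x7fff_0000, rng=None):
    """Toy ASLR: give each section a pseudo-random, page-aligned base."""
    rng = rng or random.Random()
    return {name: rng.randrange(lo // PAGE, hi // PAGE) * PAGE
            for name in sections}

sections = ["text", "data", "heap", "stack"]
run1 = randomize_layout(sections, rng=random.Random(1))  # one "run"
run2 = randomize_layout(sections, rng=random.Random(2))  # another "run"
print(run1 != run2)  # a different layout every time the program loads
```

The MMU does the equivalent work in hardware, which is why the programmer never sees any of it.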


There may be a problem. Nearly all MMUs use page tables, which are memory-resident structures that describe how virtual addresses should be translated into physical addresses. In short, they’re big lookup tables that programmers get to create at boot-up time. Because an MMU’s logical-to-physical address translation is in the critical performance path, those page tables are usually cached. Without a page-table cache, the MMU would have to look up the page table data from memory every time the chip reads or writes to RAM – a huge waste of time. So, the page tables get cached. So far, so good.
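A page-table walk is just a chain of lookups. Here is a deliberately simplified two-level walk for a 32-bit address (10-bit top index, 10-bit second-level index, 12-bit page offset); the table contents are invented for the example.

```python
# Toy two-level page tables: top-level index -> second-level table,
# second-level index -> physical frame number. Contents are made up.
page_directory = {
    0x300: {
        0x1FF: 0xCAFE,  # physical frame number for this virtual page
    },
}

def translate(vaddr):
    """Walk the toy page tables: virtual address in, physical address out."""
    top    = (vaddr >> 22) & 0x3FF  # index into the page directory
    second = (vaddr >> 12) & 0x3FF  # index into the second-level table
    offset = vaddr & 0xFFF          # byte offset within the page
    frame = page_directory[top][second]  # two memory reads on real hardware
    return (frame << 12) | offset

vaddr = (0x300 << 22) | (0x1FF << 12) | 0xABC
print(hex(translate(vaddr)))  # 0xcafeabc
```

Those two extra memory reads per access are exactly why the hardware caches page-table entries so aggressively.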


Anything that’s cached can also be pushed out of the cache. That’s how caches work. Which means that page-table data passes through the shared caches on its way to and from memory. Which means an attacker who can probe those caches can infer it. Which means the bad guys can determine what the logical-to-physical translation really is, which means our code and data might not be as scrambled as we thought. Oops.

Worse, the researchers – a team of five at Vrije Universiteit Amsterdam – have demonstrated the weakness of ASLR using nothing more elaborate than JavaScript. That’s right: no low-level assembly code was required, no application-specific hack, no spooky RF equipment for side-channel monitoring, and no foreknowledge of the program(s) they were deconstructing. In just over two minutes, their JavaScript hack reconstructed the actual address map of the application code and its data. It even works on different processor architectures, specifically x86 and ARM.

As the researchers point out, MMU page tables are sensitive data. Or at least, sensitive when they’re used to implement data- or code-scrambling techniques like ASLR. Page tables should never be exposed to the outside world, and, in most cases, they aren’t. Any decent programmer knows how to protect the page tables (and other MMU structures) by putting them in a privileged address space that only highly secure code can access. And, if the processor allows it, they’ll often disable read accesses to that space entirely, making the page tables invisible to everyone but the operating system kernel.

So why doesn’t that solve the problem? The MMU has its own internal cache (called a TLB, for translation lookaside buffer) for page-table entries, separate from the normal L1, L2, and (optionally) L3 caches that the processor uses for code and data. But – and this is the key – when the MMU misses in its TLB, it walks the page tables through the same cache hierarchy as ordinary data, so the entries it reads land in the shared L3 cache. That’s by design. If the MMU needs those page-table entries again soon, they can be supplied by the L3 cache, rather than by a slow read from physical memory. That’s what caches are for.

And this presents a vulnerability. If you can force the page-table entries out into the open, you can examine their cache behavior and reverse-engineer the logical-to-physical address mapping, opening up avenues for conventional software-based attacks like buffer overflow or code injection. The trick is evicting and reloading those entries in a way that you can analyze.
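The eviction trick rests on cache-set conflicts: two addresses that map to the same set fight over it, and the loser’s slow reload gives the game away. This toy direct-mapped cache simulates the principle; the latencies and addresses are invented, and the real attack infers the same thing from carefully timed loads in JavaScript.

```python
# Toy direct-mapped cache: set conflicts reveal whether a line was resident.
CACHE_SETS = 8
LINE = 64                   # 64-byte cache lines
HIT_NS, MISS_NS = 5, 100    # simulated latencies (made-up numbers)
cache = {}                  # set index -> tag currently resident

def access(addr):
    """Return the simulated latency of touching `addr`."""
    line = addr // LINE
    s, tag = line % CACHE_SETS, line // CACHE_SETS
    if cache.get(s) == tag:
        return HIT_NS       # fast: the line was already in the cache
    cache[s] = tag          # slow: fill the set, evicting the old occupant
    return MISS_NS

pte = 0x1540                       # pretend this is a page-table entry
access(pte)                        # the MMU's page walk brings it in
print(access(pte) == HIT_NS)       # still resident: fast
access(pte + CACHE_SETS * LINE)    # attacker touches a same-set address...
print(access(pte) == MISS_NS)      # ...and the slow reload reveals the conflict
```

By sweeping probe addresses and watching which ones cause the slow reload, an attacker learns which cache set, and hence which address bits, the page-table entry occupies.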

This turns out to be difficult, and the research paper details the techniques they used. The first challenge was to come up with JavaScript that could tell the difference between cache hits and cache misses. Ideally, you could time the memory accesses; a quick response means a cache hit, while a slower access time means a miss and a read from physical memory. Unfortunately for them (fortunately for us), most operating systems and browsers disable fine-grained timers specifically to prevent exactly this kind of attack. So, the researchers had to create their own software timer.
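A homemade timer of this sort is usually just a free-running counter on another thread: the more ticks elapse during an operation, the longer it took. This Python sketch only illustrates the idea (the GIL makes it far noisier than the JavaScript counting loop the researchers actually built).

```python
import threading
import time

class CounterClock:
    """A coarse software timer: a thread increments a counter as fast as
    it can, and elapsed 'time' is measured in counter ticks."""

    def __init__(self):
        self.ticks = 0
        self._running = True
        threading.Thread(target=self._spin, daemon=True).start()

    def _spin(self):
        while self._running:
            self.ticks += 1  # the clock advances whenever this thread runs

    def stop(self):
        self._running = False

clock = CounterClock()
t0 = clock.ticks
time.sleep(0.05)             # stand-in for the memory access being timed
elapsed = clock.ticks - t0   # duration in counter ticks
clock.stop()
print(elapsed > 0)           # slower operations accumulate more ticks
```

Tick counts only need to be consistent enough to separate "fast" from "slow", which is exactly the hit-versus-miss distinction the attack depends on.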

Second, they had to force the caches to empty themselves, thus exposing their contents. This, too, turned out to be complicated, but they eventually created various cache-busting data sets specific to the various MMUs being tested.

Finally, they had to derandomize the caches so they’d be in a known state. Only then could they probe the caches to see what caused a cache hit or a miss.

In all, the group tested 11 different ARM and x86 processors and two browsers (Firefox and Chrome), running on both Windows 10 and Linux operating systems. Everything toppled over, usually in about two minutes. As they state in their research paper, “We did not find an architecture on which… the attack was not possible.” Ugh.

Is there any hope of mitigating this new attack vector? Not much, they conclude. You could disable browser or OS timers completely, but that’s drastic and could probably be sidestepped anyway. You could also design a CPU with separate caches just for the MMU, but that’s also a major job and not a quick fix.

This job gets more complicated all the time.
