feature article
Subscribe Now

Spectre Bug Rears Its Head Again

Academic Paper Outlines Another Variation of the CPU Security Flaw

Boo! A scary new variation of the Spectre CPU bug has surfaced, and it may be resistant to the fixes and countermeasures already deployed. Or maybe not. 

A band of CS/EE students has published a paper, provocatively dubbed “I See Dead µops: Leaking Secrets via Intel/AMD Micro-Op Caches,” claiming to reveal another new way to siphon sensitive information out of x86 processors. It looks like yet another variation of the well-known Spectre bug, but in addition to detailing the discovery itself, the paper also reports that this version isn’t fixable. 

There are now so many variations of Spectre that it’s become its own brand. Several fixes and countermeasures have been deployed since it was first discovered three years ago, but this new strain is fix-resistant – at least, according to its discoverers. 

Intel disagrees, and says that its currently published guidelines will squash it very nicely, thank you very much. AMD agrees. Both companies are saying, in essence, go back to your homes, programmers, there’s nothing to see here. 

First, some background. As with all the previous versions of Spectre, this one depends on extremely subtle side effects in the way that microprocessors fetch and execute software instructions. Note that Spectre is not an Intel bug or an AMD bug. It’s not even an x86 bug. It’s a complexity bug. Spectre has affected nearly all modern high-end microprocessors, including those designed by ARM. Only this latest version is x86-specific. 

Such chips all perform speculative execution, meaning they sometimes guess what they’re supposed to do next. When they guess right – and most of the time, they do – they save a bunch of time. When they occasionally guess wrong, they discard the incorrect results and start over with the right instructions. All of this is invisible to software, even to those of us who dabble in assembly-language programming. 

In the initial versions of Spectre, attackers leveraged the effect that speculative execution had on the chip’s data caches. This new one instead exploits the chip’s micro-operations cache, an even more deeply hidden feature that helps speed up instruction decoding. Today’s x86 processors don’t actually execute x86 instructions natively. Instead, each x86 instruction is decomposed into simpler micro-operations (µops) that look like generic RISC processor instructions. This translation is deliberately hidden from us, but some details of its operation can be inferred. 

Some x86 instructions are pretty simple, while some are insanely complex. That means some x86 instructions translate into just one µop, some translate into a few µops, and some translate into a whole bunch of µops. Not surprisingly, cracking apart the complex instructions takes longer than translating the simple ones. To speed things up, those translations are kept in a µop cache, which can hold a few hundred to a few thousand µops, depending on the processor. Chipmakers tend to be pretty secretive about the whole translation process and the details of the µop cache. 

But we do know that the µop cache is probabilistic: sometimes it holds the information you want, and sometimes it doesn’t. You can’t ever be sure. If the correct x86-to-µop translation data is in the cache, great! If not, no big deal, the chip will look it up. At most, you’ve lost a few clock cycles. But it is precisely this timing difference that the researchers’ new variation exploits. 

We also know that, like all caches, the µop cache is organized into cache lines, sets, and ways, and that it uses an associative mapping mechanism. All of this means that you could – with a lot of effort – organize instructions in memory in a way that perfectly matches the µop cache’s internal organization, and all of that data would be cached. You could also do the opposite and create a perfect cache-thrasher. After a lot of experimentation, the university researchers created code streams that are ideal examples of both. 

Using fine-grained performance monitors, they were able to measure the time difference between code that was in the µop cache and code that wasn’t. “This allows us to obtain a clearly distinguishable binary signal (i.e., hit vs. miss) with a mean timing difference of 218.4 cycles and a standard deviation of 27.8 cycles, allowing us to reliably transmit a bit (i.e., one-bit vs. zero-bit) over the micro-op cache.” 

That’s swell, but how is it useful? And how does it constitute a security threat? You can use those timing differences to leak information from one program to another, even when the two are supposedly independent of each other. Like previous Spectre exploits, this requires two programs working in parallel. One runs the deliberate cache-thrasher loop (while timing it), while the other runs the same loop. The two programs will interfere with each other to keep the µop cache spilling and refilling, which the first program can detect by timing. Alternatively, the second program can run a deliberately benign loop that won’t thrash the µop cache, which the first program can also detect. In this way, the two programs can communicate one bit at a time. Thrashing (slower execution) is a 1, and not thrashing (faster execution) is a 0. 

Tedious, but effective. Even leaking just one bit at a time, that’s still a lot of data at GHz clock speeds. The team also found that they don’t have to manipulate the entire µop cache to make this work. A subset works, too. “We reach our best bandwidth (965.59 Kbps) and error rates (0.22%) when six ways of eight sets are probed, while limiting ourselves to just five samples. We further report an error-corrected bandwidth by encoding our transmitted data with Reed-Solomon encoding that inflates file size by roughly 20%, providing a bandwidth of 785.56 Kbps with no errors.” That’s close to a Mbit/sec of uncompressed data. Yikes. 

As with previous Spectre variations, this one doesn’t tell you how to get the exploit into a target computer, only how to exfiltrate data out once it’s there. 

The research paper concludes with some suggested countermeasures, including flushing the µop cache at frequent intervals. That works, but it also harms performance. It’s also not something most operating systems or hypervisors are designed to do. And it leaves the security up to software, which might itself be compromised. Finally, there’s no “right answer” as to how frequently you should flush the µop cache. More is better, but how much is enough? 

Intel and AMD both say that you don’t have to do any of this – that it’s a solved problem. In similar official statements, both companies said that their existing guidelines for mitigating side-channel attacks will work in this case, too. Those guidelines rely on constant-time coding, which is pretty much what it sounds like. Instead of writing functions to operate as quickly as possible, you write them to run for a fixed amount of time. That’s harder than it sounds, because it’s counterintuitive for most programmers and because it takes practice to do properly. Experts in cryptography – and that’s essentially what this is, cryptography – often rely on constant-time code because it hides internal shortcuts that can reveal secrets. Since Spectre is a timing-based attack, those countermeasures should work here, too. 

I applaud the university team for discovering this weakness, and for putting in the hours it must have taken to characterize and document. The paper comes across as a little hysterical for an academic paper, however, almost as if it were calculated to capture headlines (which it did). Yes, we have a bug in a large number of high-performance processors. No, we’re not all in danger of immediate computer meltdown. 

At this level of complexity, there will always be bugs. (Heck, I’ve even wired flip-flops wrong.) The important thing is to always keep looking for them, always apply reasonable countermeasures, and always assume there’s another one over the horizon.

One thought on “Spectre Bug Rears Its Head Again”

  1. think about all the past actions predicated on the supposition that no such thing existed….always better to know

    as the wise man said, ” the truth hurts, but the lies’ll kill you “

Leave a Reply

featured blogs
Sep 20, 2021
As it seems to be becoming a (bad) habit, This Week in CFD is presented here as Last Week in CFD. But that doesn't make the news any less relevant. Great article on wind tunnels because they go... [[ Click on the title to access the full blog on the Cadence Community si...
Sep 18, 2021
Projects with a steampunk look-and-feel incorporate retro-futuristic technology and aesthetics inspired by 19th-century industrial steam-powered machinery....
Sep 15, 2021
Learn how chiplets form the basis of multi-die HPC processor architectures, fueling modern HPC applications and scaling performance & power beyond Moore's Law. The post What's Driving the Demand for Chiplets? appeared first on From Silicon To Software....
Aug 5, 2021
Megh Computing's Video Analytics Solution (VAS) portfolio implements a flexible and scalable video analytics pipeline consisting of the following elements: Video Ingestion Video Transformation Object Detection and Inference Video Analytics Visualization   Because Megh's ...

featured video

ARC® Processor Virtual Summit 2021

Sponsored by Synopsys

Designing an embedded SoC? Attend the ARC Processor Virtual Summit on Sept 21-22 to get in-depth information from industry leaders on the latest ARC processor IP and related hardware and software technologies that enable you to achieve differentiation in your chip or system design.

Click to read more

featured paper

Detect. Sense. Control: Simplify building automation designs with MSP430™ MCU-based solutions

Sponsored by Texas Instruments

Building automation systems are critical not only to security, but worker comfort. Whether you need to detect, sense or control applications within your environment, the right MCU can make it easy. Using MSP430 MCUS with integrated analog, you can easily develop common building automation applications including motion detectors, touch keypads and e-locks, as well as video security cameras. Read more to see how you can enhance your building automation design.

Click to read more

featured chalk talk

The Wireless Member of the DARWIN Family

Sponsored by Mouser Electronics and Maxim Integrated (now part of Analog Devices)

MCUs continue to evolve based on increasing demands from designers. We expect our microcontrollers to do more than ever - better security, more performance, lower power consumption - and we want it all for less money, of course. In this episode of Chalk Talk, Amelia Dalton chats with Kris Ardis from Maxim Integrated about the new DARWIN line of low-power MCUs.

Click here for more information about Maxim Integrated MAX32665-MAX32668 UB Class Microcontroller