
Spectre Bug Rears Its Head Again

Academic Paper Outlines Another Variation of the CPU Security Flaw

Boo! A scary new variation of the Spectre CPU bug has surfaced, and it may be resistant to the fixes and countermeasures already deployed. Or maybe not. 

A band of CS/EE students has published a paper, provocatively dubbed “I See Dead µops: Leaking Secrets via Intel/AMD Micro-Op Caches,” claiming to reveal another new way to siphon sensitive information out of x86 processors. It looks like yet another variation of the well-known Spectre bug, but in addition to detailing the discovery itself, the paper also reports that this version isn’t fixable. 

There are now so many variations of Spectre that it’s become its own brand. Several fixes and countermeasures have been deployed since it was first discovered three years ago, but this new strain is fix-resistant – at least, according to its discoverers. 

Intel disagrees, and says that its currently published guidelines will squash it very nicely, thank you very much. AMD sides with Intel. Both companies are saying, in essence, go back to your homes, programmers, there’s nothing to see here. 

First, some background. As with all the previous versions of Spectre, this one depends on extremely subtle side effects in the way that microprocessors fetch and execute software instructions. Note that Spectre is not an Intel bug or an AMD bug. It’s not even an x86 bug. It’s a complexity bug. Spectre has affected nearly all modern high-end microprocessors, including those designed by ARM. Only this latest version is x86-specific. 

Such chips all perform speculative execution, meaning they sometimes guess what they’re supposed to do next. When they guess right – and most of the time, they do – they save a bunch of time. When they occasionally guess wrong, they discard the incorrect results and start over with the right instructions. All of this is invisible to software, even to those of us who dabble in assembly-language programming. 

In the initial versions of Spectre, attackers leveraged the effect that speculative execution had on the chip’s data caches. This new one instead exploits the chip’s micro-operations cache, an even more deeply hidden feature that helps speed up instruction decoding. Today’s x86 processors don’t actually execute x86 instructions natively. Instead, each x86 instruction is decomposed into simpler micro-operations (µops) that look like generic RISC processor instructions. This translation is deliberately hidden from us, but some details of its operation can be inferred. 

Some x86 instructions are pretty simple, while some are insanely complex. That means some x86 instructions translate into just one µop, some translate into a few µops, and some translate into a whole bunch of µops. Not surprisingly, cracking apart the complex instructions takes longer than translating the simple ones. To speed things up, those translations are kept in a µop cache, which can hold a few hundred to a few thousand µops, depending on the processor. Chipmakers tend to be pretty secretive about the whole translation process and the details of the µop cache. 

But we do know that the µop cache is probabilistic: sometimes it holds the information you want, and sometimes it doesn’t. You can’t ever be sure. If the correct x86-to-µop translation data is in the cache, great! If not, no big deal, the chip will look it up. At most, you’ve lost a few clock cycles. But it is precisely this timing difference that the researchers’ new variation exploits. 

We also know that, like all caches, the µop cache is organized into cache lines, sets, and ways, and that it uses an associative mapping mechanism. All of this means that you could – with a lot of effort – organize instructions in memory in a way that perfectly matches the µop cache’s internal organization, and all of that data would be cached. You could also do the opposite and create a perfect cache-thrasher. After a lot of experimentation, the university researchers created code streams that are ideal examples of both. 
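To make the sets-and-ways idea concrete, here is a minimal sketch of how code addresses might map onto a set-associative cache. The geometry below (32 sets, 8 ways, 32-byte code regions) and the simple modulo indexing are assumptions chosen for illustration, not a description of any particular Intel or AMD part. The helper names are hypothetical, too.

```python
# Hypothetical cache geometry, for illustration only.
NUM_SETS = 32       # assumed number of sets
NUM_WAYS = 8        # assumed number of lines each set can hold
REGION_BYTES = 32   # assumed span of code covered by one cache line

def set_index(addr):
    """Map a code address to a set using the bits just above the region offset."""
    return (addr // REGION_BYTES) % NUM_SETS

def conflicting_addresses(base, count):
    """Addresses that all fall into the same set as `base`."""
    stride = REGION_BYTES * NUM_SETS
    return [base + i * stride for i in range(count)]

def spread_addresses(base, count):
    """Consecutive regions, which fall into consecutive sets."""
    return [base + i * REGION_BYTES for i in range(count)]

# One more region than the set has ways: they cannot all stay cached.
thrasher = conflicting_addresses(0x400000, NUM_WAYS + 1)

# One region per set: nothing competes, so everything can stay cached.
friendly = spread_addresses(0x400000, NUM_SETS)

print(sorted({set_index(a) for a in thrasher}))   # a single set index
print(len({set_index(a) for a in friendly}))      # as many sets as regions
```

Under these assumptions, the first layout is the cache-thrasher (nine regions fighting over eight ways) and the second is the well-behaved one.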

Using fine-grained performance monitors, they were able to measure the time difference between code that was in the µop cache and code that wasn’t. “This allows us to obtain a clearly distinguishable binary signal (i.e., hit vs. miss) with a mean timing difference of 218.4 cycles and a standard deviation of 27.8 cycles, allowing us to reliably transmit a bit (i.e., one-bit vs. zero-bit) over the micro-op cache.” 
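A gap of 218.4 cycles against a standard deviation of 27.8 cycles is nearly eight standard deviations, which is why a simple threshold is enough to tell hit from miss. The sketch below simulates that decision. Only the two quoted figures come from the paper; the hit-case baseline, the noise on the hit side, and the midpoint threshold are assumptions made for the sake of the example.

```python
import random

MEAN_GAP = 218.4     # mean hit/miss difference in cycles, as quoted from the paper
GAP_STDEV = 27.8     # standard deviation in cycles, as quoted from the paper
HIT_CYCLES = 300.0   # hypothetical hit-case timing; NOT from the paper
THRESHOLD = HIT_CYCLES + MEAN_GAP / 2   # assumed cut: midpoint between the cases

def simulated_sample(miss, rng):
    """Return a fake timing: hits sit near HIT_CYCLES, misses add a noisy gap."""
    if miss:
        return HIT_CYCLES + rng.gauss(MEAN_GAP, GAP_STDEV)
    return HIT_CYCLES + rng.gauss(0.0, GAP_STDEV)   # assumed hit-side noise

def is_miss(cycles):
    """Classify one timing sample as a miss (True) or a hit (False)."""
    return cycles > THRESHOLD

rng = random.Random(1)
truth = [rng.random() < 0.5 for _ in range(2000)]
guesses = [is_miss(simulated_sample(m, rng)) for m in truth]
accuracy = sum(g == t for g, t in zip(guesses, truth)) / len(truth)
print(f"classified {accuracy:.2%} of simulated samples correctly")
```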

That’s swell, but how is it useful? And how does it constitute a security threat? You can use those timing differences to leak information from one program to another, even when the two are supposedly independent of each other. Like previous Spectre exploits, this requires two programs working in parallel. The first, the receiver, runs the deliberate cache-thrasher loop while timing it. The second, the sender, runs either the same loop or a deliberately benign one. If both run the thrasher, the two programs interfere with each other and keep the µop cache spilling and refilling, which the receiver detects as slower execution. If the sender runs the benign loop instead, the µop cache isn’t thrashed, and the receiver can detect that, too. In this way, the two programs can communicate one bit at a time. Thrashing (slower execution) is a 1, and not thrashing (faster execution) is a 0. 
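The one-bit-at-a-time scheme can be sketched as a pure simulation. Nothing here touches a real µop cache: the `observe` function is a hypothetical stand-in for the receiver's timed loop, and the cycle counts and jitter are invented numbers. What the sketch does show is the framing, with slow meaning 1 and fast meaning 0.

```python
import random

FAST_CYCLES = 100   # hypothetical timing when the sender leaves the cache alone
SLOW_CYCLES = 320   # hypothetical timing when the sender thrashes the cache
JITTER = 40         # hypothetical bounded noise, in cycles
THRESHOLD = (FAST_CYCLES + SLOW_CYCLES) / 2

def to_bits(data):
    """Split bytes into bits, most-significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]

def from_bits(bits):
    """Reassemble bits (most-significant first) into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

def observe(bit, rng):
    """Stand-in for the receiver's timed loop: slow if the sender sent a 1."""
    base = SLOW_CYCLES if bit else FAST_CYCLES
    return base + rng.uniform(-JITTER, JITTER)

def transmit(message, rng):
    """Send each bit as a timing, then decode the timings back into bytes."""
    timings = [observe(bit, rng) for bit in to_bits(message)]
    received = [1 if t > THRESHOLD else 0 for t in timings]
    return from_bits(received)

leaked = transmit(b"secret", random.Random(0))
print(leaked)
```

Because the simulated jitter is bounded well inside the gap, every bit decodes cleanly. A real channel is noisier, which is where repeated sampling and error correction come in.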

Tedious, but effective. Even leaking just one bit at a time, that’s still a lot of data at GHz clock speeds. The team also found that they don’t have to manipulate the entire µop cache to make this work. A subset works, too. “We reach our best bandwidth (965.59 Kbps) and error rates (0.22%) when six ways of eight sets are probed, while limiting ourselves to just five samples. We further report an error-corrected bandwidth by encoding our transmitted data with Reed-Solomon encoding that inflates file size by roughly 20%, providing a bandwidth of 785.56 Kbps with no errors.” That’s close to a Mbit/sec of uncompressed data. Yikes. 
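The quoted figures are easy to sanity-check. This back-of-the-envelope sketch uses only the numbers in the quote, plus two assumptions of its own: that "Kbps" means 1,000 bits per second, and a hypothetical 2,048-bit secret as the thing being stolen.

```python
RAW_KBPS = 965.59         # best raw bandwidth, as quoted
CORRECTED_KBPS = 785.56   # error-corrected bandwidth, as quoted
RAW_ERROR_RATE = 0.0022   # 0.22% raw error rate, as quoted
BITS_PER_KBIT = 1000      # assumption: decimal kilobits

# Extra bits the error-correcting code adds, relative to the payload.
overhead = RAW_KBPS / CORRECTED_KBPS - 1

# Share of the raw channel that carries useful data.
efficiency = CORRECTED_KBPS / RAW_KBPS

# Bit errors per second on the uncorrected channel.
errors_per_second = RAW_KBPS * BITS_PER_KBIT * RAW_ERROR_RATE

# Time to leak a hypothetical 2,048-bit secret over the corrected channel.
KEY_BITS = 2048
seconds_for_key = KEY_BITS / (CORRECTED_KBPS * BITS_PER_KBIT)

print(f"coding overhead: {overhead:.1%}")
print(f"useful share of raw bandwidth: {efficiency:.1%}")
print(f"raw bit errors per second: {errors_per_second:.0f}")
print(f"time for a {KEY_BITS}-bit secret: {seconds_for_key * 1000:.2f} ms")
```

The overhead works out to about 23%, which squares with the paper's "roughly 20%," and a secret of that size would take under three milliseconds to move.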

As with previous Spectre variations, this one doesn’t tell you how to get the exploit into a target computer, only how to exfiltrate data out once it’s there. 

The research paper concludes with some suggested countermeasures, including flushing the µop cache at frequent intervals. That works, but it also harms performance. It’s also not something most operating systems or hypervisors are designed to do. And it leaves the security up to software, which might itself be compromised. Finally, there’s no “right answer” as to how frequently you should flush the µop cache. More is better, but how much is enough? 

Intel and AMD both say that you don’t have to do any of this – that it’s a solved problem. In similar official statements, both companies said that their existing guidelines for mitigating side-channel attacks will work in this case, too. Those guidelines rely on constant-time coding, which is pretty much what it sounds like. Instead of writing functions to operate as quickly as possible, you write them so that their running time is the same no matter what secret data they handle. That’s harder than it sounds, because it’s counterintuitive for most programmers and because it takes practice to do properly. Experts in cryptography – and that’s essentially what this is, cryptography – often rely on constant-time code because it hides internal shortcuts that can reveal secrets. Since this Spectre variant is a timing-based attack, those countermeasures should work here, too. 
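For a feel of what constant-time coding means, compare two ways of checking whether two byte strings match. This is a minimal sketch of the general pattern, not a summary of the vendors' guidelines, which cover a good deal more. Python itself makes no hard timing guarantees, so the hand-written version is for illustration; `hmac.compare_digest` is the standard-library function intended for this job.

```python
import hmac

def leaky_equal(a, b):
    """Early-exit comparison: run time depends on where the first mismatch is."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False   # the shortcut that gives the game away
    return True

def constant_time_equal(a, b):
    """Touch every byte regardless of mismatches, then decide once at the end."""
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y      # accumulate differences without branching on them
    return diff == 0

secret = b"hunter2hunter2"
guess = b"hunter2hunterX"

print(leaky_equal(secret, guess))           # same answer...
print(constant_time_equal(secret, guess))   # ...without the data-dependent exit
print(hmac.compare_digest(secret, guess))   # the standard-library equivalent
```

All three give the same answer. The difference is that only the first one finishes sooner when the mismatch comes early, and that difference is exactly the kind of shortcut a timing attack feeds on.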

I applaud the university team for discovering this weakness, and for putting in the hours it must have taken to characterize and document it. The paper comes across as a little hysterical for an academic paper, however, almost as if it were calculated to capture headlines (which it did). Yes, we have a bug in a large number of high-performance processors. No, we’re not all in danger of immediate computer meltdown. 

At this level of complexity, there will always be bugs. (Heck, I’ve even wired flip-flops wrong.) The important thing is to always keep looking for them, always apply reasonable countermeasures, and always assume there’s another one over the horizon.

One thought on “Spectre Bug Rears Its Head Again”

  1. think about all the past actions predicated on the supposition that no such thing existed….always better to know

    as the wise man said, “the truth hurts, but the lies’ll kill you”
