Boo! A scary new variation of the Spectre CPU bug has surfaced, and it may be resistant to the fixes and countermeasures already deployed. Or maybe not.
A band of CS/EE students has published a paper, provocatively dubbed “I See Dead µops: Leaking Secrets via Intel/AMD Micro-Op Caches,” claiming to reveal another new way to siphon sensitive information out of x86 processors. It looks like yet another variation of the well-known Spectre bug, but in addition to detailing the discovery itself, the paper also reports that this version isn’t fixable.
There are now so many variations of Spectre that it’s become its own brand. Several fixes and countermeasures have been deployed since it was first discovered three years ago, but this new strain is fix-resistant – at least, according to its discoverers.
Intel disagrees, and says that its currently published guidelines will squash it very nicely, thank you very much. AMD agrees. Both companies are saying, in essence, go back to your homes, programmers, there’s nothing to see here.
First, some background. As with all the previous versions of Spectre, this one depends on extremely subtle side effects in the way that microprocessors fetch and execute software instructions. Note that Spectre is not an Intel bug or an AMD bug. It’s not even an x86 bug. It’s a complexity bug. Spectre has affected nearly all modern high-end microprocessors, including those designed by ARM. Only this latest version is x86-specific.
Such chips all perform speculative execution, meaning they sometimes guess what they’re supposed to do next. When they guess right – and most of the time, they do – they save a bunch of time. When they occasionally guess wrong, they discard the incorrect results and start over with the right instructions. All of this is invisible to software, even to those of us who dabble in assembly-language programming.
In the initial versions of Spectre, attackers leveraged the effect that speculative execution had on the chip’s data caches. This new one instead exploits the chip’s micro-operations cache, an even more deeply hidden feature that helps speed up instruction decoding. Today’s x86 processors don’t actually execute x86 instructions natively. Instead, each x86 instruction is decomposed into simpler micro-operations (µops) that look like generic RISC processor instructions. This translation is deliberately hidden from us, but some details of its operation can be inferred.
Some x86 instructions are pretty simple, while some are insanely complex. That means some x86 instructions translate into just one µop, some translate into a few µops, and some translate into a whole bunch of µops. Not surprisingly, cracking apart the complex instructions takes longer than translating the simple ones. To speed things up, those translations are kept in a µop cache, which can hold a few hundred to a few thousand µops, depending on the processor. Chipmakers tend to be pretty secretive about the whole translation process and the details of the µop cache.
But we do know that the µop cache is probabilistic: sometimes it holds the information you want, and sometimes it doesn’t. You can’t ever be sure. If the correct x86-to-µop translation data is in the cache, great! If not, no big deal, the chip will look it up. At most, you’ve lost a few clock cycles. But it is precisely this timing difference that the researchers’ new variation exploits.
We also know that, like all caches, the µop cache is organized into cache lines, sets, and ways, and that it uses an associative mapping mechanism. All of this means that you could – with a lot of effort – organize instructions in memory in a way that perfectly matches the µop cache’s internal organization, and all of that data would be cached. You could also do the opposite and create a perfect cache-thrasher. After a lot of experimentation, the university researchers created code streams that are ideal examples of both.
Using fine-grained performance monitors, they were able to measure the time difference between code that was in the µop cache and code that wasn’t. “This allows us to obtain a clearly distinguishable binary signal (i.e., hit vs. miss) with a mean timing difference of 218.4 cycles and a standard deviation of 27.8 cycles, allowing us to reliably transmit a bit (i.e., one-bit vs. zero-bit) over the micro-op cache.”
That’s swell, but how is it useful? And how does it constitute a security threat? You can use those timing differences to leak information from one program to another, even when the two are supposedly independent of each other. Like previous Spectre exploits, this requires two programs working in parallel. One runs the deliberate cache-thrasher loop (while timing it), while the other runs the same loop. The two programs will interfere with each other to keep the µop cache spilling and refilling, which the first program can detect by timing. Alternatively, the second program can run a deliberately benign loop that won’t thrash the µop cache, which the first program can also detect. In this way, the two programs can communicate one bit at a time. Thrashing (slower execution) is a 1, and not thrashing (faster execution) is a 0.
Tedious, but effective. Even leaking just one bit at a time, that’s still a lot of data at GHz clock speeds. The team also found that they don’t have to manipulate the entire µop cache to make this work. A subset works, too. “We reach our best bandwidth (965.59 Kbps) and error rates (0.22%) when six ways of eight sets are probed, while limiting ourselves to just five samples. We further report an error-corrected bandwidth by encoding our transmitted data with Reed-Solomon encoding that inflates file size by roughly 20%, providing a bandwidth of 785.56 Kbps with no errors.” That’s close to a Mbit/sec of uncompressed data. Yikes.
As with previous Spectre variations, this one doesn’t tell you how to get the exploit into a target computer, only how to exfiltrate data out once it’s there.
The research paper concludes with some suggested countermeasures, including flushing the µop cache at frequent intervals. That works, but it also harms performance. It’s also not something most operating systems or hypervisors are designed to do. And it leaves the security up to software, which might itself be compromised. Finally, there’s no “right answer” as to how frequently you should flush the µop cache. More is better, but how much is enough?
Intel and AMD both say that you don’t have to do any of this – that it’s a solved problem. In similar official statements, both companies said that their existing guidelines for mitigating side-channel attacks will work in this case, too. Those guidelines rely on constant-time coding, which is pretty much what it sounds like. Instead of writing functions to operate as quickly as possible, you write them to run for a fixed amount of time. That’s harder than it sounds, because it’s counterintuitive for most programmers and because it takes practice to do properly. Experts in cryptography – and that’s essentially what this is, cryptography – often rely on constant-time code because it hides internal shortcuts that can reveal secrets. Since Spectre is a timing-based attack, those countermeasures should work here, too.
I applaud the university team for discovering this weakness, and for putting in the hours it must have taken to characterize and document. The paper comes across as a little hysterical for an academic paper, however, almost as if it were calculated to capture headlines (which it did). Yes, we have a bug in a large number of high-performance processors. No, we’re not all in danger of immediate computer meltdown.
At this level of complexity, there will always be bugs. (Heck, I’ve even wired flip-flops wrong.) The important thing is to always keep looking for them, always apply reasonable countermeasures, and always assume there’s another one over the horizon.