AMD Details Potential Ryzen Attack Vector

Publishes New Side-Channel Vulnerability in Zen 3 Processors

It’s not a bug, it’s a feature, if you publish it in the manual, right? AMD has taken a “white hat” approach to a possible security risk in its newest Zen 3 processors by publishing a white paper that describes the problem. Although there’s no known exploit in the field, AMD appears to be heading off any problems by detailing how, when, and where the problem might occur, and what you can do about it. Kudos to the company for transparency. 

The problem lies in a hidden feature called predictive store forwarding (PSF). It’s a performance enhancement first introduced in Zen 3 chips (e.g., those Ryzen 5900X and 5950X CPUs that are impossible to buy right now) that speculatively feeds data to the processor before it’s actually available. The concept is not entirely novel in the CPU world, but it’s the first time AMD has implemented it in this particular way, and it comes with one wee small caveat.

First, what is PSF? It’s a way to speed up loads from memory by speculating – guessing, really – what data you’re going to load. Memory loads are among the slowest activities any processor can perform because external DRAM is so #$@% slow compared to your 3-GHz multicore CPU. Waiting on data to come back from memory eats up a lot of time. That’s why we have caches, but even caches are slower than the logic inside the CPU. If only there were some way to know ahead of time what data you were going to load and just work with that…

Well, there is. Mostly. Chances are, the data you load from memory is the same as whatever you stored there earlier. In a simplistic system with a single processor, that’s how it always works. Memory doesn’t change all by itself (you hope), so any data you store will still be there, safe and sound, when you retrieve it later. The complication starts when you have multiple processors, or intelligent peripherals, or a DMA controller, or shared memory, or even a single multicore processor running multiple threads. Then, there’s no guarantee that the data in DRAM hasn’t been modified a dozen times since you last wrote to it. So, the CPU takes its best guess.

Working on the theory that the data in DRAM probably hasn’t changed since the last time you wrote to it, AMD’s Zen 3 chips will simply reuse the data from the last store instruction to that location. During write cycles, the processor copies the outgoing data into an internal buffer and tags it with a hash of the destination address. Then, when a load instruction requests data from that same address, the processor uses the data from the buffer instead of waiting around for the DRAM. 
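
The buffering scheme described above can be sketched as a toy software model. This is only an illustration: the 12-bit tag width, the `tag_of` hash, and the `StoreBuffer` class are all invented here, and real hardware certainly looks nothing like a Python dictionary.

```python
# Toy model of a store buffer; the tag width and hash are invented for illustration.
TAG_BITS = 12


def tag_of(addr: int) -> int:
    """Fold an address down to a short tag (hypothetical hash)."""
    return (addr ^ (addr >> TAG_BITS)) & ((1 << TAG_BITS) - 1)


class StoreBuffer:
    def __init__(self):
        self.entries = {}  # tag -> most recently stored value

    def store(self, addr: int, value: int) -> None:
        """Keep a copy of the outgoing data, tagged by the hashed address."""
        self.entries[tag_of(addr)] = value

    def load(self, addr: int, memory: dict) -> tuple:
        """Return (value, source); forward from the buffer when the tag matches."""
        tag = tag_of(addr)
        if tag in self.entries:
            return self.entries[tag], "forwarded"
        return memory.get(addr, 0), "memory"


memory = {0x2000: 7}
buf = StoreBuffer()

buf.store(0x1000, 42)
memory[0x1000] = 42  # the write still goes out to memory as usual

first = buf.load(0x1000, memory)   # same address as the store
second = buf.load(0x2000, memory)  # unrelated address, no buffered copy
print(first, second)
```

The load from the just-written address is served from the buffer, while the unrelated load falls back to the (slow) memory path.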

This is much faster than waiting for external memory; it’s even faster than waiting for the data cache (assuming the cache hits). It really pays off when the load instruction comes right after the store instruction, because the processor doesn’t have to wait for the write cycle to complete, then wait some more for the read cycle (from the same address) to complete, then wait while the results are passed into the execution pipeline. 

To be clear, the processor still completes a normal read cycle from external memory. It just doesn’t wait for the results before getting a head start on executing the load and then proceeding with the next several instructions. 

So far, so good. This technique is used in several companies’ processors and is generally known as store-to-load forwarding (STLF). That part’s not new. What’s new is AMD’s tweak to the process that makes it even faster by also bypassing the MMU. 

Standard run-of-the-mill STLF depends on matching the memory address of the store with the memory address of the load. Which, in turn, depends on your MMU. If you’ve got memory translation enabled, the address pointers in your code bear little relation to the physical memory addresses the CPU reads and writes. Until you know the physical address, you can’t match up the loads with the stores. You have to wait for the MMU to perform its address translation. Unless you’re AMD. 
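
To see why matching needs the translated address, consider a minimal sketch of paging. The page table contents below are made up, and a 4 KiB page size is assumed; the point is only that two different virtual addresses can land on the same physical location, so comparing the pointers in your code is not enough.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

# Hypothetical page table: virtual page number -> physical frame number.
page_table = {0x10: 0x7A, 0x25: 0x7A, 0x26: 0x03}


def translate(vaddr: int) -> int:
    """Translate a virtual address to a physical one via the toy page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset


store_vaddr = 0x10 * PAGE_SIZE + 0x40
load_vaddr = 0x25 * PAGE_SIZE + 0x40

# The pointers differ, yet both refer to the same physical bytes.
same_virtual = store_vaddr == load_vaddr
same_physical = translate(store_vaddr) == translate(load_vaddr)
print(same_virtual, same_physical)
```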

Zen 3 processors skip the address translation lookup and try to divine which loads are paired with which stores simply by observing your code’s behavior. Using a proprietary algorithm (which the company does not reveal), it matches up load/store pairs. It then buffers the data values on stores. If it’s right, and if a store is followed closely by a load from the same address, Zen 3 chips will happily supply the buffered data immediately without waiting for the DRAM, the cache, or the MMU. Result: performance increase. 

What could go wrong? Not much, really. If the pairing algorithm errs and somehow manages to match the wrong load with the wrong store, it will flush the incorrect data from its buffer. That can happen with any hashing algorithm, where two unrelated addresses happen to produce the same hash. No harm done.  
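
A false match of this kind is easy to reproduce in a toy model. The sketch below assumes an invented 12-bit folding hash (`tag_of`) and a hypothetical `load_with_check` helper; it shows a guess being made on a tag match, then discarded once the full addresses turn out to differ.

```python
TAG_BITS = 12


def tag_of(addr: int) -> int:
    """Hypothetical hash: fold the address into a 12-bit tag."""
    return (addr ^ (addr >> TAG_BITS)) & ((1 << TAG_BITS) - 1)


def load_with_check(store_addr, store_value, load_addr, memory):
    """Forward speculatively on a tag match, then verify with the full address."""
    tags_match = tag_of(store_addr) == tag_of(load_addr)
    if tags_match and store_addr != load_addr:
        # False match: throw away the guess and use the real data.
        return memory[load_addr], "flushed"
    if tags_match:
        return store_value, "forwarded"
    return memory[load_addr], "memory"


memory = {0x1000: 42, 0x0001: 99}

# Two unrelated addresses that happen to share a tag under this hash.
collide = tag_of(0x1000) == tag_of(0x0001)
wrong_guess = load_with_check(0x1000, 42, 0x0001, memory)
right_guess = load_with_check(0x1000, 42, 0x1000, memory)
print(collide, wrong_guess, right_guess)
```

Either way the program sees the correct value; the only difference is whether the shortcut paid off.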

It also flushes the data if the MMU setup has changed, breaking the pairing. This is kind of a corner case, but it’s important to catch anyway. Your code might read and write the same address for a while, but then a change to the MMU’s translation tables might relocate one but not the other.

There’s also the case where indexed arrays can fool the processor into thinking you’re accessing the same addresses when you’re not. All x86 processors can do indexed array addressing in assembly language as part of their native instruction set. This makes them great targets for C compilers (and for hardcore assembly-language programmers), and it makes it deceptively easy for the hardware to second-guess what address you’re really asking for. AMD admits that its pairing algorithm can be fooled if you repeatedly access an array with one index pointer, then switch to a different one. The hardware catches this mistake, too, and flushes the PSF buffer accordingly. In all cases, the right thing happens. 
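
The indexed-array case can be illustrated with a toy predictor. AMD does not disclose its pairing algorithm, so everything here is invented: the `ToyPredictor` class, the idea of keying on the instruction addresses of the store and load, the confidence threshold of two, and the array base address.

```python
class ToyPredictor:
    """Invented stand-in for the undisclosed pairing logic."""

    def __init__(self, threshold: int = 2):
        self.hits = {}  # (store_pc, load_pc) -> consecutive same-address count
        self.threshold = threshold

    def run_pair(self, store_pc, load_pc, store_addr, load_addr) -> str:
        key = (store_pc, load_pc)
        predicted = self.hits.get(key, 0) >= self.threshold
        actual = store_addr == load_addr
        # Train: build confidence on a real match, reset it on a mismatch.
        self.hits[key] = self.hits.get(key, 0) + 1 if actual else 0
        if predicted and not actual:
            return "mispredicted, flushed"
        if predicted:
            return "forwarded"
        return "not predicted"


BASE, ELEM = 0x8000, 4  # hypothetical array base and element size


def elem_addr(index: int) -> int:
    return BASE + index * ELEM


predictor = ToyPredictor()
outcomes = []
# Same store/load instructions each time; only the load's index changes once.
for store_idx, load_idx in [(3, 3), (3, 3), (3, 3), (3, 5), (3, 3)]:
    outcomes.append(
        predictor.run_pair(0x400, 0x404, elem_addr(store_idx), elem_addr(load_idx))
    )
print(outcomes)
```

After two matching rounds the toy predictor starts forwarding; the switch to a different index then produces a misprediction that has to be caught and flushed.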

But that might depend on what you consider “the right thing.” AMD’s Zen 3 chips never supply incorrect data, but they do initiate memory accesses that aren’t strictly necessary – and that may provide the kernel of a malicious exploit. 

The Spectre and Meltdown bugs showed us that the most seemingly trivial of side effects can sometimes be used to compromise a computer system. They both leveraged the subtle side effects of speculative execution to tease out sensitive information. Spectre and Meltdown speculatively executed instructions whose results were later flushed and unused, but those speculative accesses had an effect on caches. Similarly, PSF reads data that will be flushed and unused, and it also has an observable effect on caches. Thus, PSF’s hidden behavior may have a (barely) detectable effect on a computer system.
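
How a flushed access can still leak something is easier to see in a simulation. The sketch below is not an exploit and measures nothing real: the cache is a Python set, the latencies are made-up constants, and the `victim_speculates` and `attacker_probe` functions are hypothetical names for the two halves of a generic cache-timing side channel.

```python
HIT_CYCLES, MISS_CYCLES = 4, 200  # illustrative latencies, not measurements


class ToyCache:
    def __init__(self):
        self.lines = set()

    def access(self, line: int) -> int:
        """Return a simulated latency and leave the line cached."""
        latency = HIT_CYCLES if line in self.lines else MISS_CYCLES
        self.lines.add(line)
        return latency


def victim_speculates(cache: ToyCache, secret: int) -> None:
    # A mis-speculated load touches a line selected by secret data.
    # The result is discarded, but the cache line stays resident.
    cache.access(secret)


def attacker_probe(cache: ToyCache, candidates) -> int:
    """Time every candidate line; the fast one was touched earlier."""
    timings = {line: cache.access(line) for line in candidates}
    return min(timings, key=timings.get)


cache = ToyCache()
victim_speculates(cache, secret=5)
recovered = attacker_probe(cache, range(8))
print(recovered)
```

The architectural result of the speculative access is gone, yet the one fast probe gives away which line it touched.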

Like Spectre and Meltdown, exploiting the effects of a speculative PSF memory access takes some real effort. And, even when it’s successful, it only exfiltrates data from an already-infected system. It’s not a way in; it’s a way out. 

In a great example of burying the lede, AMD’s white paper saves this warning for the very last paragraph: “Predictive Store Forwarding is a new feature in AMD Zen 3 CPUs which may improve application performance but also has security implications.”

The company warns that this might be exploited in software-only sandboxing, including browsers. Fortunately, PSF can be turned off, unlike the vulnerabilities underlying Spectre and Meltdown. AMD’s chips provide two system-level configuration bits that can disable STLF entirely or just PSF by itself. Either way, the settings are per-thread, so you can disable it for some tasks and leave it enabled for others. 
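
As a rough sketch of what flipping those bits looks like, the snippet below manipulates a register value in plain Python. The register address, bit names, and bit positions are assumptions not stated in this article (SPEC_CTRL at MSR 0x48, with SSBD at bit 2 as the broader control and PSFD at bit 7 for PSF alone); check them against AMD's documentation. Nothing here touches real hardware, since writing an MSR requires kernel privilege.

```python
# Assumed values; confirm against AMD's documentation before relying on them.
SPEC_CTRL_MSR = 0x48  # assumed MSR address
SSBD_BIT = 1 << 2     # assumed: the broader control, which also turns PSF off
PSFD_BIT = 1 << 7     # assumed: turns off PSF only


def disable_psf_only(spec_ctrl: int) -> int:
    """Set the PSF-only disable bit, leaving every other bit alone."""
    return spec_ctrl | PSFD_BIT


def clear_psf_disable(spec_ctrl: int) -> int:
    """Clear the PSF-only disable bit."""
    return spec_ctrl & ~PSFD_BIT


def psf_enabled(spec_ctrl: int) -> bool:
    """PSF is active only when neither disable bit is set."""
    return (spec_ctrl & (SSBD_BIT | PSFD_BIT)) == 0


value = disable_psf_only(0)
print(hex(value), psf_enabled(value))
```

Because the setting is per-thread, an operating system could in principle apply a value like this only while a sandboxed task is running.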

AMD’s chips automatically flush PSF buffers whenever there’s a change in code privilege, an interrupt, an exception, an intersegment (far) call, a system call, or any of several other circumstances. That makes it pretty hard to exploit across threads, tasks, or code segments. Again, you’d need to work pretty hard to find a way to exploit this potential bug, but that doesn’t mean someone won’t try. Especially now that it’s been publicly documented. Still, better to be upfront about it and highlight the potential weakness now than have to apologize for it later. 

Speculative execution and speculative load/store behavior are a fact of life with modern CPUs. They’re just two of the many tools that CPU designers use to deliver the performance we all crave. The surprise is not that there are unintended consequences to these often obscure and nonintuitive tweaks. The surprise is that there aren’t more of them.
