feature article
Subscribe Now

AMD Details Potential Ryzen Attack Vector

Publishes New Side-Channel Vulnerability in Zen 3 Processors

It’s not a bug, it’s a feature, if you publish it in the manual, right? AMD has taken a “white hat” approach to a possible security risk in its newest Zen 3 processors by publishing a white paper that describes the problem. Although there’s no known exploit in the field, AMD appears to be heading off any problems by detailing how, when, and where the problem might occur, and what you can do about it. Kudos to the company for transparency. 

The problem lies in a hidden feature called predictive store forwarding (PSF). It’s a performance enhancement first introduced in Zen 3 chips (i.e., those Ryzen 5900X and 5950X CPUs that are impossible to buy right now) that speculatively feeds data to the processor before it’s actually available. The concept is not entirely novel in the CPU world, but it’s the first time AMD has implemented it in this particular way, and it comes with one wee small caveat. 

First, what is PSF? It’s a way to speed up loads from memory by speculating – guessing, really – what data you’re going to load. Memory loads are among the slowest activities any processor can perform because external DRAM is so #$@% slow compared to your 3-GHz multicore CPU. Waiting on data to come back from memory eats up a lot of time. That’s why we have caches, but even caches are slower than the logic inside the CPU. If only there was some way to know ahead of time what data you were going to load and just work with that…  

Well, there is. Mostly. Chances are, the data you load from memory is the same as whatever you stored there earlier. In a simplistic system with a single processor, that’s how it always works. Memory doesn’t change all by itself (you hope), so any data you store will still be there, safe and sound, when you retrieve it later. The complication starts when you have multiple processors, or intelligent peripherals, or a DMA controller, or shared memory, or even a single multicore processor running multiple threads. Then, there’s no guarantee that the data in DRAM hasn’t been molested a dozen times since you last wrote to it. So, the CPU takes its best guess. 

Working on the theory that the data in DRAM probably hasn’t changed since the last time you wrote to it, AMD’s Zen 3 chips will simply reuse the data from the last store instruction to that location. During write cycles, the processor copies the outgoing data into an internal buffer and tags it with a hash of the destination address. Then, when a load instruction requests data from that same address, the processor uses the data from the buffer instead of waiting around for the DRAM. 

This is much faster than waiting for external memory; it’s even faster than waiting for the data cache (assuming the cache hits). It really pays off when the load instruction comes right after the store instruction, because the processor doesn’t have to wait for the write cycle to complete, then wait some more for the read cycle (from the same address) to complete, then wait while the results are passed into the execution pipeline. 

To be clear, the processor still completes a normal read cycle from external memory. It just doesn’t wait for the results before getting a head start on executing the load and then proceeding with the next several instructions. 

So far, so good. This technique is used in several companies’ processors and is generally known as store-to-load forwarding (STLF). That part’s not new. What’s new is AMD’s tweak to the process that makes it even faster by also bypassing the MMU. 

Standard run-of-the-mill STLF depends on matching the memory address of the store with the memory address of the load. Which, in turn, depends on your MMU. If you’ve got memory translation enabled, the address pointers in your code bear little relation to the physical memory addresses the CPU reads and writes. Until you know the physical address, you can’t match up the loads with the stores. You have to wait for the MMU to perform its address translation. Unless you’re AMD. 

Zen 3 processors skip the address translation lookup and try to divine which loads are paired with which stores simply by observing your code’s behavior. Using a proprietary algorithm (which the company does not reveal), it matches up load/store pairs. It then buffers the data values on stores. If it’s right, and if a store is followed closely by a load from the same address, Zen 3 chips will happily supply the buffered data immediately without waiting for the DRAM, the cache, or the MMU. Result: performance increase. 

What could go wrong? Not much, really. If the pairing algorithm errs and somehow manages to match the wrong load with the wrong store, it will flush the incorrect data from its buffer. That can happen with any hashing algorithm, where two unrelated addresses happen to produce the same hash. No harm done.  

It also flushes the data if the MMU setup has changed, breaking the pairing. This is kind of a corner case, but it’s important to catch anyway. Your code might read and write the same address for a while, but then a change to the MMU’s translation tables might relocate one but not the other.

There’s also the case where indexed arrays can fool the processor into thinking you’re accessing the same addresses when you’re not. All x86 processors can do indexed array addressing in assembly language as part of their native instruction set. This makes them great targets for C compilers (and for hardcore assembly-language programmers), and it makes it deceptively easy for the hardware to second-guess what address you’re really asking for. AMD admits that its pairing algorithm can be fooled if you repeatedly access an array with one index pointer, then switch to a different one. The hardware catches this mistake, too, and flushes the PSF buffer accordingly. In all cases, the right thing happens. 

But that might depend on what you consider “the right thing.” AMD’s Zen 3 chips never supply incorrect data, but they do initiate memory accesses that aren’t strictly necessary – and that may provide the kernel of a malicious exploit. 

The Spectre and Meltdown bugs showed us that the most seemingly trivial of side effects can sometimes be used to compromise a computer system. They both leveraged the subtle side effects of speculative execution to tease out sensitive information. Spectre and Meltdown fetched instructions that were later flushed and unused, but those speculative fetches had an effect on caches. Similarly, PSF reads data that will be flushed and unused, and it also has an observable effect on caches. Thus, PSF’s hidden behavior may have a (barely) detectable effect on a computer system. 

Like Spectre and Meltdown, exploiting the effects of a speculative PSF memory access takes some real effort. And, even when it’s successful, it only exfiltrates data from an already-infected system. It’s not a way in; it’s a way out. 

In a great example of burying the lede, AMD’s whitepaper saves this warning for the very last paragraph: “Predictive Store Forwarding is a new feature in AMD Zen 3 CPUs which may improve application performance but also has security implications.” 

The company warns that this might be exploited in software-only sandboxing, including browsers. Fortunately, PSF can be turned off, unlike the vulnerabilities underlying Spectre and Meltdown. AMD’s chips provide two system-level configuration bits that can disable STLF entirely or just PSF by itself. Either way, the settings are per-thread, so you can disable it for some tasks and leave it enabled for others. 

AMD’s chips automatically flush PSF buffers whenever there’s a change in code privilege, an interrupt, an exception, an intersegment (far) call, a system call, or any of several other circumstances. That makes it pretty hard to exploit across threads, tasks, or code segments. Again, you’d need to work pretty hard to find a way to exploit this potential bug, but that doesn’t mean someone won’t try. Especially now that it’s been publicly documented. Still, better to be upfront about it and highlight the potential weakness now than have to apologize for it later. 

Speculative execution and speculative load/store behavior are a fact of life with modern CPUs. They’re just two of the many tools that CPU designers use to deliver the performance we all crave. The surprise is not that there are unintended consequences to these often obscure and nonintuitive tweaks. The surprise is that there aren’t more of them.

Leave a Reply

featured blogs
Dec 8, 2023
Read the technical brief to learn about Mixed-Order Mesh Curving using Cadence Fidelity Pointwise. When performing numerical simulations on complex systems, discretization schemes are necessary for the governing equations and geometry. In computational fluid dynamics (CFD) si...
Dec 7, 2023
Explore the different memory technologies at the heart of AI SoC memory architecture and learn about the advantages of SRAM, ReRAM, MRAM, and beyond.The post The Importance of Memory Architecture for AI SoCs appeared first on Chip Design....
Nov 6, 2023
Suffice it to say that everyone and everything in these images was shot in-camera underwater, and that the results truly are haunting....

featured video

Dramatically Improve PPA and Productivity with Generative AI

Sponsored by Cadence Design Systems

Discover how you can quickly optimize flows for many blocks concurrently and use that knowledge for your next design. The Cadence Cerebrus Intelligent Chip Explorer is a revolutionary, AI-driven, automated approach to chip design flow optimization. Block engineers specify the design goals, and generative AI features within Cadence Cerebrus Explorer will intelligently optimize the design to meet the power, performance, and area (PPA) goals in a completely automated way.

Click here for more information

featured paper

Universal Verification Methodology Coverage for Bluespec RISC-V Cores

Sponsored by Synopsys

This whitepaper explains the basics of UVM functional coverage for RISC-V cores using the Google RISCV-DV open-source project, Synopsys verification solutions, and a RISC-V processor core from Bluespec.

Click to read more

featured chalk talk

Enabling IoT with DECT NR+, the Non-Cellular 5G Standard
In the ever-expanding IoT market, there is a growing need for private, low cost networks. In this episode of Chalk Talk, Amelia Dalton and Heidi Sollie from Nordic Semiconductor explore the details of DECT NR+, the world’s first non-cellular 5G technology standard. They investigate how this self-healing, decentralized, autonomous mesh network can help solve a variety of IoT connectivity issues and how Nordic is helping designers take advantage of DECT NR+ with their nRF91 System-in-Package family.
Aug 17, 2023
14,180 views