feature article
Subscribe Now

AMD Details Potential Ryzen Attack Vector

Publishes New Side-Channel Vulnerability in Zen 3 Processors

It’s not a bug, it’s a feature, if you publish it in the manual, right? AMD has taken a “white hat” approach to a possible security risk in its newest Zen 3 processors by publishing a white paper that describes the problem. Although there’s no known exploit in the field, AMD appears to be heading off any problems by detailing how, when, and where the problem might occur, and what you can do about it. Kudos to the company for transparency. 

The problem lies in a hidden feature called predictive store forwarding (PSF). It’s a performance enhancement first introduced in Zen 3 chips (i.e., those Ryzen 5900X and 5950X CPUs that are impossible to buy right now) that speculatively feeds data to the processor before it’s actually available. The concept is not entirely novel in the CPU world, but it’s the first time AMD has implemented it in this particular way, and it comes with one wee small caveat. 

First, what is PSF? It’s a way to speed up loads from memory by speculating – guessing, really – what data you’re going to load. Memory loads are among the slowest activities any processor can perform because external DRAM is so #$@% slow compared to your 3-GHz multicore CPU. Waiting on data to come back from memory eats up a lot of time. That’s why we have caches, but even caches are slower than the logic inside the CPU. If only there was some way to know ahead of time what data you were going to load and just work with that…  

Well, there is. Mostly. Chances are, the data you load from memory is the same as whatever you stored there earlier. In a simplistic system with a single processor, that’s how it always works. Memory doesn’t change all by itself (you hope), so any data you store will still be there, safe and sound, when you retrieve it later. The complication starts when you have multiple processors, or intelligent peripherals, or a DMA controller, or shared memory, or even a single multicore processor running multiple threads. Then, there’s no guarantee that the data in DRAM hasn’t been molested a dozen times since you last wrote to it. So, the CPU takes its best guess. 

Working on the theory that the data in DRAM probably hasn’t changed since the last time you wrote to it, AMD’s Zen 3 chips will simply reuse the data from the last store instruction to that location. During write cycles, the processor copies the outgoing data into an internal buffer and tags it with a hash of the destination address. Then, when a load instruction requests data from that same address, the processor uses the data from the buffer instead of waiting around for the DRAM. 

This is much faster than waiting for external memory; it’s even faster than waiting for the data cache (assuming the cache hits). It really pays off when the load instruction comes right after the store instruction, because the processor doesn’t have to wait for the write cycle to complete, then wait some more for the read cycle (from the same address) to complete, then wait while the results are passed into the execution pipeline. 

To be clear, the processor still completes a normal read cycle from external memory. It just doesn’t wait for the results before getting a head start on executing the load and then proceeding with the next several instructions. 

So far, so good. This technique is used in several companies’ processors and is generally known as store-to-load forwarding (STLF). That part’s not new. What’s new is AMD’s tweak to the process that makes it even faster by also bypassing the MMU. 

Standard run-of-the-mill STLF depends on matching the memory address of the store with the memory address of the load. Which, in turn, depends on your MMU. If you’ve got memory translation enabled, the address pointers in your code bear little relation to the physical memory addresses the CPU reads and writes. Until you know the physical address, you can’t match up the loads with the stores. You have to wait for the MMU to perform its address translation. Unless you’re AMD. 

Zen 3 processors skip the address translation lookup and try to divine which loads are paired with which stores simply by observing your code’s behavior. Using a proprietary algorithm (which the company does not reveal), it matches up load/store pairs. It then buffers the data values on stores. If it’s right, and if a store is followed closely by a load from the same address, Zen 3 chips will happily supply the buffered data immediately without waiting for the DRAM, the cache, or the MMU. Result: performance increase. 

What could go wrong? Not much, really. If the pairing algorithm errs and somehow manages to match the wrong load with the wrong store, it will flush the incorrect data from its buffer. That can happen with any hashing algorithm, where two unrelated addresses happen to produce the same hash. No harm done.  

It also flushes the data if the MMU setup has changed, breaking the pairing. This is kind of a corner case, but it’s important to catch anyway. Your code might read and write the same address for a while, but then a change to the MMU’s translation tables might relocate one but not the other.

There’s also the case where indexed arrays can fool the processor into thinking you’re accessing the same addresses when you’re not. All x86 processors can do indexed array addressing in assembly language as part of their native instruction set. This makes them great targets for C compilers (and for hardcore assembly-language programmers), and it makes it deceptively easy for the hardware to second-guess what address you’re really asking for. AMD admits that its pairing algorithm can be fooled if you repeatedly access an array with one index pointer, then switch to a different one. The hardware catches this mistake, too, and flushes the PSF buffer accordingly. In all cases, the right thing happens. 

But that might depend on what you consider “the right thing.” AMD’s Zen 3 chips never supply incorrect data, but they do initiate memory accesses that aren’t strictly necessary – and that may provide the kernel of a malicious exploit. 

The Spectre and Meltdown bugs showed us that the most seemingly trivial of side effects can sometimes be used to compromise a computer system. They both leveraged the subtle side effects of speculative execution to tease out sensitive information. Spectre and Meltdown fetched instructions that were later flushed and unused, but those speculative fetches had an effect on caches. Similarly, PSF reads data that will be flushed and unused, and it also has an observable effect on caches. Thus, PSF’s hidden behavior may have a (barely) detectable effect on a computer system. 

Like Spectre and Meltdown, exploiting the effects of a speculative PSF memory access takes some real effort. And, even when it’s successful, it only exfiltrates data from an already-infected system. It’s not a way in; it’s a way out. 

In a great example of burying the lede, AMD’s whitepaper saves this warning for the very last paragraph: “Predictive Store Forwarding is a new feature in AMD Zen 3 CPUs which may improve application performance but also has security implications.” 

The company warns that this might be exploited in software-only sandboxing, including browsers. Fortunately, PSF can be turned off, unlike the vulnerabilities underlying Spectre and Meltdown. AMD’s chips provide two system-level configuration bits that can disable STLF entirely or just PSF by itself. Either way, the settings are per-thread, so you can disable it for some tasks and leave it enabled for others. 

AMD’s chips automatically flush PSF buffers whenever there’s a change in code privilege, an interrupt, an exception, an intersegment (far) call, a system call, or any of several other circumstances. That makes it pretty hard to exploit across threads, tasks, or code segments. Again, you’d need to work pretty hard to find a way to exploit this potential bug, but that doesn’t mean someone won’t try. Especially now that it’s been publicly documented. Still, better to be upfront about it and highlight the potential weakness now than have to apologize for it later. 

Speculative execution and speculative load/store behavior are a fact of life with modern CPUs. They’re just two of the many tools that CPU designers use to deliver the performance we all crave. The surprise is not that there are unintended consequences to these often obscure and nonintuitive tweaks. The surprise is that there aren’t more of them.

Leave a Reply

featured blogs
Sep 26, 2021
https://youtu.be/Ivi2dTIcm9E Made at my garden gate (camera Carey Guo) Monday: Ten Lessons from Three Generations of Google TPUs Tuesday: At a Digital Crossroads Wednesday: Announcing Helium, Hybrid... [[ Click on the title to access the full blog on the Cadence Community si...
Sep 24, 2021
Wi-Fi, NB-IoT, Bluetooth, LoRaWAN... This webinar will help you to choose the appropriate connectivity protocol for your IoT application....
Sep 23, 2021
The GIRLS GO Engineering scholarship provides opportunities for women in tech and fosters diversity in STEM; see the winners of our 2021 engineering challenge! The post GIRLS GO Engineering! Empowers Our Next-Gen Women in Tech appeared first on From Silicon To Software....
Sep 23, 2021
The Global Environment Facility Small Grants Programme (GEF SGP), implemented by the United Nations Development Programme, is collaborating with the InnovateFPGA contest. Showcase your  skills with Intel Edge-Centric FPGAs and help develop technical solutions that reduce env...

featured video

Enter the InnovateFPGA Design Contest to Solve Real-World Sustainability Problems

Sponsored by Intel

The Global Environment Facility Small Grants Programme (GEF SGP), implemented by the United Nations Development Programme, is collaborating with the InnovateFPGA contest to support 7 funded projects that are looking for technical solutions in biodiversity, sustainable agriculture, and marine conservation. Contestants have access to the Intel® Cyclone® V SoC FPGA in the Cloud Connectivity Kit, Analog Devices plug-in boards, and Microsoft Azure IoT.

Learn more about the contest and enter here by October 31, 2021

featured paper

3 key design decisions for any desktop 3D printer design

Sponsored by Texas Instruments

Learn about three important design considerations to take your 3D print design to the next level.

Click to read more

featured chalk talk

Time Sensitive Networking for Industrial Automation

Sponsored by Mouser Electronics and Intel

In control applications with strict deterministic requirements, such as those found in automotive and industrial domains, Time Sensitive Networking offers a way to send time-critical traffic over a standard Ethernet infrastructure. This enables the convergence of all traffic classes and multiple applications in one network. In this episode of Chalk Talk, Amelia Dalton chats with Josh Levine of Intel and Patrick Loschmidt of TTTech about standards, specifications, and capabilities of time-sensitive networking (TSN).

Click here for more information about Intel Cyclone® V FPGAs