Security Flaw Afflicts Intel x86 Boot ROMs

Another year, another Intel bug.

What makes this bug interesting is that it’s so crafty. It’s fun to see what loopholes hardware and software can discover all by themselves. If you’re a homeowner, you know that water is amazingly devious. It can intrude anywhere, including moving uphill, through tiny holes, and along porous surfaces. Stopping leaks – or finding bugs – feels like a never-ending chore.

In this case, an Intel bug fix introduced a new bug. Ironic? Sure, but, to be fair, Intel’s x86 processors are insanely complex while also being wildly popular. That’s a combination guaranteed to uncover flaws.

The bug in question is officially designated CVE-2020-8705, and it involves corrupted boot ROMs. We’ve all become familiar with the concept of a root of trust as a way to ensure that everything in your software stack, from the boot ROM all the way up to your drivers and applications, is legitimate. The process usually involves each layer of software vouching for the security of the next layer. Your hardware checks that the firmware is okay before executing it, then the firmware checks the operating system, which then checks its drivers, and so on.
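
To make the "each layer vouches for the next" idea concrete, here is a minimal sketch in Python. It is an illustration only, not how Intel or UEFI implements verification: the Stage class, the seal and boot helpers, and the use of plain SHA-256 digests (real platforms use signed images and hardware-held keys) are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Stage:
    """One layer of the stack (hypothetical model)."""
    name: str
    code: bytes
    next_digest: str = ""  # digest this stage expects of the stage after it

    def digest(self) -> str:
        # Cover the code and the embedded expectation, so neither can be swapped alone.
        return hashlib.sha256(self.code + self.next_digest.encode()).hexdigest()


def seal(stages):
    """Fill in each stage's expectation, last to first; return the root digest."""
    expected = ""
    for stage in reversed(stages):
        stage.next_digest = expected
        expected = stage.digest()
    return expected  # in this model, the value the hardware holds


def boot(root_digest, stages):
    """Verify every stage before 'running' it; return the names that ran."""
    ran = []
    expected = root_digest
    for stage in stages:
        if stage.digest() != expected:
            raise RuntimeError(f"{stage.name} failed verification")
        ran.append(stage.name)
        expected = stage.next_digest
    return ran


stages = [Stage("firmware", b"fw-v1"), Stage("os", b"os-v1"), Stage("driver", b"drv-v1")]
root = seal(stages)
print(boot(root, stages))
```

Changing any stage's code after sealing makes boot raise at that stage, which is the property the chain is meant to provide.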

We’re also all familiar with the concept of race conditions. Unwanted race conditions usually happen in hardware, but there can be software races, too. Maybe a software semaphore is set by one process at the very moment that another process is checking it, so the two wind up disagreeing on the status.

CVE-2020-8705 combines the root of trust with race conditions to uncover a situation where evildoers can corrupt your boot ROM and unlock secrets you thought were safely hidden away. The good news is, exploiting the bug requires physical access to the hardware. It can’t be attacked remotely. The bad news is, it affects nearly all Intel x86-based machines, regardless of operating system. It’s not specific to Windows or Linux or macOS. It’s baked into the silicon, and it’s hard – perhaps even impossible – to fix.

Here’s what happens. All x86 processors have a sleep mode. In fact, they have several. A commonly used sleep mode is S3, and it’s usually activated when you close the lid on your laptop or when a software timer triggers sleep after a period of inactivity. In S3 sleep, the processor is completely dead but the system DRAM stays alive. All CPU state is lost, but all memory contents are preserved. The idea is to save power while also enabling a quick restart when you open your laptop, wiggle your mouse, or tap your screen. Rather than doing a complete cold reboot, the machine simply wakes the processor and resumes where it left off.

But because the processor was shut down, it needs to boot up and be reinitialized. The process is about 90% the same as for a cold boot, but without disturbing the contents of RAM. Most, but not all, of the code path is the same. Resuming from sleep is faster – that’s the whole point – because it skips a few steps. In particular, a cold boot will verify that the boot ROM is valid, but a wake from sleep won’t, on the assumption that the boot ROM hasn’t changed since the computer went to sleep.
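
The difference between the two code paths can be sketched as a toy model. Everything here is hypothetical: the Machine class, its method names, and the stored SHA-256 reference digest are stand-ins chosen for illustration, not a description of real firmware.

```python
import hashlib


class Machine:
    """Toy model of a platform with a boot ROM and DRAM (illustrative only)."""

    def __init__(self, firmware: bytes):
        self.flash = bytearray(firmware)  # boot ROM contents
        # Hypothetical stored reference value that the cold-boot check compares against.
        self.trusted_digest = hashlib.sha256(firmware).digest()
        self.ram = {}
        self.log = []

    def _verify_flash(self):
        if hashlib.sha256(self.flash).digest() != self.trusted_digest:
            raise RuntimeError("boot ROM failed verification")
        self.log.append("verified")

    def cold_boot(self):
        self.ram.clear()        # RAM contents do not survive a cold start
        self._verify_flash()    # full check before running anything
        self.log.append("ran firmware")

    def s3_resume(self):
        # RAM is left alone and the ROM check is skipped, on the
        # assumption that nothing changed while the machine slept.
        self.log.append("ran firmware")


m = Machine(b"good firmware")
m.cold_boot()
m.ram["disk_key"] = "secret"
m.flash[:] = b"evil firmware"   # ROM contents change during sleep
m.s3_resume()                   # runs without complaint
print(m.log, m.ram)
```

In this model, the resume path runs whatever is in flash with RAM intact, while a cold boot of the same altered flash would raise an error.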

But what if it has changed?

Security researcher Trammell Hudson discovered that UEFI boot ROMs don’t perform a root-of-trust check when they awaken from sleep mode. After all, the computer was never turned off, so how could the ROM have changed? Even though it’s possible (common, in fact) to update the flash BIOS while the computer is powered-up, the update process checks the validity of the new firmware and re-establishes the root of trust. But if there hasn’t been a reflash, and there hasn’t been a cold shutdown, the firmware must still be valid, right?

Not if you have a chip clip and a malicious mind. Trammell showed that it’s possible to reflash the boot ROM while the system is in sleep mode. When the machine reawakens, it bypasses all the usual security checks and runs whatever new firmware you’ve loaded. And, since system RAM is undisturbed, the first thing your malicious code might want to do is to scan memory for interesting information, such as decryption keys. Windows and Linux both store their disk-encryption keys in RAM, which means they’re preserved during sleep mode and therefore accessible to malicious firmware.

Hacking a machine in this way does require physical access to the boot ROM, but that’s not difficult with most laptops, and it’s even easier with desktop or server enclosures. If you can reach the boot ROM’s SPI interface pins, you’re in. Microsoft’s Surface Pro tablets and other tightly packed consumer systems might be immune simply because they’re impossible to open. Or, at least, they can’t be opened and then closed back up again, so don’t plan on being sneaky.

Trammell describes this as “a classic time-of-check / time-of-use (TOCTOU) error.” It’s a slow-motion race condition between when the boot ROM is verified (at cold boot) and when that supposedly secure code is executed (while exiting sleep mode). There’s no way to guarantee that the code you’re executing is the same code you just checked. There’s always a finite time delay between the two events.
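
For readers who haven’t met TOCTOU before, the general pattern looks something like the sketch below. It is a generic software illustration, not Intel’s code or its fix; the function names, the during_gap hook standing in for the attacker, and the copy-then-check remedy are all assumptions made for the example.

```python
import hashlib


def digest(data) -> str:
    return hashlib.sha256(bytes(data)).hexdigest()


def check_then_use(store: bytearray, trusted: str, during_gap=None) -> bytes:
    """Vulnerable pattern: check the store, then later use whatever is in it."""
    if digest(store) != trusted:
        raise ValueError("verification failed")
    if during_gap is not None:
        during_gap(store)       # anything can happen between check and use
    return bytes(store)         # "execute" the current contents


def copy_check_use(store: bytearray, trusted: str, during_gap=None) -> bytes:
    """Safer pattern: take a private copy, check the copy, use the copy."""
    snapshot = bytes(store)
    if digest(snapshot) != trusted:
        raise ValueError("verification failed")
    if during_gap is not None:
        during_gap(store)       # the store may change, but the snapshot cannot
    return snapshot


def reflash(store: bytearray) -> None:
    """Hypothetical attacker action performed inside the gap."""
    store[:] = b"malicious firmware"


trusted = digest(b"good firmware")
print(check_then_use(bytearray(b"good firmware"), trusted, reflash))
print(copy_check_use(bytearray(b"good firmware"), trusted, reflash))
```

The first call returns the attacker’s bytes even though the check passed; the second returns the bytes that were actually checked. In the S3 case, the gap is however long the machine stays asleep.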

Sadly, there would have been a trivially easy fix for this, but it seems to have fallen victim to corporate politics and marketing. Systems that use Intel’s platform controller hub (PCH) chip have one-time programmable fuses specifically intended to enable/disable speedy resume-from-sleep restarts. Leaving the fuses intact forces the processor to do a complete reboot, including all security checks, when it resumes from S3 sleep mode. Blowing the fuses enables the quicker restart but bypasses the security checks. Guess which mode nearly every hardware OEM uses?

In part that’s because Microsoft encourages it. The company designed Windows 10 to restart quickly, in part to better compete with new mobile operating systems like iOS and Android that don’t suffer from Windows’s lengthy and convoluted boot-up procedure. Windows OEMs are strongly encouraged to keep restart times under 2.0 seconds, and that pretty much requires warm-start shortcuts. Which means blowing the PCH fuses, which means the option is irreversible.

What’s the fix? If you’re not using Intel’s PCH support chip, you should be able to enable full security checks when exiting S3 sleep mode. That way, your boot ROM is checked every time it’s used, not just on cold starts. Warm starts will take longer, but they’ll be more secure.

If you are using Intel’s PCH, the next best option is to deploy Intel’s firmware workaround. The company isn’t saying exactly what the newer version does, except to suggest that it improves security. Trammell theorizes that it ignores the one-time programmable fuses in the PCH and uses its own switch instead.

Ominously, he also points out that “This workaround also points to a weakness in the OTP fuses.” So, the fix for the fix may need a fix. The root of trust starts a very long chain, indeed.