
Clever Hack Finds Mystery CPU Instructions

Enterprising Programmer Uses x86 Microcode to Reveal Itself

In the 1966 movie Fantastic Voyage, a team of doctors and scientists gets miniaturized and injected into the bloodstream of a human patient. They and their yellow submarine navigate past heart valves, battle corpuscles, and swim in tear ducts. It provides an inside look into biological workings most of us never see. 

An enterprising Hungarian engineer, Can Bölük at Verilave, has done something similar, probing the obscure nervous system of Intel’s fabulously complex x86 microprocessors. He’s hunted for unused and undocumented opcodes in the chips’ instruction sets, and he’s uncovered quite a bit. His code is available on GitHub.

If you’ve ever programmed an x86 chip in assembly language, your first thought might be, “I didn’t know there were any unused opcodes.” After all, the architecture has a spectacularly complex and richly appointed instruction set that’s been growing steadily since the 1970s. Surely every opcode has been assigned by now? Surely the opcode map is full to overflowing?

Nope. There are still plenty of holes in the x86 opcode map, and not all of them are actually empty. Some are just… undocumented. Some instructions appear to be deeply buried maintenance functions while others look like Easter eggs, forgotten bugs, or partially implemented instructions that never quite saw the light of day. A few seem to provide remarkably godlike powers that appear to circumvent on-chip security or even to rewrite the chip’s internal microcode.

Bölük didn’t find all of these omissions himself, but he did automate the process of uncovering them, and his approach is as remarkable as it is nonintuitive. It’s a cleverly crafted program that relies on an unlikely ally: speculative execution. 

For starters, finding unused opcodes is a lot harder than it might sound. You can’t just stuff memory with every value from 0x00 through 0xFF, execute it, and watch what happens. That’s especially true of x86 processors, which have variable-length instructions and a convoluted system of prefix bytes, trailing operand bytes (ModRM, SIB, displacements, and immediates), operating modes, and register encodings. Instructions that work with some registers don’t work with others. Some memory addressing modes are supported by some instructions but not others, and so on. It’s unabashedly nonorthogonal.
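The following Python sketch gives a feel for the size of the search space. It enumerates candidate encodings under a deliberately simplified model of the x86 format. This model is an assumption for illustration and does not describe how Bölük’s tool works. Each candidate has four parts:

- an optional mandatory prefix (none, 66, F3, or F2);
- one of the three escape-based opcode maps (0F, 0F 38, 0F 3A);
- an opcode byte;
- a ModRM byte, whose reg field can select a sub-opcode and whose mod field chooses a register or memory operand.

The sketch ignores REX, VEX, and EVEX prefixes, displacements, immediates, and operating modes.

```python
from itertools import product

# Simplified model (an assumption for illustration): one mandatory-prefix slot,
# the three escape-based opcode maps, and a ModRM byte whose reg field can
# select a sub-opcode and whose mod field picks register vs. memory operands.
# REX/VEX/EVEX prefixes, displacements, and immediates are ignored.
MANDATORY_PREFIXES = [b"", b"\x66", b"\xf3", b"\xf2"]
ESCAPE_MAPS = [b"\x0f", b"\x0f\x38", b"\x0f\x3a"]


def candidate_encodings():
    """Yield one byte string per (prefix, map, opcode, operand form, reg) slot."""
    for prefix, escape, opcode, register_form, reg in product(
        MANDATORY_PREFIXES, ESCAPE_MAPS, range(256), (False, True), range(8)
    ):
        mod = 0b11 if register_form else 0b00  # 11 = register operand, 00 = memory operand
        modrm = (mod << 6) | (reg << 3)         # rm field left at 0
        yield prefix + escape + bytes([opcode, modrm])


candidates = list(candidate_encodings())
print(len(candidates))  # 4 * 3 * 256 * 2 * 8 = 49152
```

Even this toy model yields 49,152 distinct byte strings. A real fuzzer has to cover far more variations. It must also handle encodings that need extra bytes this sketch never appends. For example, instructions in the 0F 3A map expect an immediate byte.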

The same instruction (that is, the same binary encoding) can be interpreted and executed different ways depending on whether the chip is operating in 16-bit mode, 32-bit mode, or 64-bit mode. It can depend on privilege level (CPL0 through CPL3), the number of CPU cores, or the MMU configuration. A given two-byte sequence might not work, but a three-byte variation will. An instruction might work only when it’s preceded by a certain instruction or followed by another instruction. CISC? Yeah, you could say it’s a complex instruction set. 
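A classic example of mode-dependent decoding is the row of bytes 0x40 through 0x4F:

- In 16-bit and 32-bit modes, these bytes are one-byte INC and DEC instructions.
- In 64-bit mode, the same bytes are REX prefixes that modify the instruction that follows.

The minimal Python sketch below decodes a lone byte from that row in each mode. It assumes the default operand size and no other prefixes. The function name `interpret_4x` is hypothetical.

```python
# Default operand size: 16 bits in 16-bit mode, 32 bits in 32-bit mode.
REGS16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
REGS32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def interpret_4x(byte: int, mode: int) -> str:
    """Describe what a lone byte in 0x40-0x4F means in 16-, 32-, or 64-bit mode."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("only bytes 0x40-0x4F are modeled here")
    if mode == 64:
        # In 64-bit mode this row of the opcode map became the REX prefixes:
        # the low nibble holds the W, R, X, B bits for the *following* instruction.
        w, r, x, b = (byte >> 3) & 1, (byte >> 2) & 1, (byte >> 1) & 1, byte & 1
        return f"REX prefix (W={w} R={r} X={x} B={b})"
    if mode not in (16, 32):
        raise ValueError("mode must be 16, 32, or 64")
    regs = REGS16 if mode == 16 else REGS32
    op = "INC" if byte < 0x48 else "DEC"  # 40-47 = INC, 48-4F = DEC
    return f"{op} {regs[byte & 0x7]}"


for mode in (16, 32, 64):
    print(mode, interpret_4x(0x48, mode))
# 16 DEC AX
# 32 DEC EAX
# 64 REX prefix (W=1 R=0 X=0 B=0)
```

The very same byte decodes as three different things. That is why a fuzzer has to test each candidate in every mode it cares about.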

On top of all that, an instruction might be legal but not observable. How do you know a random instruction actually did anything? NOP is a legitimate instruction but has no effect (that’s the point). The CLI instruction disables interrupts, but how do you know? 

Conversely, some instructions will have too much of an effect and ruin your exploration. The processor might crash, reset its counters, or wipe out memory. What if it halts and catches fire? Randomly plugging in opcodes to see what happens is like defusing a bomb by cutting random wires.

Instead, Bölük took a methodical approach. He doesn’t ask the chip to actually execute any suspected hidden instructions. He just wants it to fetch them. Then, he can tell what’s a real instruction and what’s not by observing some subtle side effects. 

His trick relies on speculative execution after a conditional branch. Modern high-performance processors try to eke out extra performance by guessing which way a conditional branch will go. They then execute instructions along the predicted path before the branch is actually resolved. Once the branch is resolved, if the chip guessed correctly, then that’s all free work. If the chip guessed incorrectly, all the speculative work is discarded, and any results are canceled as if they’d never happened.

That means that if you can convince the processor to execute a mystery instruction only speculatively, on a path it will later discard, there should be no architecturally visible side effects. If it’s an invalid instruction, the chip won’t raise the invalid-opcode fault. If it’s a valid instruction, it won’t change the chip’s registers or memory or do random and unexpected things. Either way, you’ve tested an unknown instruction without causing problems. It’s the perfect dry run.

Enter Schrödinger’s cat. If there are no observable effects one way or the other, how do you tell a valid but unknown instruction from an invalid one? That’s where some little-known performance counters inside the chip come into play. 

Intel’s x86 chips don’t actually execute the x86 instruction set we all know. Instead, they convert the familiar assembly-level instructions (MOV, PUSH, XCHG, STOSB, FXTRACT, et al.) into even more rudimentary micro-instructions. Yes, Virginia, inside every CISC processor is a RISC processor trying to get out. It’s been this way for more than 20 years. AMD’s chips work the same way, although their internal micro-instructions are different. 

The conversion from complex assembly instructions to simpler micro-instructions happens in the chip’s front end, along two paths. Hardware decoders translate the relatively simple instructions directly. The really complex operations are handed off to a microcode sequencer that plays back micro-instruction sequences stored in an on-chip ROM. This is something that even die-hard assembly coders never see and don’t care about. But if you dig around enough, you’ll find a couple of internal performance counters that report on the activity of these units. They can tell you a bit about what the CPU is trying to execute, even if it later flushes and disregards the results. Voilà! A peephole into the mysterious realm of undocumented x86 opcodes.
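The core idea can be illustrated with a toy model. The Python below is a hypothetical sketch. It does not model real hardware behavior, and it is not Bölük’s code. It works in four steps:

1. A pretend core checkpoints its architectural registers at a mispredicted branch.
2. It runs a candidate encoding speculatively.
3. It restores the checkpoint.
4. A stand-in performance counter, `uops_delivered`, is never rolled back.

Two further caveats apply. The micro-op counts in `TOY_DECODER` are invented for illustration. The sketch also assumes that an unrecognized encoding delivers zero micro-ops, which is a simplification. Real decoders and microcode may behave differently.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional

# Toy decoder table: encoding -> micro-ops it decodes into.
# These counts are invented for illustration, not measured on any chip.
TOY_DECODER = {
    b"\x90": 1,          # NOP
    b"\x48\xff\xc0": 1,  # INC RAX
    b"\x0f\xa2": 30,     # CPUID (treated as a long microcoded sequence here)
}


@dataclass
class ToyCore:
    regs: dict = field(default_factory=lambda: {"rax": 0})
    uops_delivered: int = 0  # stand-in for a front-end performance counter

    def speculate_and_squash(self, encoding: bytes) -> None:
        """Run `encoding` on a mispredicted path, then discard the results."""
        checkpoint = copy.deepcopy(self.regs)  # architectural state at the branch
        uops: Optional[int] = TOY_DECODER.get(encoding)
        if uops is not None:
            self.uops_delivered += uops  # the front end did real work...
            self.regs["rax"] += 1        # ...and produced a speculative result (placeholder effect)
        # Assumption: an unrecognized encoding delivers zero micro-ops in this toy.
        # It would raise #UD, but a fault on a squashed path is never delivered.
        self.regs = checkpoint  # rollback: registers restored
        # uops_delivered is deliberately NOT restored.


def probe(core: ToyCore, encoding: bytes) -> int:
    """Return the counter delta caused by speculatively running `encoding`."""
    before = core.uops_delivered
    core.speculate_and_squash(encoding)
    return core.uops_delivered - before


core = ToyCore()
print(probe(core, b"\x0f\xa2"), core.regs)  # 30 {'rax': 0}
print(probe(core, b"\x0f\x0b"), core.regs)  # 0 {'rax': 0}  (UD2, not in the toy table)
```

Both probes leave `rax` at zero, so the architectural state reveals nothing. The counter deltas differ, however, and they hint at whether the front end recognized the bytes.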

Bölük details how his program skips over known x86 instructions and known interactions among those instructions. He also expends a lot of effort removing measurement artifacts. After all that, he discovered more than 50 previously undefined opcodes. He also discovered some unanticipated side effects with some instructions. For example, writing to CPU control register CR2 doesn’t serialize as expected: it doesn’t stop the processor from speculatively executing the instructions that follow.
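Skipping known instructions and scrubbing measurement noise can be sketched generically. The Python below shows one common approach. It is assumed here for illustration and is not taken from Bölük’s write-up. The sketch does three things:

- It skips encodings already in a known set.
- It samples each remaining probe several times.
- It compares the median counter delta against a known-invalid baseline.

The `sample` callable is a hypothetical hook that would run one speculative probe and return a raw counter delta. The values for `trials` and `threshold` are illustrative defaults.

```python
from statistics import median
from typing import Callable, Dict, Iterable, Set


def find_unknown_opcodes(
    encodings: Iterable[bytes],
    known: Set[bytes],
    sample: Callable[[bytes], int],
    baseline: bytes,
    trials: int = 15,
    threshold: float = 0.5,
) -> Dict[bytes, float]:
    """Flag unknown encodings whose median counter delta exceeds the baseline's.

    `sample(encoding)` is a hypothetical hook: run one speculative probe of
    `encoding` and return the raw counter delta. Single readings are noisy
    (interrupts, cache misses, and so on), so every encoding is sampled
    `trials` times and summarized by its median.
    """
    base = median([sample(baseline) for _ in range(trials)])
    flagged: Dict[bytes, float] = {}
    for enc in encodings:
        if enc in known:
            continue  # documented instruction: nothing new to learn
        excess = median([sample(enc) for _ in range(trials)]) - base
        if excess > threshold:
            flagged[enc] = excess
    return flagged
```

Medians are used rather than means for robustness. A stray interrupt might inflate one or two readings, but it can’t flag an encoding on its own.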

Bölük’s method is a riff on the Spectre and Meltdown bugs, as well as AMD’s recently disclosed PSF vulnerability, but for a good cause. They all leverage speculative execution to force subtle but observable side effects within the processor. More evidence that every action has an equal and opposite reaction.
