Clever Hack Finds Mystery CPU Instructions

Enterprising Programmer Uses x86 Microcode to Reveal Itself

In the 1966 movie Fantastic Voyage, a team of doctors and scientists gets miniaturized and injected into the bloodstream of a human patient. They and their yellow submarine navigate past heart valves, battle corpuscles, and swim in tear ducts. It provides an inside look into biological workings most of us never see. 

An enterprising Hungarian engineer, Can Bölük at Verilave, has done something similar, probing the obscure nervous system of Intel’s fabulously complex x86 microprocessors. He’s searched for unused and undocumented opcodes in the chips’ instruction sets. And he’s uncovered quite a bit. His code is available on GitHub.

If you’ve ever programmed an x86 chip in assembly language, your first thought might be, “I didn’t know there were any unused opcodes.” After all, the architecture has a spectacularly complex and richly appointed instruction set that’s been growing steadily since the 1970s. Surely every possible instruction has been implemented by now? The opcode map must be full to overflowing?

Nope. There are still plenty of holes in the x86 opcode map, and not all of them are actually empty. Some are just… undocumented. Some instructions appear to be deeply buried maintenance functions while others look like Easter eggs, forgotten bugs, or partially implemented instructions that never quite saw the light of day. A few seem to provide remarkably godlike powers that appear to circumvent on-chip security or even to rewrite the chip’s internal microcode.

Bölük didn’t find all of these omissions himself, but he did automate the process of uncovering them, and his approach is as remarkable as it is nonintuitive. It’s a cleverly crafted program that relies on an unlikely ally: speculative execution. 

For starters, finding unused opcodes is a lot harder than it might sound. You can’t just stuff memory with every value from 0x00 through 0xFF, execute it, and watch what happens. That’s especially true of x86 processors, which have variable-length instructions and a convoluted system of code prefix bytes, suffix bytes, modes, and internal register designations. Instructions that work with some registers don’t work with others. Some memory modes are supported by some instructions but not others, and so on. It’s unabashedly nonorthogonal. 
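
To get a feel for how quickly the search space balloons, here is a rough back-of-the-envelope sketch in Python. It is not Bölük’s code. It assumes a deliberately simplified model in which an instruction is just zero or more legacy prefix bytes followed by one or more opcode bytes. The names `LEGACY_PREFIXES`, `candidate_encodings`, and `count_candidates` are made up for illustration, and real x86 encodings (ModRM, SIB, REX, VEX, and so on) are far messier.

```python
from itertools import product

# Simplified assumption: only four legacy prefix bytes are considered.
# 0x66 = operand-size override, 0x67 = address-size override,
# 0xF2 = REPNE, 0xF3 = REP. Real chips accept more prefixes than this.
LEGACY_PREFIXES = [0x66, 0x67, 0xF2, 0xF3]


def candidate_encodings(opcode_len, max_prefixes):
    """Yield every byte string made of 0..max_prefixes prefix bytes
    followed by opcode_len arbitrary opcode bytes (toy model only)."""
    for n in range(max_prefixes + 1):
        for prefixes in product(LEGACY_PREFIXES, repeat=n):
            for opcode in product(range(256), repeat=opcode_len):
                yield bytes(prefixes) + bytes(opcode)


def count_candidates(opcode_len, max_prefixes):
    """Count the candidates without generating them."""
    prefix_combos = sum(len(LEGACY_PREFIXES) ** n for n in range(max_prefixes + 1))
    return prefix_combos * 256 ** opcode_len


if __name__ == "__main__":
    for opcode_len in (1, 2, 3):
        print(opcode_len, "opcode byte(s):", count_candidates(opcode_len, 2))
```

Even in this toy model, two opcode bytes with up to two prefixes already means (1 + 4 + 16) × 65,536 = 1,376,256 candidates, and that is before operating modes, privilege levels, or operand bytes enter the picture.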

The same instruction (that is, the same binary encoding) can be interpreted and executed in different ways depending on whether the chip is operating in 16-bit mode, 32-bit mode, or 64-bit mode. It can depend on privilege level (CPL0 through CPL3), the number of CPU cores, or the MMU configuration. A given two-byte sequence might not work, but a three-byte variation will. An instruction might work only when it’s preceded by a certain instruction or followed by another instruction. CISC? Yeah, you could say it’s a complex instruction set.

On top of all that, an instruction might be legal but not observable. How do you know a random instruction actually did anything? NOP is a legitimate instruction but has no effect (that’s the point). The CLI instruction disables interrupts, but how do you know? 

Conversely, some instructions will have too much of an effect and ruin your exploration. The processor might crash, counters might get reset, or memory might be wiped out. What if it halts and catches fire? Randomly plugging in opcodes to see what happens is like defusing a bomb by cutting random wires.

Instead, Bölük took a methodical approach. He doesn’t ask the chip to actually execute any suspected hidden instructions. He just wants it to fetch them. Then, he can tell what’s a real instruction and what’s not by observing some subtle side effects. 

His trick relies on speculative execution after a conditional branch. Most modern high-performance processors try to eke out extra performance by guessing which way a branch will go and executing a handful of the instructions that follow before the branch is actually resolved. Once the branch is resolved, if the chip guessed correctly, then that’s all free work. If the chip guessed incorrectly, all the speculative work is discarded, and any results are canceled as if they’d never happened.

That means that if you can convince the processor to execute a mystery instruction speculatively, there should be no visible side effects. If it’s an invalid instruction, the chip won’t generate the illegal instruction fault. And if it is a valid instruction, it won’t change the chip’s state or do random and unexpected things. Either way, you’ve tested an unknown instruction without causing problems. It’s the perfect dry run.
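
The idea can be captured in a toy model. The Python sketch below is not how a real CPU works and not Bölük’s code; `ToyCore`, `uops_issued`, and `IllegalInstruction` are hypothetical names. It assumes only what the paragraph above describes: speculative results are thrown away and a fault on a discarded path is never delivered. As an extra assumption for illustration, it also keeps a simple event counter that is not rolled back, standing in for the kind of faint footprint speculation can leave behind.

```python
class IllegalInstruction(Exception):
    """Stand-in for the fault an invalid opcode would normally raise."""


class ToyCore:
    def __init__(self):
        self.registers = {"rax": 0}   # architectural state
        self.uops_issued = 0          # hypothetical counter, never rolled back

    def speculate(self, instructions):
        """Run instructions down a mispredicted path, then discard the results."""
        snapshot = dict(self.registers)
        for op in instructions:
            try:
                op(self.registers)
            except IllegalInstruction:
                break                 # the fault is swallowed on a discarded path
            self.uops_issued += 1     # only real instructions produce work
        self.registers = snapshot     # architectural results vanish


def valid_op(regs):
    regs["rax"] = 42


def invalid_op(regs):
    raise IllegalInstruction


if __name__ == "__main__":
    core = ToyCore()
    core.speculate([valid_op])
    print(core.registers, core.uops_issued)
    core.speculate([invalid_op])
    print(core.registers, core.uops_issued)
```

In both calls the registers come back untouched and nothing crashes, which is the dry run. The only thing that differs between the valid and invalid cases is the counter.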

Enter Schrödinger’s cat. If there are no observable effects one way or the other, how do you tell a valid but unknown instruction from an invalid one? That’s where some little-known performance counters inside the chip come into play. 

Intel’s x86 chips don’t actually execute the x86 instruction set we all know. Instead, they convert the familiar assembly-level instructions (MOV, PUSH, XCHG, STOSB, FXTRACT, et al.) into even more rudimentary micro-instructions. Yes, Virginia, inside every CISC processor is a RISC processor trying to get out. It’s been this way for more than 20 years. AMD’s chips work the same way, although their internal micro-instructions are different. 

The conversion from complex assembly instructions to simpler micro-instructions is controlled by an internal microcode engine. In fact, there are two of them, one for relatively simple instructions and one for the really complex operations. This is something that even die-hard assembly coders never see and don’t care about. But, if you dig around enough, you’ll find that there are a couple of internal CPU performance counters that report on the activity of these sequencers. And that can tell you a bit about what the CPU is trying to execute – even if it later flushes and disregards the results. Voilà! A peephole into the mysterious realm of undocumented x86 opcodes.
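
Turning raw counter readings into a verdict might look something like the sketch below. This is a guess at the general shape, not Bölük’s implementation: the function name `classify`, the idea of comparing against a baseline measured on bytes already known to be invalid, and the use of a median to shrug off noisy samples are all assumptions, and the numbers in the usage example are invented purely to exercise the code.

```python
from statistics import median


def classify(candidate_deltas, baseline_deltas, tolerance=0):
    """Compare repeated counter readings for a candidate encoding against
    readings for a known-invalid baseline (hypothetical measurement setup).

    Each list holds counter deltas from repeated speculative runs.
    The median is used so that an occasional noisy sample is ignored.
    """
    excess = median(candidate_deltas) - median(baseline_deltas)
    return "likely valid" if excess > tolerance else "likely invalid"


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    baseline = [3, 3, 3, 4, 3]
    print(classify([5, 5, 9, 5, 5], baseline))
    print(classify([3, 3, 7, 3, 3], baseline))
```

The point of the design is that a single reading proves nothing. Only a consistent difference from the baseline across many runs suggests that the decoder recognized the bytes as a real instruction.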

Bölük details how his program skips over known x86 instructions and known interactions among those instructions. He also expends a lot of effort removing measurement artifacts. After all that, he discovered more than 50 previously undefined opcodes. He also discovered some unanticipated side effects with some instructions, such as the fact that loading CPU control register CR2 doesn’t serialize as expected (that is, it doesn’t pause speculative execution).  

Bölük’s method is a riff on the Spectre and Meltdown bugs, as well as AMD’s recently disclosed Predictive Store Forwarding (PSF) vulnerability, but for a good cause. They all leverage speculative execution to force subtle but observable side effects within the processor. More evidence that every action has an equal and opposite reaction.
