Clever Hack Finds Mystery CPU Instructions

Enterprising Programmer Uses x86 Microcode to Reveal Itself

In the 1966 movie Fantastic Voyage, a team of doctors and scientists gets miniaturized and injected into the bloodstream of a human patient. They and their yellow submarine navigate past heart valves, battle corpuscles, and swim in tear ducts. The film provides an inside look into biological workings most of us never see.

An enterprising engineer, Can Bölük at Verilave, has done something similar, probing the obscure nervous system of Intel’s fabulously complex x86 microprocessors. He’s searched for unused and undocumented opcodes in the chips’ instruction sets, and he’s uncovered quite a bit. His code is available on GitHub.

If you’ve ever programmed an x86 chip in assembly language, your first thought might be, “I didn’t know there were any unused opcodes.” After all, the architecture has a spectacularly complex and richly well-appointed instruction set that’s been growing steadily since the 1970s. Surely every instruction has been done by now? The opcode map must be full to overflowing? 

Nope. There are still plenty of holes in the x86 opcode map, and not all of them are actually empty. Some are just… undocumented. Some instructions appear to be deeply buried maintenance functions while others look like Easter eggs, forgotten bugs, or partially implemented instructions that never quite saw the light of day. A few seem to provide remarkably godlike powers that appear to circumvent on-chip security or even to rewrite the chip’s internal microcode.

Bölük didn’t find all of these omissions himself, but he did automate the process of uncovering them, and his approach is as remarkable as it is nonintuitive. It’s a cleverly crafted program that relies on an unlikely ally: speculative execution. 

For starters, finding unused opcodes is a lot harder than it might sound. You can’t just stuff memory with every value from 0x00 through 0xFF, execute it, and watch what happens. That’s especially true of x86 processors, which have variable-length instructions and a convoluted system of code prefix bytes, suffix bytes, modes, and internal register designations. Instructions that work with some registers don’t work with others. Some memory modes are supported by some instructions but not others, and so on. It’s unabashedly nonorthogonal. 

The same instruction (that is, the same binary encoding) can be interpreted and executed different ways depending on whether the chip is operating in 16-bit mode, 32-bit mode, or 64-bit mode. It can depend on privilege level (CPL0 through CPL3), the number of CPU cores, or the MMU configuration. A given two-byte sequence might not work, but a three-byte variation will. An instruction might work only when it’s preceded by a certain instruction or followed by another instruction. CISC? Yeah, you could say it’s a complex instruction set. 
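
To get a feel for the scale, here is a back-of-the-envelope count of what a naive search would face. It is only a sketch: it assumes every byte value is tried at every position with no pruning, and that each candidate is tried once in each of the three operating modes. The 15-byte cap is the architectural limit on x86 instruction length; the function name is made up for illustration.

```python
# Rough size of a naive brute-force search over raw x86 byte sequences.
# Assumption: every byte value is tried at every position, with no pruning.

MAX_X86_INSTRUCTION_BYTES = 15  # architectural limit on x86 instruction length
CPU_MODES = 3                   # 16-bit, 32-bit, and 64-bit operation


def candidate_encodings(max_len: int) -> int:
    """Count every byte string of length 1..max_len (hypothetical helper)."""
    return sum(256 ** n for n in range(1, max_len + 1))


if __name__ == "__main__":
    for length in (1, 2, 3, 4, MAX_X86_INSTRUCTION_BYTES):
        total = candidate_encodings(length) * CPU_MODES
        print(f"up to {length:2d} bytes: {total:.3e} encodings to try")
```

Even this count understates the problem, since it ignores privilege levels and the instructions that only behave a certain way in a particular sequence.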

On top of all that, an instruction might be legal but not observable. How do you know a random instruction actually did anything? NOP is a legitimate instruction but has no effect (that’s the point). The CLI instruction disables interrupts, but how do you know? 

Conversely, some instructions will have too much of an effect and ruin your exploration. The processor might crash, or counters get reset, or memory wiped out. What if it halts and catches fire? Randomly plugging in opcodes to see what happens is like defusing a bomb by cutting random wires. 

Instead, Bölük took a methodical approach. He doesn’t ask the chip to actually execute any suspected hidden instructions to completion. He just wants it to fetch and decode them. Then, he can tell what’s a real instruction and what’s not by observing some subtle side effects.

His trick relies on speculative execution after a conditional branch. As we know, most modern high-performance processors try to eke out extra performance by guessing which way a branch will go and executing a handful of the instructions that follow before the branch is actually resolved. Once the branch is resolved, if the chip guessed correctly, then that’s all free work. If the chip guessed incorrectly, all the speculative work is discarded, and any results are canceled as if they’d never happened.

That means that if you can convince the processor to execute a mystery instruction speculatively, on a path it will later discard, there should be no side effects. If it’s an invalid instruction, the chip won’t generate the illegal instruction fault. And if it is a valid instruction, it won’t affect the chip’s behavior or do random and unexpected things. Either way, you’ve tested an unknown instruction without causing problems. It’s the perfect dry run.
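
The squash-on-mispredict idea can be sketched with a toy model. This is not how any real pipeline is built, and it is not taken from Bölük’s code: `ToyCPU`, `run_after_branch`, and the one-opcode instruction set are all invented for illustration. The only point is that work done past a mispredicted branch, including a would-be invalid-opcode fault, never becomes visible.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCPU:
    """Toy model (hypothetical): one register plus a list of raised faults."""
    regs: dict = field(default_factory=lambda: {"rax": 0})
    faults: list = field(default_factory=list)

    def run_after_branch(self, instructions, prediction_correct):
        """Run instructions past a predicted branch, then commit or squash."""
        checkpoint = dict(self.regs)      # architectural state before speculation
        pending_fault = None
        for op in instructions:
            if op == "inc rax":           # the only opcode this toy understands
                self.regs["rax"] += 1
            else:
                pending_fault = f"#UD: {op}"   # invalid opcode, fault not yet raised
                break
        if prediction_correct:
            if pending_fault:
                self.faults.append(pending_fault)  # fault becomes visible on commit
        else:
            self.regs = checkpoint        # mispredict: discard results and the fault


if __name__ == "__main__":
    cpu = ToyCPU()
    cpu.run_after_branch(["inc rax", "mystery"], prediction_correct=False)
    print(cpu.regs, cpu.faults)
```

With a wrong prediction, the register goes back to its starting value and the fault list stays empty, which is exactly the dry-run behavior described above. Run the same sequence with a correct prediction and both the increment and the fault stick.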

Enter Schrödinger’s cat. If there are no observable effects one way or the other, how do you tell a valid but unknown instruction from an invalid one? That’s where some little-known performance counters inside the chip come into play. 

Intel’s x86 chips don’t actually execute the x86 instruction set we all know. Instead, they convert the familiar assembly-level instructions (MOV, PUSH, XCHG, STOSB, FXTRACT, et al.) into even more rudimentary micro-instructions. Yes, Virginia, inside every CISC processor is a RISC processor trying to get out. It’s been this way for more than 20 years. AMD’s chips work the same way, although their internal micro-instructions are different. 

The conversion from complex assembly instructions to simpler micro-instructions is controlled by an internal microcode engine. In fact, there are two of them, one for relatively simple instructions and one for the really complex operations. This is something that even die-hard assembly coders never see and don’t care about. But, if you dig around enough, you’ll find that there are a couple of internal CPU status registers that report on the activity of these sequencers. And that can tell you a bit about what the CPU is trying to execute – even if it later flushes and disregards the results. Voila! A peephole into the mysterious realm of undocumented x86 opcodes. 
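
Here is a toy model of that peephole, under loud assumptions. The decode table, the opcode labels, the micro-op counts, and the counter names `simple` and `complex` are all hypothetical; they are not real Intel performance-counter events, and this is not Bölük’s implementation. The sketch only shows the logic: speculation throws away results, but the activity counters keep ticking, so a before-and-after comparison separates real instructions from junk.

```python
from collections import Counter

# Hypothetical decode table for a toy CPU: label -> (decode path, micro-op count).
# Labels and micro-op counts are invented for illustration.
TOY_DECODE_TABLE = {
    "documented_simple": ("simple", 1),
    "undocumented_complex": ("complex", 12),
}


class ToyFrontEnd:
    def __init__(self):
        self.counters = Counter()  # activity counters survive a speculation squash

    def speculate(self, opcode):
        """Decode `opcode` on a path that is later discarded."""
        entry = TOY_DECODE_TABLE.get(opcode)
        if entry is not None:
            path, uops = entry
            self.counters[path] += uops
        # Either way: no architectural state changes and no fault is raised.


def classify(front_end, opcode):
    """Compare counters before and after speculation to label an opcode."""
    before = Counter(front_end.counters)
    front_end.speculate(opcode)
    delta = {p: front_end.counters[p] - before[p] for p in ("simple", "complex")}
    if delta["simple"] == 0 and delta["complex"] == 0:
        return "invalid"
    return "valid (complex path)" if delta["complex"] else "valid (simple path)"


if __name__ == "__main__":
    fe = ToyFrontEnd()
    for op in ("documented_simple", "undocumented_complex", "garbage"):
        print(op, "->", classify(fe, op))
```

On real silicon the counters are far noisier than this, and other activity also bumps them, which is why the measurement needs careful cleanup.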

Bölük details how his program skips over known x86 instructions and known interactions among those instructions. He also expends a lot of effort removing measurement artifacts. After all that, he discovered more than 50 previously undefined opcodes. He also discovered some unanticipated side effects with some instructions, such as the fact that loading CPU control register CR2 doesn’t serialize as expected (that is, it doesn’t pause speculative execution).  
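
This article doesn’t spell out how those artifacts are removed. One generic way to think about it, not necessarily Bölük’s, is to repeat each measurement many times, keep the smallest reading (on the assumption that noise only ever adds counts), and subtract the reading for an empty baseline. The noise model and function names below are invented for this sketch.

```python
import random


def noisy_reading(true_uops, rng):
    """Simulated counter delta: real micro-ops plus unrelated activity (invented noise model)."""
    return true_uops + rng.choice([0, 0, 0, 0, 3, 7])


def denoised_delta(candidate_uops, baseline_uops, rng, trials=100):
    """Take the minimum over many trials, then subtract the baseline's minimum."""
    candidate = min(noisy_reading(candidate_uops, rng) for _ in range(trials))
    baseline = min(noisy_reading(baseline_uops, rng) for _ in range(trials))
    return candidate - baseline


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(denoised_delta(candidate_uops=5, baseline_uops=2, rng=rng))
```

Taking the minimum rather than the average is a deliberate choice here: if stray activity can only inflate a counter, the lowest reading is the one closest to the truth.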

Bölük’s method is a riff on the Spectre and Meltdown bugs, as well as AMD’s recently disclosed PSF vulnerability, but for a good cause. They all leverage speculative execution to force subtle but observable side effects within the processor. More evidence that every action has an equal and opposite reaction.
