feature article
Subscribe Now

Clever Hack Finds Mystery CPU Instructions

Enterprising Programmer Uses x86 Microcode to Reveal Itself

In the 1966 movie Fantastic Voyage, a team of doctors and scientists gets miniaturized and injected into the bloodstream of a human patient. They and their yellow submarine navigate past heart valves, battle corpuscles, and swim in tear ducts. It provides an inside look into biological workings most of us never see. 

An enterprising Hungarian engineer, Can Bölük at Verilave, has done something similar, probing the obscure nervous system of Intel’s fabulously complex x86 microprocessors. He’s probed for unused and undocumented opcodes in the chips’ instruction sets. And he’s uncovered quite a bit. His code is available on Github

If you’ve ever programmed an x86 chip in assembly language, your first thought might be, “I didn’t know there were any unused opcodes.” After all, the architecture has a spectacularly complex and richly well-appointed instruction set that’s been growing steadily since the 1970s. Surely every instruction has been done by now? The opcode map must be full to overflowing? 

Nope. There are still plenty of holes in the x86 opcode map, and not all of them are actually empty. Some are just… undocumented. Some instructions appear to be deeply buried maintenance functions while others look like Easter eggs, forgotten bugs, or partially implemented instructions that never quite saw the light of day. A few seem to provide remarkably godlike powers that appear to circumvent on-chip security or even to rewrite the chip’s internal microcode

Bölük didn’t find all of these omissions himself, but he did automate the process of uncovering them, and his approach is as remarkable as it is nonintuitive. It’s a cleverly crafted program that relies on an unlikely ally: speculative execution. 

For starters, finding unused opcodes is a lot harder than it might sound. You can’t just stuff memory with every value from 0x00 through 0xFF, execute it, and watch what happens. That’s especially true of x86 processors, which have variable-length instructions and a convoluted system of code prefix bytes, suffix bytes, modes, and internal register designations. Instructions that work with some registers don’t work with others. Some memory modes are supported by some instructions but not others, and so on. It’s unabashedly nonorthogonal. 

The same instruction (that is, the same binary encoding) can be interpreted and executed different ways depending on whether the chip is operating in 16-bit mode, 32-bit mode, or 64-bit mode. It can depend on privilege level (CPL0 through CPL3), the number of CPU cores, or the MMU configuration. A given two-byte sequence might not work, but a three-byte variation will. An instruction might work only when it’s preceded by a certain instruction or followed by another instruction. CISC? Yeah, you could say it’s a complex instruction set. 

On top of all that, an instruction might be legal but not observable. How do you know a random instruction actually did anything? NOP is a legitimate instruction but has no effect (that’s the point). The CLI instruction disables interrupts, but how do you know? 

Conversely, some instructions will have too much of an effect and ruin your exploration. The processor might crash, or counters get reset, or memory wiped out. What if it halts and catches fire? Randomly plugging in opcodes to see what happens is like defusing a bomb by cutting random wires. 

Instead, Bölük took a methodical approach. He doesn’t ask the chip to actually execute any suspected hidden instructions. He just wants it to fetch them. Then, he can tell what’s a real instruction and what’s not by observing some subtle side effects. 

His trick relies on speculative execution after a conditional branch. As we know, all modern processors try to eke out extra performance by executing a handful of instructions immediately after a branch instruction. Once the branch is resolved, if the chip guessed correctly, then that’s all free work. If the chip guessed incorrectly, all the speculative work is discarded, and any results are canceled as if they’d never happened. 

That means that if you can convince the processor to execute a mystery instruction speculatively, there should be no side effects. If it’s an invalid instruction, the chip won’t generate the illegal instruction fault. But if it is a valid instruction, it won’t affect the chip’s behavior or do random and unexpected things. Either way, you’ve tested an unknown instruction without causing problems. It’s the perfect dry run. 

Enter Schrödinger’s cat. If there are no observable effects one way or the other, how do you tell a valid but unknown instruction from an invalid one? That’s where some little-known performance counters inside the chip come into play. 

Intel’s x86 chips don’t actually execute the x86 instruction set we all know. Instead, they convert the familiar assembly-level instructions (MOV, PUSH, XCHG, STOSB, FXTRACT, et al.) into even more rudimentary micro-instructions. Yes, Virginia, inside every CISC processor is a RISC processor trying to get out. It’s been this way for more than 20 years. AMD’s chips work the same way, although their internal micro-instructions are different. 

The conversion from complex assembly instructions to simpler micro-instructions is controlled by an internal microcode engine. In fact, there are two of them, one for relatively simple instructions and one for the really complex operations. This is something that even die-hard assembly coders never see and don’t care about. But, if you dig around enough, you’ll find that there are a couple of internal CPU status registers that report on the activity of these sequencers. And that can tell you a bit about what the CPU is trying to execute – even if it later flushes and disregards the results. Voila! A peephole into the mysterious realm of undocumented x86 opcodes. 

Bölük details how his program skips over known x86 instructions and known interactions among those instructions. He also expends a lot of effort removing measurement artifacts. After all that, he discovered more than 50 previously undefined opcodes. He also discovered some unanticipated side effects with some instructions, such as the fact that loading CPU control register CR2 doesn’t serialize as expected (that is, it doesn’t pause speculative execution).  

Bölük’s method is a riff on the Spectre and Meltdown bugs, as well as AMD’s recently disclosed PSF vulnerability, but for a good cause. They all leverage speculative execution to force subtle but observable side effects within the processor. More evidence that every action has an equal and opposite reaction.

3 thoughts on “Clever Hack Finds Mystery CPU Instructions”

Leave a Reply

featured blogs
May 12, 2021
The ICADVM20.1 ISR18 and IC6.1.8 ISR18 production releases are now available for download at Cadence Downloads . For information on supported platforms and other release compatibility information,... [[ Click on the title to access the full blog on the Cadence Community site...
May 11, 2021
Human vision in indispensable and often taken for granted. Similarly machine, or embedded, vision influences daily human life in ways thought impossible. Simply, machine vision refers to the ability of embedded systems to “see”. Key system components include camer...
May 6, 2021
Learn how correct-by-construction coding enables a more productive chip design process, as new code review tools address bugs early in the design process. The post Find Bugs Earlier Via On-the-Fly Code Checking for Productive Chip Design and Verification appeared first on Fr...
May 4, 2021
What a difference a year can make! Oh, we're not referring to that virus that… The post Realize Live + U2U: Side by Side appeared first on Design with Calibre....

featured video

Introduction to EMI

Sponsored by Texas Instruments

Conducted versus radiated EMI. CISPR-25 and CISPR-32 standards. High-frequency or low-frequency emissions. Designing a system to reduce EMI can be overwhelming, but it doesn’t have to be. Watch this video to get an overview of EMI causes, standards, and mitigation techniques.

Click here for more information

featured paper

E-book: An engineer’s guide to autonomous and collaborative industrial robots

Sponsored by Texas Instruments

As robots are becoming more commonplace in factories, it is important that they become more intelligent, autonomous, safer and efficient. All of this is enabled with precise motor control, advanced sensing technologies and processing at the edge, all with robust real-time communication. In our e-book, an engineer’s guide to industrial robots, we take an in-depth look at the key technologies used in various robotic applications.

Click to download e-book

Featured Chalk Talk

Bluetooth Overview

Sponsored by Mouser Electronics and Silicon Labs

Bluetooth has come a long way in recent years, and adding the latest Bluetooth features to your next design is easier than ever. It’s time to ditch the cables and go wireless. In this episode of Chalk Talk, Amelia Dalton chats with Mark Beecham of Silicon labs about the latest Bluetooth capabilities including lower power, higher bandwidth, mesh, and more, as well as solutions that will make adding Bluetooth to your next design a snap.

Click here for more information about Silicon Labs EFR32BG Blue Gecko Wireless SoCs