
An Epic Feat of Reverse Engineering

Could You Crack a CPU Design from One Single Program?

“It’s just 1s and 0s. How hard can it be?”

Here’s the ultimate debugging challenge. You’re presented with a string of 20,000 binary bits and told you have to identify it – what it does, how it works, and what machine it’s written for. Unfortunately, you don’t have a disassembler. Or any source code. In fact, you don’t know anything at all about the programming language, instruction set, whether it’s big- or little-endian, or even the word length of the machine. All you get is 1s and 0s. Ready… Go! 

Could you do it? 

If you’re anything like computer science professor Dr. Robert Xiao, you can solve the riddle in about ten hours. This seemingly impossible task was set up as an artificial challenge by Polish security team Dragon Sector to see if anyone could reverse-engineer a custom, completely unknown CPU architecture based on nothing but a single executable binary file. 

The only hints Xiao and the other ambitious challengers got were that the bitstream was a standalone program for a text-adventure game, sort of like Colossal Cave Adventure (1976) or Rogue (1980). Oh, and the processor, which was invented expressly for this challenge and was therefore wholly undocumented, has four registers. Apart from that, you’re on your own.

As Xiao’s blog details, it was a tedious process, but not without its breakthroughs. A text adventure should have a lot of text, but there’s no point looking for ASCII strings because there’s no guarantee the program uses ASCII encoding. There’s not even any reason to believe it uses 8-bit bytes. 

His first step was somewhat counterintuitive: he dumped the bitstream into a text editor and tried different line-wrap widths to eyeball it, hoping to see repeating patterns. Magically, at a width of 20 characters (bits), the 1s and 0s seemed to line up and look less random. There must be a pattern here: some structure in the bitstream repeats every 20 bits.
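Eyeballing line wraps can be approximated numerically. The Python sketch below is our own illustration, not Xiao’s tool. It assumes the bitstream is held as a string of '0' and '1' characters and scores each candidate width w by how often bit i matches bit i + w, which is exactly the pair of bits that ends up stacked vertically when the text wraps at w columns (the lag-w autocorrelation). The function names match_rate and rank_widths are hypothetical.

```python
from typing import List, Tuple


def match_rate(bits: str, lag: int) -> float:
    """Fraction of positions where bits[i] == bits[i + lag].

    When text wraps at `lag` columns, bit i sits directly above bit i + lag,
    so a high match rate means the columns "line up".
    """
    pairs = len(bits) - lag
    if lag < 1 or pairs <= 0:
        raise ValueError("lag must be at least 1 and shorter than the bitstream")
    same = sum(1 for i in range(pairs) if bits[i] == bits[i + lag])
    return same / pairs


def rank_widths(bits: str, lo: int = 2, hi: int = 64) -> List[Tuple[int, float]]:
    """Return (width, match_rate) pairs, best-aligned widths first.

    The sort is stable, so when scores tie the smaller width stays first.
    """
    scores = [(w, match_rate(bits, w)) for w in range(lo, hi + 1)]
    return sorted(scores, key=lambda ws: ws[1], reverse=True)
```

Multiples of the true width (40, 60, and so on) score as well as the width itself, so the smallest strong peak is the interesting one. On random data every width hovers around 0.5.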

Cryptanalysts who crack coded texts often look for patterns or differences in frequency. Certain words, phrases, or letters occur more often than others in human languages. Would the same trick work here? Maybe common code sequences recur the way common phrases do. Sure enough, Xiao’s next baby step was to discover a sequence of 425 bits that occurred twice – too long to be coincidental.
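A repeat like that can be found mechanically. This sketch (again ours, with hypothetical function names) binary-searches on the length L: if some L-bit sequence occurs twice, so does its (L-1)-bit prefix, so the longest repeat is the largest L for which a hashed scan of every L-bit window finds a duplicate. It does not require the two copies to be non-overlapping, and it reports only one pair of positions.

```python
from typing import Dict, List, Optional, Tuple


def find_repeat(bits: str, length: int) -> Optional[Tuple[int, int]]:
    """Return (first, second) start positions of a repeated window, or None."""
    seen: Dict[int, List[int]] = {}
    for i in range(len(bits) - length + 1):
        window = bits[i:i + length]
        bucket = seen.setdefault(hash(window), [])
        # Compare the actual bits so a hash collision can't cause a false match.
        for j in bucket:
            if bits[j:j + length] == window:
                return j, i
        bucket.append(i)
    return None


def longest_repeat(bits: str) -> Tuple[int, int, int]:
    """Return (length, first_pos, second_pos); length is 0 if nothing repeats."""
    best = (0, 0, 0)
    lo, hi = 1, len(bits) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        hit = find_repeat(bits, mid)
        if hit is not None:
            best = (mid, hit[0], hit[1])
            lo = mid + 1  # a repeat this long exists; try longer
        else:
            hi = mid - 1  # nothing this long repeats; try shorter
    return best
```

In a truly random bitstream of 20,000 bits, the longest chance repeat is only a few dozen bits long, which is why a 425-bit match stands out.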

He goes on to describe how he separated code from data and eventually identified opcodes and their operands. But even when you know where the instructions are, that doesn’t tell you anything about what those instructions do. Like Napoleon’s army blowing sand off the Rosetta Stone, you might know you’re looking at writing without having a clue about what it says.
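Once you suspect a unit size, a field-by-field frequency count is one plausible way to hunt for opcodes, borrowing the cryptanalyst’s letter-frequency trick. This sketch assumes, purely for illustration, that each instruction is a 20-bit word made of four 5-bit fields; the 20-bit wrap and the 5-bit “bytes” come from the article, but that exact field layout is our assumption. A field whose values have low Shannon entropy draws from a small vocabulary, which is what you would expect of an opcode field; high-entropy fields look more like operands or data. The function name field_entropies is hypothetical.

```python
from collections import Counter
from math import log2
from typing import List


def field_entropies(bits: str, word_bits: int = 20, field_bits: int = 5) -> List[float]:
    """Shannon entropy (in bits) of each fixed-position field across all words.

    Assumes (hypothetically) that words are `word_bits` wide and split into
    equal `field_bits`-wide fields. Trailing bits that don't fill a whole
    word are ignored.
    """
    if word_bits % field_bits:
        raise ValueError("word width must be a multiple of field width")
    n_words = len(bits) // word_bits
    if n_words == 0:
        raise ValueError("bitstream is shorter than one word")
    n_fields = word_bits // field_bits
    counts = [Counter() for _ in range(n_fields)]
    for w in range(n_words):
        word = bits[w * word_bits:(w + 1) * word_bits]
        for f in range(n_fields):
            counts[f][word[f * field_bits:(f + 1) * field_bits]] += 1
    # Every field sees exactly n_words samples, so n_words is the denominator.
    return [
        -sum((c / n_words) * log2(c / n_words) for c in counter.values())
        for counter in counts
    ]
```

On a real binary, code and data would need to be separated first; text or tables mixed in with instructions would blur the counts.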

In the end, Xiao and his debugging partner not only reverse-engineered the mystery processor and documented the program but also beat the game without cheating (sort of). You can find their disassembler and the entire game on GitHub.

It’s a remarkable feat of debugging and reverse engineering, and it’s both inspiring and discouraging. On the one hand, it shows how resourceful a determined programmer can be, and how tangential skills and techniques can be brought to bear on tough computing problems. 

On the other hand, it makes true system security seem even further out of reach. This program was obscured in a profound way: it targeted an imaginary processor with no precedent or documentation, used 5-bit “bytes,” and had its own unique instruction set. Yet a single programmer (with some help) figured it all out in less than a day, with no unusual or expensive tools. Just brains and a deep familiarity with programming, security, and mathematics. If he can crack an unknown CPU, what hope is there of hiding anything on popular, well-documented machines? “Security through obscurity” got blown apart in a maze of twisty little passages, all alike.

