“Syntactic sugar causes cancer of the semicolon.” – Alan Perlis
Obfuscated code is normally considered a bad thing. Plenty of us write unintelligible code by accident, but, as a rule, we’re supposed to write code that’s clear, understandable, and maintainable. Clarity of purpose is a mark of good digital hygiene.
But that goes out the window if you’re a security expert. In the crypto world, you want obfuscated code. You want to make it hard – ideally, impossible – for outsiders to figure out what your program is doing, or how. Obfuscation is a goal, not an impediment.
The ultimate goal is what’s called “black box obfuscation,” where it’s impossible to learn anything whatsoever about a program except what it explicitly reveals to you. To give a trivial example: if a program takes in two integers and spits out their sum, you can easily figure out that it’s adding them together, but, if it’s a perfect black box, you could never tell how it’s accomplishing that task. No amount of code analysis, disassembly, tracing, or side-channel observation would illuminate its inner workings.
More practically, programs that handle sensitive information (financial transactions, military secrets, etc.) should be black boxes. If there’s no way to tease out their program structure, then there’s no way to circumvent their operation. There’s no attack surface. It’s the ultimate security through obscurity.
Sadly, the ideal black box isn’t possible in general. Plenty of researchers have tried, but a 2001 result proved that some programs can’t be black-box obfuscated at all: you can’t always make a program that works reliably on a computer but that is also utterly mysterious to humans. There’s always something you can glean by watching and probing.
But maybe it doesn’t matter.
A research team recently published a paper that purports to show how to achieve a weaker but still useful goal, “indistinguishability obfuscation.” Under indistinguishability obfuscation, the obfuscated form of a program is indistinguishable from the obfuscated form of any other program of the same size that performs the same task. That is, if you obfuscate two functionally equivalent programs, you can no longer tell which obfuscated output came from which original. That’s useful, because it means you can do things like hard-code sensitive information (passwords, keys, compromising photos, etc.) inside a program with no danger of it being extracted.
It also follows that, if programs can be obfuscated in this way, then other security-related programs become even more secure because they can’t be reverse-engineered. Indistinguishability obfuscation becomes the key, as it were, to a treasure chest full of other security improvements.
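To make “programs that perform the same task” concrete, here is a minimal Python sketch of two adders that compute the same function in different ways. It is not an obfuscator and is not taken from the paper; the function names and the 32-bit width are assumptions chosen purely for illustration.

```python
BITS = 32                      # assumed word size, for illustration only
MASK = (1 << BITS) - 1


def to_signed(x: int) -> int:
    """Interpret the low BITS bits of x as a two's-complement integer."""
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x


def add_plain(a: int, b: int) -> int:
    """Add the obvious way, wrapping around at BITS bits."""
    return to_signed(a + b)


def add_bitwise(a: int, b: int) -> int:
    """Add using only XOR, AND, and shifts (a software ripple-carry adder)."""
    a &= MASK
    b &= MASK
    while b:
        carry = ((a & b) << 1) & MASK  # bit positions that generate a carry
        a ^= b                         # sum of the bits, ignoring carries
        b = carry                      # feed the carries back in
    return to_signed(a)


if __name__ == "__main__":
    for x, y in [(2, 3), (-5, 3), (2**31 - 1, 1)]:
        print(x, y, add_plain(x, y), add_bitwise(x, y))
```

Both functions return the same result for every pair of inputs, so from the outside they are the same program. An indistinguishability obfuscator would aim to make their obfuscated forms (padded to the same size) impossible to tell apart; this sketch only shows the starting point, two different implementations of one function.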
Obfuscating code isn’t a new idea. In fact, it can be fun, even competitive. The International Obfuscated C Code Contest has been running since 1984. Software obfuscation is possible because computer programs are written in a human-readable source language (C, C++, Java, Pascal, Python, Perl, etc.) that is then translated (compiled) into a binary notation that’s understandable to computers. Nearly all language compilers are very particular about spelling, punctuation, and capitalization, but they ignore the niceties of spacing. That makes it possible – indeed, kind of a fun challenge – to write programs that look terrible to human eyes but that compile perfectly well. With a little effort, you can write completely inscrutable programs that nonetheless work just fine. Taken to extremes, that indecipherability becomes an asset, not a failing.
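As a small illustration of hand-made obfuscation, the Python sketch below defines the same primality test twice: once readably, once deliberately unpleasantly. Python is stricter about indentation than C, but it ignores line breaks and spacing inside parentheses, which is enough to make the point. The names `is_prime` and `O0` are hypothetical, and this is nowhere near contest-grade obfuscation.

```python
def is_prime(n: int) -> bool:
    """Readable version: trial division by every candidate up to sqrt(n)."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


# Same behaviour, written to be hard on the eyes: look-alike names, no
# comments, and line breaks scattered arbitrarily inside the parentheses.
O0 = lambda O: (O
    >1)and all(O%
  o for o in range(2
        ,O)if o*
   o<=O)


if __name__ == "__main__":
    print([n for n in range(30) if O0(n)])
```

The interpreter treats both definitions the same way; only the human reader suffers. That is also the weakness of doing this by hand: a patient reader (or a code formatter) can undo it.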
But you can’t rely on human creativity and ingenuity to make programs truly unintelligible. To do that, you need a provable mathematical process and some sort of automated tool to correctly apply those principles. That’s what the authors of the paper think they’ve discovered. Among other things, they rely on well-accepted assumptions regarding the difficulty of cracking certain algorithms, as opposed to some earlier attempts at obfuscation that hand-waved new mathematical ideas.
It’s interesting to contrast the implied goal of all computer programming, which is to convert something easily understandable to humans (source code) into something easily understandable by computers (object code), with this new goal of breaking that connection. It truly turns programming into coding.
Or we could just program everything in Forth. It’s been a write-only language since the beginning.