“You have to fight against being an antique.” – Burt Lancaster
Constant improvement brings constant obsolescence. When we keep updating our operating systems, processors, DRAM, and applications, we also leave behind the old apps and data that we used to use.
Normally, that’s not a huge problem. Most programs are backward-compatible with their predecessors. You can open dusty old Word documents using the latest version of Office or edit old Photoshop images with the newest release from Adobe.
But not always. Once-popular programs like WordStar, VisiCalc, Lotus 1-2-3, and the Mosaic web browser aren’t around anymore. There’s no iOS or Android version, nor any current macOS or Windows release. And what about applications for defunct operating systems like BeOS, Pick, CP/M, MP/M, Palm OS, DOS 3.1, FlexOS, and a thousand others? Now you’ve got a real problem.
Worse still, the hardware those programs used to run on may also be gone. It’s hard to find a (working) NeXT cube, Apple ][, PDP-11, or early Sun workstation. Time and progress take their toll, and old computers have the staying power of old bread.
All of this presents a problem if you’re trying to retrieve old data from floppy disks, or if you need to recreate someone’s engineering work from years ago. Admit it: we’ve all kept an old computer sitting around in the lab because it’s the only way to run the old compiler or the old EDA software. Accounting departments hide away “Barb’s old computer” because it’s got the magical database that no one else can duplicate. Military contractors routinely buy obsolete parts from companies that specialize in old technology to repair 1980s equipment.
Lawyers, researchers, engineers, archivists, and others often need to resurrect old data files from outdated machines running obsolete applications on long-dead operating systems. When every layer of the computing stack is gone, what do you do?
We can’t make old hardware last forever, but we should be able to keep old programs and data alive. It’s all just 1s and 0s, right? How hard can it be?
Harder than it sounds. Physical media like floppy disks and tape cassettes deteriorate over time, and there’s no good way to prevent that. But once the data is transferred to a more modern medium like a hard drive, its life can be extended more or less indefinitely. It’s also easy to make multiple copies of the data (i.e., backups) and to share it globally. That’s the easy part.
The hard part is making sense of the data once you’ve saved it. Databases from the 1980s often stored records in weird and inscrutable ways. Just reading the bits doesn’t tell us much about the data they represent, or how it’s supposed to be organized, sorted, or grouped. Stored images might be unreadable if the file format has changed. Not everything was JPG or PNG back then.
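To see why the raw bits alone don’t get you very far, here’s a minimal Python sketch. The eight bytes and both record layouts are invented for illustration; they don’t come from any real 1980s database. The point is only that the same bytes yield completely different “data” depending on which layout you assume.

```python
import struct

# Eight bytes as they might sit in a hypothetical legacy data file.
raw = bytes([0x4A, 0x4F, 0x4E, 0x45, 0x53, 0x20, 0x00, 0x2A])


def as_name_and_count(data: bytes):
    """Assume a 6-byte, space-padded ASCII name plus a big-endian 16-bit count."""
    name, count = struct.unpack(">6sH", data)
    return name.decode("ascii").rstrip(), count


def as_two_integers(data: bytes):
    """Assume two little-endian 32-bit unsigned integers instead."""
    return struct.unpack("<II", data)


if __name__ == "__main__":
    print(as_name_and_count(raw))  # one plausible reading of the bytes
    print(as_two_integers(raw))    # an equally "valid" but meaningless reading
```

Under the first assumption, the bytes decode to a name and a small number. Under the second, they decode to two large integers that mean nothing. Without the original program (or its documentation), there’s no way to tell from the file which reading is right.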
Recompiling those old programs for modern machines often isn’t an option. The source code might be gone (or be written in a weird language). Compilers that understand it might be gone. The operating system APIs it calls on might be gone. We’ve all discovered that a “simple recompile” is never simple.
Recreating those programs from scratch raises different problems. Even if you did understand the underlying structure of the data file (database, compressed image, audio sample, etc.), do you really want to try to reconstruct the program that created it? Could you duplicate Lotus 1-2-3 or Aldus PageMaker exactly as they worked in 1987? Good luck with that. Older applications had bugs – surprise! – and those would need to be recreated accurately, too. Some bugs were even deliberate, to take advantage of quirks in the host operating system, undocumented features of the compiler, or peculiarities in a disk controller. Early x86 programs routinely abused the processor’s segmented memory addressing to get around the official memory limits.
The only way to reliably retrieve old data is to use the actual program that created it, running on the actual operating system it used at the time. Ideally, everything would run on authentic hardware from the era, too, but that last part’s impossible to do.
So, a small group at Carnegie Mellon University did the next best thing. They created OLIVE: Open Library of Images for Virtualized Execution. It’s a complete software stack of virtualized machines, starting from the microprocessor all the way up to the applications and the data they handled. The idea is to emulate old computers accurately enough that they run old software unmodified. Then there’s no quibbling over whether the applications and their data are correct or not. It’s the real thing.
OLIVE runs on an x86 Linux machine, to which the team has added its own hardware-abstraction layer, a hypervisor, and various guest operating systems in their native form. Currently, OLIVE supports 17 different CPU/OS environments, including 68040-based Macintoshes, an Apple ][, and PCs running DOS or Windows 3.1, among others. The emulated systems can run whatever applications they would have supported at the time, including old games, browsers, or GUI extensions like Windows 3.1 (before Windows became its own operating system).
Old PC code executes natively on the underlying x86 hardware, while Apple ][ and 68K-based Macintosh software is emulated. That’s not too different from what Apple itself did when it switched from 68K to PowerPC beginning in 1994, and later from PowerPC to x86. Emulating a processor is slow, but twenty-some years of CPU progress more than makes up for the shortfall. A modern Core i7 has no problem emulating an old 6502 or a 33-MHz 68040 from the mid-90s.
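For a rough idea of what “emulating a processor” means, here’s a toy Python interpreter for four real 6502 opcodes (LDA immediate, ADC immediate, STA zero-page, and BRK). It is only a sketch and has nothing to do with OLIVE’s actual emulators: the function name, the load address, and the decision to treat BRK as “stop” are all assumptions made for this example, and it ignores every status flag except carry.

```python
def run_6502(program: bytes, max_steps: int = 1000) -> bytearray:
    """Interpret a tiny subset of 6502 machine code; return the 64 KB memory."""
    memory = bytearray(65536)
    origin = 0x0600  # arbitrary load address chosen for this sketch
    memory[origin:origin + len(program)] = program

    pc, accumulator, carry = origin, 0, 0
    for _ in range(max_steps):
        opcode = memory[pc]            # fetch
        if opcode == 0x00:             # BRK: treated here simply as "halt"
            break
        operand = memory[pc + 1]       # every other opcode here is 2 bytes long
        if opcode == 0xA9:             # LDA #imm: load the accumulator
            accumulator = operand
        elif opcode == 0x69:           # ADC #imm: add with carry (binary mode only)
            total = accumulator + operand + carry
            accumulator, carry = total & 0xFF, total >> 8
        elif opcode == 0x85:           # STA zp: store accumulator in zero page
            memory[operand] = accumulator
        else:
            raise ValueError(f"opcode {opcode:#04x} is not handled by this sketch")
        pc += 2
    return memory


if __name__ == "__main__":
    # LDA #$28 ; ADC #$02 ; STA $10 ; BRK
    result = run_6502(bytes([0xA9, 0x28, 0x69, 0x02, 0x85, 0x10, 0x00]))
    print(result[0x10])
```

Each emulated instruction costs the host dozens of its own instructions, which is why emulation is slow in relative terms. A real emulator also has to model every opcode, flag, timing quirk, and piece of attached hardware, not just the happy path shown here.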
Predictably, there’s a legal snag. Most software is covered under copyright law, and distributing old programs isn’t strictly legal. It doesn’t matter whether the original software vendor still exists or not; under U.S. law, a corporate copyright persists for 95 years after the program was published. (That’s different from the life-plus-70-years term for individual authors.) That means OLIVE and the programs it now runs are available only to Carnegie Mellon’s own research team. If you need an old data file resurrected, you’ll have to ask them to do it for you. But that’s better than leaving the data to quietly fade away into oblivion.