When I’m Sixty-Four

Will you still need me / Will you still feed me / When I’m 64?

Ah, ARM has finally grown up. Time to join the big kids at the grownups’ table.

Last week ARM began a long striptease by lifting the veil from its newest CPU architecture. The new design doesn’t even have a name yet; it’s just called “ARM version 8,” or ARMv8 for short. The coming months (and perhaps years) will see a series of ever more revealing announcements as ARM shows us a bit more of what ARMv8 has to offer. For now, we’ll just have to drool and use our imaginations.

The first ARMv8-based chips are a year away, so it’ll be some time before this new CPU has much of an effect on the market. When they hit, they’ll herald ARM’s arrival in the 64-Bit Club, a group that includes PowerPC, x86, SPARC, MIPS, and others. At long last, ARM will have made the big time.

The company has been working on ARMv8 for four years and talking about it for many years before that. Although ARM is a bit late to the party, it’s not as though the company couldn’t make a 64-bit processor before this; it just didn’t want to. ARM earned its reputation by making small, low-cost, and low-power microprocessors for relatively underpowered (by computer standards) devices. Cell phones, tablets, disk drives, and automotive GPS systems just don’t need 64-bit computing.

But as ARM’s reach extended into “real computers” like servers, the need for 64-bit computing became evident. The Cortex-A15 introduced 40-bit physical addressing (ARM’s Large Physical Address Extension, or LPAE), which helps when you’re building Linux servers that need more than 4 GB of memory. But that was largely a stopgap, because each program still sees only a 32-bit virtual address space. Real 64-bit computing had to wait for ARMv8.

The new design is a full 64-bit machine, in that it has 64-bit registers, data paths, and addressing. It does not, however, have 64-bit instructions. There’s no need; 4 billion opcodes is quite enough for any rational machine, thank you very much. Except for a few exotic VLIW architectures, no other 64-bit CPU has 64-bit instructions, either.
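Fixed-width instructions also keep decoding simple: the fetch unit can chop the instruction stream into 4-byte words without looking at their contents. The Python below is an illustrative model, not ARM tooling; the helper name is hypothetical, it relies on instructions being stored little-endian, and it uses the AArch64 NOP encoding (0xD503201F) as sample data.

```python
import struct

# A 32-bit instruction word allows 2**32 distinct encodings.
ENCODING_SPACE = 2 ** 32  # 4,294,967,296, the "4 billion opcodes"

def split_instructions(code: bytes) -> list[int]:
    """Split a little-endian code buffer into 32-bit instruction words.

    Every instruction is exactly 4 bytes, so no decoding is needed
    to find where the next one starts.
    """
    if len(code) % 4 != 0:
        raise ValueError("code length must be a multiple of 4 bytes")
    count = len(code) // 4
    return list(struct.unpack(f"<{count}I", code))

NOP = 0xD503201F  # AArch64 NOP encoding

# Two NOPs as they would appear in memory.
buffer = struct.pack("<2I", NOP, NOP)
words = split_instructions(buffer)
print([hex(w) for w in words])  # ['0xd503201f', '0xd503201f']
```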

To make all this 64-bittedness work on ARM’s conventional 32-bit architecture without abandoning backward compatibility, ARMv8 relies on an old trick: mode switching. Future ARMv8 chips will have a 32-bit mode and a 64-bit mode (ARM calls them AArch32 and AArch64), and you’ll have to switch explicitly between them. Remember ARM’s original Thumb code-compression mode before Thumb-2 came along? It’s like that, except that Thumb code could switch on an ordinary branch, whereas ARMv8 switches modes only when the processor changes privilege levels.

The 32-bit mode preserves backward compatibility with today’s existing ARM processors but doesn’t enable any of the 64-bit goodness. Switching to 64-bit mode, on the other hand, opens up the new 64-bit register file and other modern features, but it unhinges the CPU from its roots. In other words, old code won’t run in 64-bit mode. You can still mix old code with new, but only across privilege boundaries: a 64-bit operating system can run 32-bit applications, but a 32-bit application can’t call a 64-bit library directly, and a 32-bit operating system can’t host 64-bit applications at all. That isn’t a terrible burden, and it’s the same way that Intel and others have effected similar overhauls. Starting with the ’386, Intel’s x86 processors have had at least three different operating modes, all to preserve backward compatibility while also enabling new features. Suck it up, soldier.
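ARM’s architecture documentation ties the 32/64-bit choice to privilege: the execution state can change only when the processor moves between exception levels, and a level running 32-bit code can’t have a 64-bit level beneath it. The Python sketch below is a simplified model of that rule; the enum, the function name, and the list layout are hypothetical, chosen for illustration. It checks whether a given mix of states, listed from the application level upward, is legal.

```python
from enum import Enum

class State(Enum):
    AARCH32 = 32  # legacy 32-bit execution state
    AARCH64 = 64  # new 64-bit execution state

def is_legal_configuration(states: list[State]) -> bool:
    """Return True if a stack of execution states is allowed.

    states[0] is the least-privileged level (applications); each later
    entry is the next more-privileged level. Simplified rule: once a
    level runs 32-bit code, every level below it must run 32-bit code too.
    """
    seen_32bit = False
    # Walk from the most-privileged level down.
    for state in reversed(states):
        if state is State.AARCH32:
            seen_32bit = True
        elif seen_32bit:
            # A 64-bit level sitting below a 32-bit level is not allowed.
            return False
    return True

# A 64-bit OS running a 32-bit app: fine.
print(is_legal_configuration([State.AARCH32, State.AARCH64]))  # True
# A 32-bit OS trying to run a 64-bit app: not allowed.
print(is_legal_configuration([State.AARCH64, State.AARCH32]))  # False
```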

It’s interesting that ARMv8’s 64-bit mode looks a lot like… every other 64-bit RISC architecture. More specifically, it doesn’t look a whole lot like ARM. Most of ARM’s charming (or quirky) architectural features are gone once you switch out of 32-bit mode. The registers are no longer banked, so there are no more separate user and interrupt register sets. Shifts by an amount held in a register can no longer ride along on other instructions (only shifts by a constant survive), and load/store multiple is gone, replaced by simpler load/store pair instructions. And conditional execution of instructions is severely curtailed. In other words, most of the interesting assembly-language features that made an ARM an ARM are now gone, sacrificed on the altar of regularity and performance. ARM always did have a peculiar instruction set, but those features are (or were) precisely what gave ARM its good code density, too. Shame to see them go.
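What survives of conditional execution is mostly a handful of conditional-select instructions. Rather than predicating arbitrary instructions, the program computes candidate values and picks one based on the condition flags. The Python below models the semantics of two of them, CSEL and CSINC, plus the CSET alias built on CSINC. It’s an illustrative sketch: the function names simply mirror the mnemonics, a Python bool stands in for the condition flags, and results are returned as raw 64-bit register patterns.

```python
MASK64 = (1 << 64) - 1  # registers hold 64-bit values

def csel(xn: int, xm: int, cond: bool) -> int:
    """CSEL: Xd = cond ? Xn : Xm."""
    return (xn if cond else xm) & MASK64

def csinc(xn: int, xm: int, cond: bool) -> int:
    """CSINC: Xd = cond ? Xn : Xm + 1, wrapping at 64 bits."""
    return (xn if cond else xm + 1) & MASK64

def cset(cond: bool) -> int:
    """CSET is CSINC with both sources set to the zero register and the
    condition inverted, so it yields 1 when cond holds and 0 otherwise."""
    return csinc(0, 0, not cond)

def branchless_max(a: int, b: int) -> int:
    """Max without a branch, in the style of CMP followed by CSEL ... GT.

    Assumes non-negative inputs so the raw 64-bit result reads naturally.
    """
    greater = a > b  # stands in for the flags set by CMP a, b
    return csel(a, b, greater)

print(branchless_max(7, 3), cset(True), cset(False))  # 7 1 0
```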

In their place ARMv8 has an orthogonal set of 64-bit registers: 31 identical, general-purpose registers named X0 through X30, plus one exception. Register number 31 acts as either the stack pointer or a new “zero register,” depending on the instruction. Unlike all current ARMs, the stack pointer and the program counter are no longer part of the general-purpose set, although X30 still does duty as the link register. The zero register is hard-wired to the value of… wait for it… zero. This is another feature shared with many other RISC architectures, because a constant source of zeroes helps to simplify instruction encoding.
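Because register number 31 does double duty, its meaning comes from the instruction that uses it. The toy Python model below (the class, its methods, and the `sp_context` flag are invented for illustration) decodes a 5-bit register field and shows the zero register’s behavior: reads return zero and writes vanish.

```python
MASK64 = (1 << 64) - 1

class RegisterFile:
    """Toy model of the AArch64 integer registers X0-X30 plus SP.

    Register number 31 is context-dependent: instructions that use it
    as a stack operand see SP; most others see the zero register, XZR.
    """

    def __init__(self) -> None:
        self.x = [0] * 31  # X0..X30
        self.sp = 0

    @staticmethod
    def _check(num: int) -> None:
        if not 0 <= num <= 31:
            raise ValueError("register fields are 5 bits wide (0-31)")

    def read(self, num: int, sp_context: bool = False) -> int:
        self._check(num)
        if num == 31:
            return self.sp if sp_context else 0  # XZR always reads as zero
        return self.x[num]

    def write(self, num: int, value: int, sp_context: bool = False) -> None:
        self._check(num)
        value &= MASK64
        if num == 31:
            if sp_context:
                self.sp = value
            return  # writes to XZR are silently discarded
        self.x[num] = value

regs = RegisterFile()
regs.write(31, 1234)                     # targets XZR: discarded
regs.write(31, 0x8000, sp_context=True)  # targets SP
print(regs.read(31), hex(regs.read(31, sp_context=True)))  # 0 0x8000
```

The zero register earns its keep in encodings like CMP, which is an alias for a flag-setting subtraction (SUBS) whose result goes to XZR: the flags are updated and the difference is thrown away.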

There is another set of registers used for vector (media) and floating-point operations. There are 32 of them, each 128 bits wide, and instructions can also treat each one as a narrower 64-, 32-, 16-, or 8-bit register. The two register sets do not overlap; one is for integers and addresses only, while the other is purely for vector and FP instructions.
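In AArch64 assembly the narrower views go by letters: Q (128 bits), D (64), S (32), H (16), and B (8), each naming the low-order bits of the same 128-bit V register. Vector instructions instead slice the register into lanes; V0.4S, for example, means four 32-bit lanes. The Python sketch below models both ideas; it’s illustrative only, and the helper names are hypothetical.

```python
VIEW_BITS = {"Q": 128, "D": 64, "S": 32, "H": 16, "B": 8}

def scalar_view(v: int, view: str) -> int:
    """Return the low-order bits of a 128-bit V register seen as Q/D/S/H/B."""
    bits = VIEW_BITS[view]
    return v & ((1 << bits) - 1)

def lanes(v: int, lane_bits: int) -> list[int]:
    """Split a 128-bit V register into lanes; lane 0 is least significant."""
    if 128 % lane_bits != 0:
        raise ValueError("lane width must divide 128")
    mask = (1 << lane_bits) - 1
    return [(v >> (i * lane_bits)) & mask for i in range(128 // lane_bits)]

v0 = 0x0000000400000003_0000000200000001
print(hex(scalar_view(v0, "D")))  # 0x200000001
print(lanes(v0, 32))              # [1, 2, 3, 4]
```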

(In case you’re wondering, the old shift/rotate, conditional, and other idiosyncratic features had to go because the opcode bits they consumed were needed to address the larger register files. They were also the source of hardware bottlenecks in most ARM implementations. There’s a reason no other RISC processor had them.)
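The arithmetic behind that trade is simple. Naming one of 16 registers takes 4 bits and naming one of 32 takes 5, so a three-register instruction spends 15 bits on register fields instead of 12. Meanwhile, classic 32-bit ARM reserves the top 4 bits of nearly every instruction for a condition code. The short Python calculation below is an illustration of that budget, not an encoding table; the helper name is hypothetical.

```python
import math

WORD_BITS = 32
ARM32_COND_BITS = 4  # condition field, bits 31:28 in classic ARM

def register_field_bits(num_registers: int, operands: int) -> int:
    """Bits needed to name `operands` registers from a file of `num_registers`."""
    return operands * math.ceil(math.log2(num_registers))

arm32_regs = register_field_bits(16, 3)    # 12 bits
aarch64_regs = register_field_bits(32, 3)  # 15 bits

# Bits left for opcode, shift, and immediate fields in a three-register instruction.
arm32_left = WORD_BITS - ARM32_COND_BITS - arm32_regs           # 16
aarch64_with_cond = WORD_BITS - ARM32_COND_BITS - aarch64_regs  # 13
aarch64_left = WORD_BITS - aarch64_regs                         # 17

print(arm32_left, aarch64_with_cond, aarch64_left)  # 16 13 17
```

Keeping a condition field on top of 5-bit register specifiers would have squeezed every three-register instruction down to 13 spare bits; dropping it pays for the bigger register file with a bit to spare.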

ARMv8 now sports four privilege levels instead of just two. This is a nod to Intel’s x86 protection rings, introduced with the ’286 and used extensively by server operating systems ever since. ARM calls its levels exception levels, EL0 through EL3, and intends them for applications, the operating-system kernel, a hypervisor, and a secure monitor, respectively. Not all Intel programmers took advantage of the x86’s four privilege levels, and ARM-software vendors may take the same shortcut, but the feature is there for hardy souls who want to use the CPU itself to separate or virtualize their system-level code.
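ARM numbers the levels EL0 through EL3, with higher numbers carrying more privilege. The IntEnum below is a hypothetical Python illustration of the intended roles and of the basic idea that code can manage its own level and the ones beneath it. Real access control in the architecture is configured register by register, so treat `can_control` as a simplification.

```python
from enum import IntEnum

class ExceptionLevel(IntEnum):
    EL0 = 0  # applications
    EL1 = 1  # operating-system kernel
    EL2 = 2  # hypervisor
    EL3 = 3  # secure monitor

def can_control(current: ExceptionLevel, target: ExceptionLevel) -> bool:
    """Simplified rule: a level may manage itself and less-privileged levels."""
    return current >= target

# Hypervisor over a guest OS kernel: allowed.
print(can_control(ExceptionLevel.EL2, ExceptionLevel.EL1))  # True
# Application reaching into the kernel: not allowed.
print(can_control(ExceptionLevel.EL0, ExceptionLevel.EL1))  # False
```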

Apart from the above, ARM isn’t saying much about its foray into 64-bittedness. There’s no word on cache sizes, speed, or how many cores the new architecture will support—although it’s a good bet it’ll start out with four and rapidly grow to 16 or more. There’s also no schedule and no names for the many ARMv8-based CPUs that are likely to appear.

For its part, Applied Micro has said it will have an ARMv8-based chip running at 2.5 GHz by this time next year. Called X-Gene, the new chip (or more likely, a family of related chips) will have up to 128 cores, three-level caches, and multiple LAN, WAN, and storage interfaces. Applied Micro’s announcement also suggests that each of the ARMv8 cores is a quad-issue, out-of-order machine, something that ARM didn’t mention. Clearly, ARMv8 is a high-end machine intended for high-end applications. This isn’t your father’s cell phone processor.

In making the switch to 64 bits, ARM essentially rebooted its CPU architecture. In 64-bit mode, tomorrow’s ARMv8 processor looks very little like yesterday’s ARM processor and a whole lot like other RISC processors, especially MIPS. It has lost most of its pleasant peculiarities and adopted most of the Computer Science 101 textbook. Its streamlined instruction set and orthogonal register set look like those of most other high-end CPUs (Intel excepted). That’s not a bad thing; RISC chips are designed that way for good reasons. It makes them fast, scalable, and exploitable by compilers. In doing so, though, ARM has lost some of what gave the original series of processors their charm. But charm has no place in 64-bit servers. Performance and power efficiency are the currency in that market, and now ARM is well-armed to take on the big players. The company has certainly matured.