When I’m Sixty-Four

ARM’s New 64-bit CPU Architecture Looks Strangely Old

Will you still need me / Will you still feed me / When I’m 64?

Ah, ARM has finally grown up. Time to join the big kids at the grownups’ table.

Last week ARM began a long striptease by lifting the veil from its newest CPU architecture. The new design doesn’t even have a name yet; it’s just called “ARM version 8,” or ARMv8 for short. The coming months (and perhaps years) will see a series of ever-more-revealing announcements as ARM shows us a bit more of what ARMv8 has to offer. For now, we’ll just have to drool and use our imaginations.

The first ARMv8-based chips are a year away, so it’ll be some time before this new CPU has much of an effect on the market. When they hit, they’ll herald ARM’s arrival in the 64-Bit Club, a group that includes PowerPC, x86, SPARC, MIPS, and others. At long last, ARM will have made the big time.

The company has been working on ARMv8 for four years and talking about it for many years before that. Although ARM is a bit late to the party, it’s not as though the company couldn’t make a 64-bit processor before this—they just didn’t want to. ARM earned its reputation by making small, low-cost, and low-power microprocessors for relatively underpowered (by computer standards) devices. Cell phones, tablets, disk drives, and automotive GPS systems just don’t need 64-bit computing.

But as ARM’s reach extended into “real computers” like servers, the need for 64-bit computing became evident. The Cortex-A15 introduced extended 40-bit physical addressing, which helps when you’re making Linux servers that need a lot of memory. But that was largely a stopgap. Real 64-bit computing had to wait for ARMv8.

The new design is a full 64-bit machine, in that it has 64-bit registers, data paths, and addressing. It does not, however, have 64-bit instructions. There’s no need; 4 billion opcodes is quite enough for any rational machine, thank you very much. Except for a few exotic VLIW architectures, no other 64-bit CPU has 64-bit instructions, either.

To make all this 64-bittedness work on ARM’s conventional 32-bit architecture without abandoning backward compatibility, ARMv8 relies on an old trick: mode switching. Future ARMv8 chips will have a 32-bit mode and a 64-bit mode, and you’ll have to explicitly switch back and forth between them. Remember ARM’s original Thumb code-compression mode before Thumb-2 came along? It’s like that.

The 32-bit mode preserves backward compatibility with today’s existing ARM processors but doesn’t enable any of the 64-bit goodness. Switching to 64-bit mode, on the other hand, opens up the new 64-bit register file and other modern features, but it unhinges the CPU from its roots. In other words, old code won’t run in 64-bit mode. You’ll have to switch back and forth in order to mix old code with new. That isn’t a terrible burden (assuming you don’t do it on every subroutine call), and it’s the same way that Intel and others have effected similar overhauls. Starting with the ’386, Intel’s x86 processors have had at least three different operating modes, all to preserve backward compatibility while also enabling new features. Suck it up, soldier.

It’s interesting that ARMv8’s 64-bit mode looks a lot like… every other 64-bit RISC architecture. More specifically, it doesn’t look a whole lot like ARM. Most of ARM’s charming (or quirky) architectural features are gone once you switch out of 32-bit mode. The registers are no longer banked, so no more user and interrupt sets. The inline shift and rotate operators are gone, as are load/store multiple. And conditional execution of instructions is severely curtailed. In other words, all the interesting assembly-language features that made an ARM an ARM are now gone, sacrificed on the altar of regularity and performance. ARM always did have a peculiar instruction set, but those features are (or were) precisely what gave ARM its good code density, too. Shame to see them go.

In their place ARMv8 has an orthogonal set of 32 registers, each 64 bits wide. All the registers are identical (with one exception) and general-purpose; unlike all current ARMs, the stack pointer and link register are now separate registers and not part of the general-purpose set. There’s also a new “zero register,” which is hard-wired to the value of… wait for it… zero. This is another feature shared with many other RISC architectures because it helps to simplify instruction encoding when there’s a constant source of zeroes.
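
To see why an always-zero operand simplifies encoding, here is a minimal Python sketch of a toy register file. It is not ARM’s actual design: the class and function names are invented for illustration, and using register 31 as the zero register is an assumption of the sketch. The point is that “move” and “clear” need no opcodes of their own once a constant zero is available as a source.

```python
ZERO = 31  # assumption for this sketch: register 31 is the zero register


class RegisterFile:
    """Toy register file: 32 registers, 64 bits wide, one hard-wired to zero."""

    def __init__(self, count=32, width=64):
        self.mask = (1 << width) - 1
        self.regs = [0] * count

    def read(self, index):
        # The zero register always reads as 0, whatever was written to it.
        return 0 if index == ZERO else self.regs[index]

    def write(self, index, value):
        # Writes to the zero register are silently discarded.
        if index != ZERO:
            self.regs[index] = value & self.mask


def add(rf, dst, src_a, src_b):
    """The only 'real' instruction in this toy machine."""
    rf.write(dst, rf.read(src_a) + rf.read(src_b))


def move(rf, dst, src):
    # MOVE dst, src is just ADD dst, src, ZERO.
    add(rf, dst, src, ZERO)


def clear(rf, dst):
    # CLEAR dst is just ADD dst, ZERO, ZERO.
    add(rf, dst, ZERO, ZERO)


rf = RegisterFile()
rf.write(1, 42)
move(rf, 2, 1)   # register 2 now holds a copy of register 1
clear(rf, 1)     # register 1 is reset to zero
print(rf.read(1), rf.read(2))
```

One adder, three “instructions.” A real decoder would get the same savings in silicon rather than in Python.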

There is another set of registers used for vector (media) and floating-point operations. Like the general-purpose register set, there are 32 of them, but they can be viewed as either 64 bits or 128 bits wide. The two register sets do not overlap; one is for integers and addresses only, while the other is purely for vector and FP instructions.

(In case you’re wondering, the old shift/rotate, conditional, and other idiosyncratic instructions had to go because those opcode bits were needed to address the larger register files. They were also the source of hardware bottlenecks in most ARM implementations. There’s a reason no other RISC processor had them.)
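
The bit budget behind that trade-off can be worked out in a few lines. This Python sketch assumes a three-register instruction format and leans on two facts about classic 32-bit ARM that aren’t spelled out above: it exposes 16 registers, and most of its instructions reserve four bits for a condition code. The function name is hypothetical.

```python
INSTRUCTION_BITS = 32   # instructions stay 32 bits wide in both modes
CONDITION_BITS = 4      # condition field on most classic ARM instructions


def operand_bits(register_count, operands=3):
    """Bits needed to name `operands` registers out of `register_count`."""
    bits_per_register = (register_count - 1).bit_length()
    return operands * bits_per_register


old_cost = operand_bits(16)   # 3 fields x 4 bits
new_cost = operand_bits(32)   # 3 fields x 5 bits
extra = new_cost - old_cost

print(f"16 registers: {old_cost} of {INSTRUCTION_BITS} bits name operands")
print(f"32 registers: {new_cost} of {INSTRUCTION_BITS} bits name operands")
print(f"extra bits needed: {extra}; condition field gives back: {CONDITION_BITS}")
```

Under these assumptions, doubling the register file costs three more bits per instruction, and dropping a four-bit condition field more than pays for it.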

ARMv8 now sports four privilege levels instead of just two. This is a nod to Intel’s x86 protection rings, introduced with the ’286 and used extensively by server operating systems ever since. System-level programmers will be able to separate kernel code from driver code, middleware, and applications software. Not all Intel programmers took advantage of the x86’s four privilege levels, and ARM-software vendors may take the same shortcut, but the feature is there for hardy souls who want to use the CPU itself to separate or virtualize their system-level code.

Apart from the above, ARM isn’t saying much about its foray into 64-bittedness. There’s no word on cache sizes, speed, or how many cores the new architecture will support—although it’s a good bet it’ll start out with four and rapidly grow to 16 or more. There’s also no schedule and no names for the many ARMv8-based CPUs that are likely to appear.

For its part, Applied Micro has said it will have an ARMv8-based chip running at 2.5 GHz by this time next year. Called X-Gene, the new chip (or more likely, a family of related chips) will have up to 128 cores, three-level caches, and multiple LAN, WAN, and storage interfaces. Applied Micro’s announcement also suggests that each of the ARMv8 cores is a quad-issue, out-of-order machine, something that ARM didn’t mention. Clearly, ARMv8 is a high-end machine intended for high-end applications. This isn’t your father’s cell phone processor.

In making the switch to 64 bits, ARM essentially rebooted its CPU architecture. In 64-bit mode, tomorrow’s ARMv8 processor looks very little like yesterday’s ARM processor and a whole lot like other RISC processors, especially MIPS. It has lost most of its pleasant peculiarities and adopted most of the Computer Science 101 textbook. Its streamlined instruction set and orthogonal register set look like those of most other high-end CPUs (Intel excepted). That’s not a bad thing; RISC chips are designed that way for good reasons. It makes them fast, scalable, and exploitable by compilers. In doing so, ARM has lost some of what gave the original series of processors their charm. But charm has no place in 64-bit servers. Performance and power-efficiency are the currency in that market, and now ARM is well-armed to take on the big players. The company has certainly matured.
