Ampere Ups the ARM Ante

Company Tees Up Next Generation of ARM-Based Server Chips

They say there’s no such thing as “the cloud.” It’s just somebody else’s computer. That’s true, but it doesn’t mean that their computer is the same as your computer. Today, most cloud datacenter servers are x86 machines just like your desktop PC except bigger and farther away. But that doesn’t have to be the case. 

Silicon Valley company Ampere Computing thinks that cloud datacenters really should be different from remote PCs, starting with the processor and its instruction set. And today, the company started to lift the veil on its plans to make that happen. 

Ampere’s first-generation Altra processor is already in the market and has been “shipping for revenue since last year,” according to Chief Product Officer Jeff Wittich. It’s about to be joined by the upgraded Altra Max chip, which should enter production in Q3 of this year. Both chips are based on ARM’s Neoverse N1 design running at 3 GHz in TSMC’s 7nm N7 process. 

But Altra and Max are just the warm-up act before Ampere’s second generation of processors debuts, possibly by next year. The as-yet-unnamed devices will be based on an entirely new ARM-compatible core that Ampere is designing in-house instead of borrowing from Neoverse. Like Apple and a small handful of other companies, Ampere has been quietly designing its own custom ARM implementations. 

Details about the new generation are few. It will be fabricated in TSMC’s 5nm N5 process, and it will have more than 128 cores plus faster memory and I/O than Altra, while remaining fully ARM-compatible. The company isn’t saying whether it will base the chip on the recently announced ARMv9 architecture specification. “It’s more nuanced than that,” hints Wittich. 

Ampere is able to design its own ARM-compatible CPU cores thanks to a rare (and expensive) ARM architectural license that it acquired indirectly from AppliedMicro and that company’s X-Gene project. “This is what we’ve been working on for the past three and a half years,” says Wittich, pegging the start of CPU development to the founding of the company. In other words, this was their plan all along. 

Having an in-house processor gives Ampere “a more rapid annual cadence” of product introductions than it could have by waiting for ARM’s official rollouts. Ampere says it will add security features to its new core, along with new elements for manageability, telemetry, and resiliency – all things server operators want to see. 

In the meantime, existing Altra customers can look forward to Altra Max later this year. Altra Max ups the core count to 128 (from Altra’s 80). That’s 60% more processor goodness in the same pin-compatible package. Both run at a solid 3.0 GHz, with no “turbo mode” or variable clock scaling like you’d see on a server-class x86 chip such as Intel’s Xeon or AMD’s Epyc processors. That’s deliberate, and part of what makes Altra different. 

Ampere believes that cloud server workloads are fundamentally different from client PC workloads, starting with the clocking. Servers are shared, and one processor core’s clock frequency shouldn’t affect that of its neighbors. Conventional x86 chips throttle clock speed to remain within a defined thermal envelope, which means a high-demand task running on one CPU core might force a slowdown of the other 31 cores in the same chip. Intel and AMD euphemistically refer to this as turbo mode because it sounds better than don’t-melt-the-chip mode. 
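
To make the shared-envelope argument concrete, here is a deliberately simplified sketch. It assumes a hypothetical 32-core chip with a fixed power budget and a per-core power draw that rises linearly with clock speed; the function name and every number are illustrative assumptions, not Intel, AMD, or Ampere figures.

```python
def neighbor_clock_ghz(budget_w, cores, watts_per_ghz, boosted_ghz):
    """Clock available to each remaining core after one core takes its share.

    Toy model: power per core = watts_per_ghz * clock, and the chip as a
    whole may not exceed budget_w. All inputs are hypothetical.
    """
    remaining_w = budget_w - boosted_ghz * watts_per_ghz
    return remaining_w / ((cores - 1) * watts_per_ghz)


# Hypothetical chip: 32 cores, budget sized so all cores can hold 3.0 GHz.
CORES = 32
WATTS_PER_GHZ = 2.0
BUDGET_W = CORES * 3.0 * WATTS_PER_GHZ  # 192 W

steady = neighbor_clock_ghz(BUDGET_W, CORES, WATTS_PER_GHZ, boosted_ghz=3.0)
squeezed = neighbor_clock_ghz(BUDGET_W, CORES, WATTS_PER_GHZ, boosted_ghz=4.0)
print(f"neighbors with no boost: {steady:.2f} GHz")
print(f"neighbors with one core at 4.0 GHz: {squeezed:.2f} GHz")
```

In this toy model, one boosted core pulls the other 31 slightly below 3.0 GHz. Real chips follow far more complex power curves, but the coupling between neighbors is the point.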

Altra and Altra Max, in contrast, run at a consistent clock rate all the time. In a sense, they’re always in turbo mode and the company says there’s no combination of workloads that will overheat the chips or force a slowdown. Predictability is preserved. 

Ampere’s chips also don’t implement hyperthreading. They’re all single-threaded CPU cores, so the number of cores equals the number of execution threads. That, too, is a nod toward independence and determinism. Server tasks are often broken down into microservices, where multithreading isn’t helpful. It’s more important, says Ampere, that tasks don’t compete for hardware resources or interfere with each other. 

That strategy plays out in the chips’ cache organization, too. Altra and Max both have large L1 and L2 caches, with a comparatively small L3. The last-level cache would be shared among CPU cores (and thus, among tasks), which doesn’t suit the multi-tenancy model of servers. 

The bottom line is that performance scales almost linearly with core count – assuming, of course, that you’re running single-threaded microservices that don’t interact with one another. Ampere hasn’t suddenly found a magical solution to multiprocessor load balancing problems; the company simply focuses its efforts on a subset of tasks that suit its target market. And its chip architecture. 
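
The “almost linearly” caveat can be illustrated with Amdahl’s law, a standard textbook model rather than anything Ampere has published. In the sketch below, the function name and the 99% figure are assumptions chosen for illustration; only the 80 and 128 core counts come from the chips described here.

```python
def amdahl_speedup(cores, parallel_fraction):
    """Speedup over one core when only parallel_fraction of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for cores in (80, 128):
    independent = amdahl_speedup(cores, 1.0)   # fully independent tasks
    contended = amdahl_speedup(cores, 0.99)    # hypothetical 1% serialized work
    print(f"{cores} cores: {independent:.0f}x ideal, {contended:.1f}x with 1% serial")
```

With fully independent tasks the speedup equals the core count. Even a hypothetical 1% of serialized work cuts the 128-core figure by more than half, which is why the workload assumption matters.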

Wittich points out that users can reduce the processor’s clock frequency if they want to save power, but they never have to. Altra Max operates within the same physical, electrical, and thermal envelopes as Altra, despite having 48 additional CPU cores. At full speed, Altra Max delivers more performance than an x86 processor, or, with the voltage and frequency turned down, it can deliver the same performance for less energy. 
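
The trade-off between clock speed and energy follows from the textbook approximation for CMOS dynamic power, P ≈ C·V²·f. The sketch below applies it with hypothetical scaling factors; it ignores leakage power, and none of the numbers are Ampere measurements.

```python
def relative_dynamic_power(voltage_scale, frequency_scale):
    """Dynamic power relative to baseline, using P ~ C * V^2 * f."""
    return voltage_scale ** 2 * frequency_scale


def relative_energy_per_op(voltage_scale):
    """Energy per operation relative to baseline.

    Work done per second scales with frequency, so energy per operation
    is power / frequency, which leaves only the V^2 term.
    """
    return voltage_scale ** 2


# Hypothetical operating point: 10% lower voltage, 20% lower clock.
power = relative_dynamic_power(voltage_scale=0.9, frequency_scale=0.8)
energy = relative_energy_per_op(voltage_scale=0.9)
print(f"power: {power:.3f}x baseline, energy per operation: {energy:.2f}x baseline")
```

Under these assumptions the chip draws about 65% of its baseline dynamic power while giving up 20% of its clock, so each unit of work costs roughly 19% less energy.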

That performance-per-watt ratio has driven a lot of ARM-based server projects… right into the ground. It’s a compelling technical challenge and an attractive market. Who wouldn’t want 1% or 5% of Intel’s lucrative server-processor business? And yet, the failures outnumber the successes by a large irrational number. Ampere may be shipping Altra chips for revenue, but it’s not shipping a whole lot of them for revenue. Ampere’s big-name partners – Microsoft, Oracle, Cloudflare – seem to be kicking the tires, not backing up forklifts loaded with Altra chips. Only one customer, Equinix, has Altra-based servers online and ready for the average Joe to use. But hey, you gotta start somewhere.

The market for PC processors started out with one or two dominant vendors, and then it had a brief period with a lot of startup competitors, then went back to one or two dominant vendors. Maybe Ampere is right. Maybe the cloud server market really will be different.
