
Ampere Ups the ARM Ante

Company Tees Up Next Generation of ARM-Based Server Chips

They say there’s no such thing as “the cloud.” It’s just somebody else’s computer. That’s true, but it doesn’t mean that their computer is the same as your computer. Today, most cloud datacenter servers are x86 machines just like your desktop PC except bigger and farther away. But that doesn’t have to be the case. 

Silicon Valley company Ampere Computing thinks that cloud datacenters really should be different from remote PCs, starting with the processor and its instruction set. And today, the company started to lift the veil on its plans to make that happen. 

Ampere’s first-generation Altra processor is already in the market and has been “shipping for revenue since last year,” according to Chief Product Officer Jeff Wittich. It’s about to be joined by the upgraded Altra Max chip, which should enter production in Q3 of this year. Both chips are based on ARM’s Neoverse N1 design running at 3 GHz in TSMC’s 7nm N7 process. 

But Altra and Max are just the warm-up act before Ampere’s second generation of processors debuts, possibly by next year. The as-yet-unnamed devices will be based on an entirely new ARM core that Ampere is designing in-house instead of borrowing from Neoverse. Like Apple and a small handful of other companies, Ampere has been quietly designing its own custom ARM implementations. 

Details are few regarding the new generation, except that it’ll be fabricated in TSMC’s 5nm N5 process, have more than 128 cores, and offer faster memory and I/O than Altra, all while remaining fully ARM-compatible. The company isn’t saying if it will base the chip on the recently announced ARMv9 architecture specification. “It’s more nuanced than that,” hints Wittich. 

Ampere is able to design its own ARM-compatible CPU cores thanks to a rare (and expensive) ARM architectural license that it acquired indirectly from AppliedMicro and that company’s X-Gene project. “This is what we’ve been working on for the past three and half years,” says Wittich, pegging the start of CPU development to the founding of the company. In other words, this was their plan all along. 

Having an in-house processor gives Ampere “a more rapid annual cadence” of product introductions than it could have by waiting for ARM’s official rollouts. Ampere says it will add security features to its new core, along with new elements for manageability, telemetry, and resiliency – all things server operators want to see. 

In the meantime, existing Altra customers can look forward to Altra Max later this year. Altra Max ups the core count to 128 (from Altra’s 80). That’s 60% more processor goodness in the same pin-compatible package. Both run at a solid 3.0 GHz, with no “turbo mode” or variable clock scaling like you’d see on a server-class x86 chip such as Intel’s Xeon or AMD’s Epyc processors. That’s deliberate, and part of what makes Altra different. 
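To put numbers on the jump from Altra to Altra Max, here is a quick back-of-the-envelope check in Python. The core counts and the 3.0 GHz clock are the figures quoted above; the "core-GHz" total is only an illustrative tally introduced for this sketch, not a benchmark or a metric Ampere publishes.

```python
# Figures quoted in the article.
ALTRA_CORES = 80
ALTRA_MAX_CORES = 128
CLOCK_GHZ = 3.0  # both chips run at the same fixed clock


def percent_increase(old: int, new: int) -> float:
    """Relative growth from old to new, expressed in percent."""
    return (new - old) * 100 / old


extra_cores = ALTRA_MAX_CORES - ALTRA_CORES
gain_pct = percent_increase(ALTRA_CORES, ALTRA_MAX_CORES)

# Hypothetical tally (an assumption of this sketch): cores times clock.
# It ignores memory bandwidth, cache effects, and everything else that
# determines real-world throughput.
altra_core_ghz = ALTRA_CORES * CLOCK_GHZ
altra_max_core_ghz = ALTRA_MAX_CORES * CLOCK_GHZ

print(f"{extra_cores} extra cores, {gain_pct:.0f}% more than Altra")
print(f"{altra_core_ghz:.0f} vs {altra_max_core_ghz:.0f} core-GHz")
```

Because the clock is fixed and identical on both parts, any raw compute difference in this toy tally comes entirely from the core count.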

Ampere believes that cloud server workloads are fundamentally different from client PC workloads, starting with the clocking. Servers are shared, and one processor core’s clock frequency shouldn’t affect that of its neighbors. Conventional x86 chips throttle clock speed to remain within a defined thermal envelope, which means a high-demand task running on one CPU core of, say, a 32-core chip might force a slowdown of the other 31 cores. Intel and AMD euphemistically refer to this as turbo mode because it sounds better than don’t-melt-the-chip mode. 

Altra and Altra Max, in contrast, run at a consistent clock rate all the time. In a sense, they’re always in turbo mode and the company says there’s no combination of workloads that will overheat the chips or force a slowdown. Predictability is preserved. 

Ampere’s chips also don’t implement hyperthreading. They’re all single-threaded CPU cores, so the number of cores equals the number of execution threads. That, too, is a nod toward independence and determinism. Server tasks are often broken down into microservices, where multithreading isn’t helpful. It’s more important, says Ampere, that tasks don’t compete for hardware resources or interfere with each other. 

That strategy plays out in the chips’ cache organization, too. Altra and Max both have large L1 and L2 caches, with a comparatively small L3. The last-level cache would be shared among CPU cores (and thus, among tasks), which doesn’t suit the multi-tenancy model of servers. 

The bottom line is that performance scales almost linearly with core count – assuming, of course, that you’re running single-threaded microservices that don’t interact with one another. Ampere hasn’t suddenly found a magical solution to multiprocessor load balancing problems; the company simply focuses its efforts on a subset of tasks that suit its target market. And its chip architecture. 
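The "almost linearly" caveat can be illustrated with Amdahl's law, a standard textbook model that is brought in here for illustration and is not something Ampere cites. The sketch below assumes a hypothetical `parallel_fraction`, the share of work that runs independently on each core; the values 0.99 and 0.95 are made-up inputs, not measurements of any Altra workload.

```python
def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Ideal speedup over one core under Amdahl's law.

    parallel_fraction is the share of the work (0.0 to 1.0) that can
    run independently on every core; the rest is serialized.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Core counts from the article; parallel fractions are hypothetical.
for cores in (80, 128):
    for fraction in (1.0, 0.99, 0.95):
        speedup = amdahl_speedup(cores, fraction)
        print(f"{cores:3d} cores, {fraction:.0%} independent: {speedup:6.1f}x")
```

With fully independent tasks the modeled speedup equals the core count, which is the scenario Ampere is targeting. In this model, even 1% of shared or serialized work drops 128 cores to less than half of the ideal figure, which is why the microservices assumption matters so much.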

Wittich points out that users can reduce the processor’s clock frequency if they want to save power, but they never have to. Altra Max operates within the same physical, electrical, and thermal envelopes as Altra, despite having 48 additional CPU cores. At full speed, Altra Max delivers more performance than an x86 processor, or, with the voltage and frequency turned down, it can deliver the same performance for less energy. 

That performance-per-watt ratio has driven a lot of ARM-based server projects… right into the ground. It’s a compelling technical challenge and an attractive market. Who wouldn’t want 1% or 5% of Intel’s lucrative server-processor business? And yet, the failures outnumber the successes by a large irrational number. Ampere may be shipping Altra chips for revenue, but it’s not shipping a whole lot of them for revenue. Ampere’s big-name partners – Microsoft, Oracle, Cloudflare – seem to be kicking the tires, not backing up forklifts loaded with Altra chips. Only one customer, Equinix, has Altra-based servers online and ready for the average Joe to use. But hey, you gotta start somewhere.

The market for PC processors started out with one or two dominant vendors, and then it had a brief period with a lot of startup competitors, then went back to one or two dominant vendors. Maybe Ampere is right. Maybe the cloud server market really will be different.
