
ARM’s Cortex-A15 “Eagle” Has Landed

You know who you are. You’re one of the legions of ARM programmers, engineers, and developers. You made ARM the most popular 32-bit processor on the planet—eclipsing even Intel. You use an ARM-based cell phone, you listen to your ARM-based iPod, you spin up ARM-based disk drives… admit it. You’re part of the ARM army.

Well, good news, campers. The latest, greatest, fastest, most wonderful-est ARM processor in the world just got announced today. It’s the tippy-top of ARM’s broad family tree, surpassing even the multicore Cortex-A9. Behold the Cortex-A15. Look upon it and be amazed.

Okay, maybe the A15 isn’t that big a deal. Yes, it’s a sophisticated and advanced 32-bit processor design, and it’s clearly the best work that ARM has ever done. But to be honest… it’s a lot like other 32-bit designs from other CPU companies. The big deal is that it’s the most advanced CPU from ARM. It’s just not the most advanced CPU ever.

What’s She Got Under the Hood?

By any measure, the Cortex-A15 (which was code-named Eagle in development) is an impressive piece of work. It’s a multicore, superscalar, out-of-order, 32-bit machine with extensive branch prediction, virtualization, register renaming, parallel execution units, and all the other bells and whistles you could want. When the first A15-based chips hit the street next year, they should hum along at about 1.5GHz and may hit 2.5GHz with a bit of a tailwind. The A15 can easily support up to eight CPU cores, and ARM hints that support for 32 cores and more might be just around the corner. Clearly, this is a big processor for big tasks.

This is not your father’s low-power ARM processor. Cortex-A15 isn’t for cell phones or iPads. It’s intended to take on big server chips, network chips, and communications processors where Freescale, Intel, Cavium, NetLogic, Marvell, and other comms-related companies play today. Forget what you remember about the cute little ARM7. Cortex-A15 moves ARM into the world of big iron.

And that’s really a double-edged sword. The A15 looks like it has the performance wherewithal to duke it out with the big boys from MIPS or PowerPC or Intel. But the A15 also gives up (or at least, downplays) the traditional ARM advantages of low power, small die area, and simple programming. To make a big and powerful processor, ARM had to… make a big and powerful processor. Let me show you what I mean.

For starters, the A15 has a massively long pipeline, up to 24 stages deep. Fully half of that, 12 pipeline stages, is just for fetching and decoding instructions. That’s a heck of a long time before instructions even start to execute. The execution units (there are eight of them) take 3 to 12 additional pipeline stages, so an instruction passes through 15 to 24 stages in all. There are two “simple” execution units for basic ARM instructions, two for multimedia and floating-point instructions, two for loads and stores, one for multiplication and division, and one for handling branches. All of these can run in parallel, though it would be an unusual piece of code that kept them all busy at once.
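The stage arithmetic is simple enough to check in a few lines. This is a minimal sketch that only encodes the figures quoted above (a 12-stage front end plus 3 to 12 execution stages, and the eight execution units as grouped in the text); the constant and function names are hypothetical, and it is not a model of ARM’s actual pipeline timing.

```python
# Hypothetical model using the stage counts quoted in the article:
# a 12-stage front end followed by 3 to 12 execution stages.
FRONT_END_STAGES = 12
MIN_EXEC_STAGES = 3
MAX_EXEC_STAGES = 12

# The eight execution units, grouped as described in the text.
EXECUTION_UNITS = {
    "simple integer": 2,
    "multimedia/floating point": 2,
    "load/store": 2,
    "multiply/divide": 1,
    "branch": 1,
}

def total_depth(exec_stages: int) -> int:
    """End-to-end depth for an instruction whose execution takes exec_stages stages."""
    if not MIN_EXEC_STAGES <= exec_stages <= MAX_EXEC_STAGES:
        raise ValueError("execution depth outside the 3-12 range quoted in the article")
    return FRONT_END_STAGES + exec_stages

shortest = total_depth(MIN_EXEC_STAGES)
longest = total_depth(MAX_EXEC_STAGES)
print(f"{sum(EXECUTION_UNITS.values())} execution units")
print(f"pipeline depth: {shortest} to {longest} stages")
```

The shortest path works out to 15 stages and the longest to 24, which is where the “up to 24” figure comes from.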

The long pipeline is necessary to enable the high clock rates; high frequencies mean short periods, after all. And since cache memories aren’t getting much faster these days, it takes more cycles to fetch code and data from those sluggish SRAMs. The downside of a long pipeline is the penalty you pay every time you have to flush it and start over. In other words, branches derail this high-speed train.
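To see why a faster clock means more cycles per cache access, hold the SRAM’s access time fixed and shrink the clock period. In this sketch the 2.0 ns cache latency is a purely hypothetical number, and 1.0 GHz is an arbitrary reference point; only the 1.5 GHz and 2.5 GHz figures come from the article.

```python
import math

def period_ns(freq_ghz: float) -> float:
    """Clock period in nanoseconds: 1 / f, with f in GHz."""
    return 1.0 / freq_ghz

def cycles_for_latency(latency_ns: float, freq_ghz: float) -> int:
    """Whole clock cycles needed to cover a fixed access time.

    Rounding to 9 decimal places before ceil() avoids counting a spurious
    extra cycle because of floating-point noise (e.g. 3.0000000000000004).
    """
    return math.ceil(round(latency_ns * freq_ghz, 9))

SRAM_LATENCY_NS = 2.0  # hypothetical cache access time, held constant

for freq in (1.0, 1.5, 2.5):  # 1.5 and 2.5 GHz are the figures quoted in the article
    print(f"{freq} GHz: period {period_ns(freq):.3f} ns, "
          f"cache access = {cycles_for_latency(SRAM_LATENCY_NS, freq)} cycles")
```

The same 2 ns access costs 2 cycles at 1 GHz but 5 cycles at 2.5 GHz, and those extra cycles have to be absorbed by extra pipeline stages.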

Every mispredicted branch forces the A15 to flush and reload its long instruction pipeline. Nothing unusual about that; every pipelined processor in the world does the same thing. To help mitigate the problem, ARM built in a branch-target buffer (BTB) that remembers the targets of the most recently encountered branches. What makes the A15 interesting is its new “micro BTB,” which is managed like a fully associative cache. Through the magic of dynamic branch prediction, the A15 basically guesses whether a branch will be taken or not, then looks up the target of that branch in its micro BTB. Assuming the guess is correct, a nasty pipeline bubble is all but prevented.
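Here is a rough sketch of the two ideas working together: a dynamic direction predictor and a small, fully associative target buffer. It uses the textbook 2-bit saturating counter and LRU replacement as stand-ins; ARM hasn’t said that the A15 works this way, and the class names, the 4-entry BTB size, and the branch addresses are all hypothetical.

```python
from __future__ import annotations
from collections import OrderedDict

class TwoBitPredictor:
    """Per-branch 2-bit saturating counters: values 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self) -> None:
        self.counters: dict[int, int] = {}

    def predict_taken(self, pc: int) -> bool:
        return self.counters.get(pc, 1) >= 2  # unseen branches start weakly not taken

    def update(self, pc: int, taken: bool) -> None:
        c = self.counters.get(pc, 1)
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)

class MicroBTB:
    """Small fully associative branch-target buffer with LRU replacement.

    Any branch may occupy any entry, so a lookup checks every entry;
    the dict lookup stands in for that parallel tag comparison.
    """

    def __init__(self, entries: int) -> None:
        self.entries = entries
        self.table: OrderedDict[int, int] = OrderedDict()  # branch PC -> target

    def lookup(self, pc: int) -> int | None:
        target = self.table.get(pc)
        if target is not None:
            self.table.move_to_end(pc)  # mark as most recently used
        return target

    def insert(self, pc: int, target: int) -> None:
        if pc in self.table:
            self.table.move_to_end(pc)
        elif len(self.table) >= self.entries:
            self.table.popitem(last=False)  # evict the least recently used entry
        self.table[pc] = target

def next_fetch(pc: int, pred: TwoBitPredictor, btb: MicroBTB) -> int:
    """Where the front end fetches next: the BTB target if predicted taken and known."""
    if pred.predict_taken(pc):
        target = btb.lookup(pc)
        if target is not None:
            return target
    return pc + 4  # fall through to the next 32-bit instruction

def count_correct(outcomes: list[bool], pc: int = 0x100, target: int = 0x80) -> int:
    """Simulate one loop-closing branch; count fetches that went the right way."""
    pred, btb = TwoBitPredictor(), MicroBTB(entries=4)
    correct = 0
    for taken in outcomes:
        actual = target if taken else pc + 4
        if next_fetch(pc, pred, btb) == actual:
            correct += 1  # right path: no flush needed
        pred.update(pc, taken)
        if taken:
            btb.insert(pc, target)
    return correct

# A loop that runs 10 times (taken 9 times, then exits), executed 3 times.
print(count_correct(([True] * 9 + [False]) * 3), "of 30 fetches correct")
```

In this toy run the front end guesses wrong only on the branch’s first encounter and on each loop exit, so 26 of the 30 fetches go the right way. Every wrong guess is a pipeline flush, which is why the long pipeline makes prediction accuracy so important.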

I Don’t Want To Be Alone

There’s plenty more going on inside the A15, but rather than wallow in the nerdy details, let’s turn our attention to its clustering abilities. Like the Cortex-A9, the A15 can be fabricated in clusters of four CPUs. (You can also make one- and two-CPU clusters.) All four CPUs share the same L2 cache, and the cluster’s coherence logic keeps their private L1 caches consistent with one another.
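Here’s what “keeping the caches coherent” means in practice. The sketch below is a toy model of a four-CPU cluster whose L1 caches follow a MESI-style protocol. The article doesn’t say which protocol the A15 uses, so MESI here is an assumption borrowed from the textbooks. The class and method names are hypothetical, and real hardware does all of this in the coherence logic rather than in software.

```python
from __future__ import annotations
from enum import Enum

class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

class Cluster:
    """Toy model of a cluster whose per-core L1 caches follow a MESI-style protocol."""

    def __init__(self, cores: int = 4) -> None:
        self.l1: list[dict[int, State]] = [{} for _ in range(cores)]  # line address -> state

    def state(self, core: int, addr: int) -> State:
        return self.l1[core].get(addr, State.INVALID)

    def read(self, core: int, addr: int) -> None:
        if self.state(core, addr) is not State.INVALID:
            return  # read hit: state unchanged
        holders = [c for c in range(len(self.l1))
                   if c != core and self.state(c, addr) is not State.INVALID]
        for c in holders:
            # A Modified copy would be written back or forwarded at this point;
            # either way every existing holder ends up Shared.
            self.l1[c][addr] = State.SHARED
        self.l1[core][addr] = State.SHARED if holders else State.EXCLUSIVE

    def write(self, core: int, addr: int) -> None:
        for c in range(len(self.l1)):
            if c != core:
                self.l1[c].pop(addr, None)  # invalidate all other copies
        self.l1[core][addr] = State.MODIFIED

cluster = Cluster(cores=4)
cluster.read(0, 0x40)   # core 0 alone: Exclusive
cluster.read(1, 0x40)   # cores 0 and 1: Shared
cluster.write(2, 0x40)  # core 2: Modified; cores 0 and 1: Invalid
print([cluster.state(c, 0x40).value for c in range(4)])
```

The key rule is that a write invalidates every other copy, so no CPU can keep reading a stale value after another CPU has changed it.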

Unlike with the A9, you can combine more than one of these four-way clusters in a single chip. For now, ARM admits the A15 will support two such clusters, for an eight-way processor. Realistically, the limit is probably around 32 cores or so. All the CPUs remain cache coherent with one another through a shared AMBA 4 interface. All the caches are fault-tolerant, too, courtesy of ECC (error checking and correction). The L1 and L2 caches will silently correct single-bit errors or squeal and complain if they detect two-bit errors. ECC is important if you’re making servers that are up and running 24/7 and might occasionally (in fact, will probably) encounter the sporadic “soft” error in RAM.
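“Correct one bit, detect two” is the behavior of a SECDED (single-error-correcting, double-error-detecting) code. The smallest classic example is an extended Hamming (8,4) code, sketched below. This is an illustration of the general technique only. The article doesn’t describe the A15’s actual ECC, and real cache ECC typically protects much wider words. The function names are hypothetical.

```python
from __future__ import annotations

def encode(data: int) -> list[int]:
    """Encode 4 data bits (0-15) as an 8-bit SECDED codeword; list index = bit position."""
    d = [(data >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d      # data bits go in non-power-of-two slots
    code[1] = code[3] ^ code[5] ^ code[7]       # parity over positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]       # parity over positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]       # parity over positions with bit 2 set
    code[0] = sum(code[1:]) % 2                 # overall parity makes the whole word even
    return code

def decode(code: list[int]) -> tuple[str, int | None]:
    """Return (status, data). Single-bit errors are corrected; double-bit errors are only flagged."""
    bits = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos                     # XOR of the positions of all 1 bits
    overall = sum(bits) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        bits[syndrome] ^= 1                     # syndrome 0 means bit 0 (overall parity) flipped
        status = "corrected"
    else:
        return "double-bit error detected", None
    return status, bits[3] | bits[5] << 1 | bits[6] << 2 | bits[7] << 3

word = encode(0b1011)
one_flip = word.copy()
one_flip[6] ^= 1
two_flips = word.copy()
two_flips[2] ^= 1
two_flips[5] ^= 1
print(decode(word), decode(one_flip), decode(two_flips))
```

A single flipped bit produces an odd overall parity, and the syndrome points straight at it, so the hardware can fix it silently. Two flipped bits leave the overall parity even but the syndrome nonzero. The error can be detected but not located, and that is the “squeal and complain” case.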

There’s still more to the Cortex-A15, too. There’s the register renaming that enables aggressive out-of-order execution. There’s the new privilege level that helps with virtualization. There’s the 40-bit physical addressing that lets a system reach up to 1 TB (2^40 bytes) of memory, even though each program still works within a 32-bit address space. The list goes on and on.
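The 1 TB figure is just 2^40 bytes. The quick check below compares it with the 4 GB that a 32-bit address can reach. As background not stated in the article, ARM’s Large Physical Address Extension keeps virtual addresses at 32 bits, so the larger space is spread across programs rather than visible to any one of them. The names are illustrative only.

```python
def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a byte-addressed space of the given width."""
    return 1 << address_bits

GIB = 1 << 30
TIB = 1 << 40

virtual = addressable_bytes(32)   # what a single program can see
physical = addressable_bytes(40)  # what the memory system can reach

print(f"32-bit: {virtual // GIB} GiB, 40-bit: {physical // TIB} TiB, "
      f"ratio {physical // virtual}x")
```

Eight more address bits buy 256 times as much physical memory, which matters a lot more for servers than for phones.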

In short, the A15 is a big-boy, grownup processor. It’s got the whole checklist of high-performance features that a MIPS 1074K, PowerPC e600, or Intel Xeon has. It’s a he-man processor buffed and ready for some heavy lifting.

What it’s not is your traditional small and light ARM processor. ARM isn’t revealing the A15’s power numbers or die area yet, but I’m willing to bet it’s big and it’s hot. You can’t run a processor this complex without burning a lot of watts. ARM can’t sprinkle any magic pixie dust on its CPUs; they’re governed by the same laws of physics as everyone else’s. The company earned its low-power reputation by designing CPUs that were less complex than anyone else’s. They weren’t magic; they were simple. With the A15, the company joins the ranks of the other high-end processor vendors, watts and all.

It’s as though ARM has reached puberty. The company has grown up and earned its place at the adult table with the other grownups. But in so doing, it’s lost some of its youthful charm. The company that wrote the book on low-power licensed processor designs has steadily outgrown the characteristics that made it so appealing. It’s filled out and become an awkward teen, not sure whether it should be playing with toys or applying for its first job. Welcome to the complex world of adulthood.  
