ARM’s Cortex-A15 “Eagle” Has Landed

You know who you are. You’re one of the legions of ARM programmers, engineers, and developers. You made ARM the most popular 32-bit processor on the planet—eclipsing even Intel. You use an ARM-based cell phone, you listen to your ARM-based iPod, you spin up ARM-based disk drives… admit it. You’re part of the ARM army.

Well, good news, campers. The latest, greatest, fastest, most wonderful-est ARM processor in the world just got announced today. It’s the tippy-top of ARM’s broad family tree, surpassing even the multicore Cortex-A9. Behold the Cortex-A15. Look upon it and be amazed.

Okay, maybe the A15 isn’t that big a deal. Yes, it’s a sophisticated and advanced 32-bit processor design, and it’s clearly the best work that ARM has ever done. But to be honest… it’s a lot like other 32-bit designs from other CPU companies. The big deal is that it’s the most-advanced CPU from ARM. It’s just not the most-advanced CPU ever.

What’s She Got Under the Hood?

By any measure, the Cortex-A15 (which was code-named Eagle in development) is an impressive piece of work. It’s a multicore, superscalar, out-of-order, 32-bit machine with extensive branch prediction, virtualization, register renaming, parallel execution units, and all the other bells and whistles you could want. When the first A15-based chips hit the street next year, they should hum along at about 1.5 GHz and may hit 2.5 GHz with a bit of a tailwind. The A15 can easily support up to eight CPU cores, and ARM hints that support for 32 cores and more might be just around the corner. Clearly, this is a big processor for big tasks.

This is not your father’s low-power ARM processor. Cortex-A15 isn’t for cell phones or iPads. It’s intended to take on big server chips, network chips, and communications processors where Freescale, Intel, Cavium, NetLogic, Marvell, and other comms-related companies play today. Forget what you remember about the cute little ARM7. Cortex-A15 moves ARM into the world of big iron.

And that’s really a double-edged sword. The A15 looks like it has the performance wherewithal to duke it out with the big boys from MIPS or PowerPC or Intel. But the A15 also gives up (or at least, downplays) the traditional ARM advantages of low power, small die area, and simple programming. To make a big and powerful processor, ARM had to… make a big and powerful processor. Let me show you what I mean.

For starters, A15 has a massively long 24-stage pipeline. Fully half of that—12 pipeline stages—is just for fetching and decoding instructions. That’s a heck of a long time before instructions even start to execute. The execution units (there are eight of them) take 3–12 additional pipeline stages, for a total of up to 24. There are two “simple” execution units for basic ARM instructions, two for multimedia and floating-point instructions, two for loads and stores, one for multiplication and division, and one execution unit for handling branches. All of these can run in parallel, though it would be an unusual piece of code that kept them all busy at once.

The long pipeline is necessary to enable the high clock rates; high frequencies mean short periods, after all. And since cache memories aren’t getting much faster these days, it takes more cycles to fetch code and data from those sluggish SRAMs. The downside of a long pipeline is the penalty you pay every time you have to flush it and start over. In other words, branches derail this high-speed train.
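To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. The stage counts and clock rates come from the figures quoted above; the branch fraction, misprediction rate, and flush penalty in the example are purely hypothetical values chosen for illustration, not anything ARM has published.

```python
# Back-of-the-envelope pipeline arithmetic. Stage counts are the figures
# quoted in the article; everything passed to effective_cpi() below is a
# hypothetical workload assumption.

FRONT_END_STAGES = 12                      # fetch + decode stages
EXEC_STAGES_MIN, EXEC_STAGES_MAX = 3, 12   # depends on the execution unit


def pipeline_depth_range():
    """Shortest and longest total pipeline depth, in stages."""
    return (FRONT_END_STAGES + EXEC_STAGES_MIN,
            FRONT_END_STAGES + EXEC_STAGES_MAX)


def cycle_time_ns(clock_ghz):
    """Clock period in nanoseconds for a clock rate in GHz."""
    return 1.0 / clock_ghz


def effective_cpi(base_cpi, branch_fraction, mispredict_rate,
                  flush_penalty_cycles):
    """Average cycles per instruction once pipeline flushes are included."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


if __name__ == "__main__":
    print("pipeline depth (stages):", pipeline_depth_range())
    for ghz in (1.5, 2.5):
        print(f"{ghz} GHz -> {cycle_time_ns(ghz):.3f} ns per stage")
    # Hypothetical workload: 20% branches, 5% of them mispredicted,
    # 15 cycles lost per flush.
    print("effective CPI:", effective_cpi(1.0, 0.20, 0.05, 15))
```

The point of the last function is that the flush penalty is multiplied by how often you pay it, which is why a deep pipeline makes good branch prediction so valuable.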

Left unchecked, every branch that’s taken would force the A15 to flush and reload its long instruction pipeline. Nothing unusual about that; every pipelined processor faces the same problem. To help mitigate it, ARM built in a branch-target buffer (BTB) to trap and hold the first few instructions from the beginning (target) of the most recently encountered branches. What makes the A15 interesting is its new “micro BTB,” which is managed like a fully associative cache. Through the magic of dynamic branch prediction, the A15 basically guesses whether a branch will be taken or not, then looks up the target of that branch in its micro BTB. Assuming the guess is correct, a nasty pipeline bubble is all but prevented.
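For readers who like to see the moving parts, here is a toy model of the general idea in Python. It is emphatically not ARM's design: it assumes a textbook two-bit saturating counter per branch for the taken/not-taken guess, a tiny fully associative table with least-recently-used replacement, and it stores only the target address rather than the first few target instructions. The class name and table size are made up for the example.

```python
from collections import OrderedDict


class TinyBranchPredictor:
    """Toy predictor: 2-bit counters plus a small fully associative BTB.

    Illustrative only; sizes and policies are assumptions, not A15 details.
    """

    def __init__(self, btb_entries=4):
        self.btb_entries = btb_entries
        self.btb = OrderedDict()   # branch address -> target, in LRU order
        self.counters = {}         # branch address -> counter in 0..3

    def predict(self, pc):
        """Return the predicted target, or None for 'fall through'."""
        predicted_taken = self.counters.get(pc, 1) >= 2
        if predicted_taken and pc in self.btb:
            self.btb.move_to_end(pc)          # mark as most recently used
            return self.btb[pc]
        return None

    def update(self, pc, taken, target=None):
        """Train the predictor with the branch's actual outcome."""
        count = self.counters.get(pc, 1)
        self.counters[pc] = min(3, count + 1) if taken else max(0, count - 1)
        if taken:
            self.btb[pc] = target
            self.btb.move_to_end(pc)
            if len(self.btb) > self.btb_entries:
                self.btb.popitem(last=False)  # evict least recently used


if __name__ == "__main__":
    bp = TinyBranchPredictor()
    loop_branch, loop_top = 0x100, 0x80
    for i in range(4):
        guess = bp.predict(loop_branch)
        print(f"iteration {i}: predicted target = {guess}")
        bp.update(loop_branch, taken=True, target=loop_top)
```

Run it and the first pass through the loop misses (the predictor has never seen the branch), after which every iteration predicts the loop top correctly. A correct guess means fetch can start at the target right away instead of waiting for the branch to resolve.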

I Don’t Want To Be Alone

There’s plenty more going on inside the A15, but, rather than wallow in the nerdy details, let’s turn our attention to its clustering abilities. Like the Cortex-A9, the A15 can be fabricated in clusters of four CPUs. (You can also make one- and two-CPU clusters.) All four CPUs share the same L2 cache and thus maintain cache coherence.

Unlike with the A9, you can combine more than one of these four-way clusters in a single chip. For now, ARM admits the A15 will support two such clusters, for an eight-way processor. Realistically, the limit is probably around 32 cores or so. All the CPUs remain cache coherent with one another through a shared AMBA 4 interface. All the caches are fault-tolerant, too, courtesy of ECC (error checking and correction). The L1 and L2 caches will silently correct single-bit errors or squeal and complain if they detect two-bit errors. ECC is important if you’re making servers that are up and running 24/7 and might occasionally (in fact, will probably) encounter the sporadic “soft” error in RAM.
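If “correct one, detect two” sounds like magic, it’s really just parity bits arranged cleverly. The Python sketch below uses the classic extended Hamming code on a 4-bit value to show the behavior. ARM hasn’t said which code the A15’s caches use, and a real cache would protect much wider words, so treat the code choice, the 4-bit width, and the function names as assumptions for illustration only.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SECDED codeword (list of 0/1).

    Index 0 holds overall parity; indexes 1..7 form a Hamming(7,4) code
    with check bits at positions 1, 2, and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected', or 'double-error'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos            # XOR of the positions of all set bits
    overall = sum(code) % 2            # 0 when total parity still checks out

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Exactly one bit flipped; the syndrome is its position
        # (0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Parity looks fine but the syndrome is nonzero: two bits flipped.
        return None, "double-error"

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


if __name__ == "__main__":
    word = encode(0b1011)
    print(decode(word))                # clean read
    word[6] ^= 1
    print(decode(word))                # one soft error: silently fixed
    word[2] ^= 1
    print(decode(word))                # two soft errors: squeal and complain
```

The same scheme scales up: a handful of extra check bits per cache word buys single-bit correction and double-bit detection.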

There’s still more to the Cortex-A15, too. There’s the register renaming that enables aggressive out-of-order execution. There’s the new privilege level that helps with virtualization. There’s the 40-bit physical addressing that allows software to access 1 TB of memory. The list goes on and on.
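Register renaming deserves a quick illustration, since it’s what makes the out-of-order trick work. The Python sketch below is a bare-bones model, not the A15’s actual mechanism: it assumes a simple map from architectural to physical registers and a free list that is never refilled, and the register counts are invented for the example.

```python
def rename(instructions, num_arch_regs=4, num_phys_regs=8):
    """Rename (dest, src1, src2) architectural registers to physical ones.

    Every write gets a fresh physical register, so a later write to the
    same architectural register no longer collides with earlier readers
    or writers. This toy never frees registers, so it raises IndexError
    once the free list runs out.
    """
    mapping = {r: r for r in range(num_arch_regs)}   # arch -> current phys
    free = list(range(num_arch_regs, num_phys_regs))
    renamed = []
    for dest, src1, src2 in instructions:
        p1, p2 = mapping[src1], mapping[src2]   # read sources first
        pd = free.pop(0)                        # fresh destination register
        mapping[dest] = pd
        renamed.append((pd, p1, p2))
    return renamed


if __name__ == "__main__":
    program = [
        (1, 2, 3),   # r1 = r2 op r3
        (0, 1, 1),   # r0 = r1 op r1   (needs the r1 written above)
        (1, 2, 2),   # r1 = r2 op r2   (reuses r1 for something unrelated)
    ]
    for before, after in zip(program, rename(program)):
        print(before, "->", after)
```

After renaming, the third instruction writes a different physical register than the first, so it no longer has to wait for the second instruction to finish reading the old value. Only the true dependency (the second instruction consuming the first one’s result) remains.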

In short, the A15 is a big-boy, grownup processor. It’s got the whole checklist of high-performance features that a MIPS 1074K, PowerPC e600, or Intel Xeon has. It’s a he-man processor buffed and ready for some heavy lifting.

What it’s not is your traditional small and light ARM processor. ARM isn’t revealing the A15’s power numbers or die area yet, but I’m willing to bet it’s big and it’s hot. You can’t run a processor this complex without burning a lot of watts. ARM can’t sprinkle any magic pixie dust on its CPUs; they’re governed by the same laws of physics as everyone else’s. The company earned its low-power reputation by designing CPUs that were less complex than anyone else’s. They weren’t magic; they were simple. With the A15, the company joins the ranks of the other high-end processor vendors, watts and all.

It’s as though ARM has reached puberty. The company has grown up and earned its place at the adult table with the other grownups. But in so doing, it’s lost some of its youthful charm. The company that wrote the book on low-power licensed processor designs has steadily outgrown the characteristics that made it so appealing. It’s filled out and become an awkward teen, not sure whether it should be playing with toys or applying for its first job. Welcome to the complex world of adulthood.  
