
ARM’s Cortex-A15 “Eagle” Has Landed

You know who you are. You’re one of the legions of ARM programmers, engineers, and developers. You made ARM the most popular 32-bit processor on the planet—eclipsing even Intel. You use an ARM-based cell phone, you listen to your ARM-based iPod, you spin up ARM-based disk drives… admit it. You’re part of the ARM army.

Well, good news, campers. The latest, greatest, fastest, most wonderful-est ARM processor in the world just got announced today. It’s the tippy-top of ARM’s broad family tree, surpassing even the multicore Cortex-A9. Behold the Cortex-A15. Look upon it and be amazed.

Okay, maybe the A15 isn’t that big a deal. Yes, it’s a sophisticated and advanced 32-bit processor design, and it’s clearly the best work that ARM has ever done. But to be honest… it’s a lot like other 32-bit designs from other CPU companies. The big deal is that it’s the most-advanced CPU from ARM. It’s just not the most-advanced CPU ever.

What’s She Got Under the Hood?

By any measure, the Cortex-A15 (which was code-named Eagle in development) is an impressive piece of work. It’s a multicore, superscalar, out-of-order, 32-bit machine with extensive branch prediction, virtualization, register renaming, parallel execution units, and all the other bells and whistles you could want. When the first A15-based chips hit the street next year, they should hum along at about 1.5GHz and may hit 2.5GHz with a bit of a tailwind. The A15 can easily support up to eight CPU cores, and ARM hints that support for 32 cores and more might be just around the corner. Clearly, this is a big processor for big tasks.

This is not your father’s low-power ARM processor. Cortex-A15 isn’t for cell phones or iPads. It’s intended to take on big server chips, network chips, and communications processors where Freescale, Intel, Cavium, NetLogic, Marvell, and other comms-related companies play today. Forget what you remember about the cute little ARM7. Cortex-A15 moves ARM into the world of big iron.

And that’s really a double-edged sword. The A15 looks like it has the performance wherewithal to duke it out with the big boys from MIPS or PowerPC or Intel. But the A15 also gives up (or at least, downplays) the traditional ARM advantages of low power, small die area, and simple programming. To make a big and powerful processor, ARM had to… make a big and powerful processor. Let me show you what I mean.

For starters, the A15 has a massively long pipeline, up to 24 stages deep. Fully half of that, 12 pipeline stages, is just for fetching and decoding instructions. That’s a heck of a long time before instructions even start to execute. The execution units (there are eight of them) take 3–12 additional pipeline stages, for a total of 15 to 24 stages depending on the instruction. There are two “simple” execution units for basic ARM instructions, two for multimedia and floating-point instructions, two for loads and stores, one for multiplication and division, and one for handling branches. All of these can run in parallel, though it would be an unusual piece of code that kept them all busy at once.
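The stage arithmetic in that paragraph is easy to check. This minimal Python sketch uses only the figures quoted there (12 front-end stages, 3 to 12 execution stages, eight units). The per-unit depths inside that range aren't given, so it models only the bounds.

```python
# Figures quoted in the article; per-unit depths within 3-12 are not given.
FRONT_END_STAGES = 12              # fetch and decode, paid by every instruction
EXEC_STAGES_MIN, EXEC_STAGES_MAX = 3, 12

# Execution units as listed in the article.
EXECUTION_UNITS = {
    "simple integer": 2,
    "multimedia/floating point": 2,
    "load/store": 2,
    "multiply/divide": 1,
    "branch": 1,
}


def total_depth(exec_stages: int) -> int:
    """Total pipeline depth for an instruction whose unit needs exec_stages."""
    if not EXEC_STAGES_MIN <= exec_stages <= EXEC_STAGES_MAX:
        raise ValueError("outside the 3-12 execution stages quoted for the A15")
    return FRONT_END_STAGES + exec_stages


print(sum(EXECUTION_UNITS.values()))   # 8
print(total_depth(EXEC_STAGES_MIN))    # 15
print(total_depth(EXEC_STAGES_MAX))    # 24
```

Even the shortest path through the machine is 15 stages, which is why the front end alone dominates the cost of starting over.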

The long pipeline is what makes the high clock rates possible: high frequencies mean short clock periods, so each stage has time to do only a little work, and the job has to be split across more stages. And since cache memories aren’t getting much faster these days, it takes more cycles to fetch code and data from those sluggish SRAMs. The downside of a long pipeline is the penalty you pay every time you have to flush it and start over. In other words, branches derail this high-speed train.
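Both effects in that paragraph reduce to simple arithmetic. The sketch below uses hypothetical numbers: a 2 ns cache access, a workload in which 20% of instructions are branches and 5% of those are mispredicted, and pipeline refill costs of 5 and 15 cycles. None of these are ARM figures; they only show how the costs scale.

```python
import math


def cycles_for(access_ns, clock_ghz):
    """Whole clock cycles consumed by a memory access of fixed latency."""
    # One cycle lasts 1/clock_ghz ns, so cycles = access_ns * clock_ghz, rounded up.
    return math.ceil(access_ns * clock_ghz)


def flush_cpi(branch_fraction, mispredict_rate, refill_cycles):
    """Average cycles per instruction lost to pipeline flushes."""
    return branch_fraction * mispredict_rate * refill_cycles


# Hypothetical 2 ns cache access at the article's two clock targets.
print(cycles_for(2.0, 1.5), cycles_for(2.0, 2.5))   # 3 5

# Hypothetical workload: 20% branches, 5% of them mispredicted.
print(f"{flush_cpi(0.20, 0.05, 5):.2f}")    # 0.05 with a 5-cycle refill
print(f"{flush_cpi(0.20, 0.05, 15):.2f}")   # 0.15 with a 15-cycle refill
```

With the same misprediction rate, tripling the refill cost triples the cycles lost to flushes, which is why a deep pipeline leans so hard on good branch prediction.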

Left unchecked, every taken branch would force the A15 to flush and reload its long instruction pipeline. Nothing unusual about that; every pipelined processor faces the same problem. To help mitigate it, ARM built in a branch-target buffer (BTB) to trap and hold the first few instructions from the beginning (target) of the most recently encountered branches. What makes the A15 interesting is its new “micro BTB,” which is managed like a fully associative cache. Through the magic of dynamic branch prediction, the A15 basically guesses whether a branch will be taken or not, then looks up the target of that branch in its micro BTB. When the guess is correct, a nasty pipeline bubble is all but prevented; only a wrong guess forces the full flush.
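To make the predict-then-look-up idea concrete, here is a toy Python model. It pairs textbook 2-bit saturating counters with a small fully associative BTB that uses least-recently-used replacement. ARM hasn't said the A15 works exactly this way; the counter scheme, the four-entry size, and the LRU policy are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class MicroBTB:
    """Fully associative BTB: any entry can hold any branch. LRU replacement."""

    def __init__(self, entries=4):
        self.entries = entries
        self.table = OrderedDict()          # branch PC -> target PC, oldest first

    def lookup(self, pc):
        if pc in self.table:
            self.table.move_to_end(pc)      # mark as most recently used
            return self.table[pc]
        return None

    def update(self, pc, target):
        self.table[pc] = target
        self.table.move_to_end(pc)
        if len(self.table) > self.entries:
            self.table.popitem(last=False)  # evict the least recently used entry


class TwoBitPredictor:
    """Per-branch 2-bit saturating counters: 0-1 predict not taken, 2-3 taken."""

    def __init__(self):
        self.counters = {}

    def predict(self, pc):
        return self.counters.get(pc, 1) >= 2

    def train(self, pc, taken):
        c = self.counters.get(pc, 1)
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)


def count_flushes(trace, predictor, btb):
    """Count branches whose direction or target guess was wrong (each one flushes)."""
    flushes = 0
    for pc, taken, target in trace:
        guess_taken = predictor.predict(pc)
        guess_target = btb.lookup(pc) if guess_taken else None
        if guess_taken != taken or (taken and guess_target != target):
            flushes += 1
        predictor.train(pc, taken)
        if taken:
            btb.update(pc, target)
    return flushes


# Hypothetical trace: a loop-closing branch at 0x100 taken 9 times, then not taken.
trace = [(0x100, True, 0x80)] * 9 + [(0x100, False, None)]
print(sum(taken for _, taken, _ in trace))                          # 9
print(count_flushes(trace, TwoBitPredictor(), MicroBTB(entries=4)))  # 2
```

The nine-iteration loop costs two flushes here (one while the predictor warms up, one when the loop exits) instead of one per taken branch.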

I Don’t Want To Be Alone

There’s plenty more going on inside the A15, but rather than wallow in the nerdy details, let’s turn our attention to its clustering abilities. Like the Cortex-A9, the A15 can be fabricated in clusters of four CPUs. (You can also make one- and two-CPU clusters.) All four CPUs share the same L2 cache, and the cluster’s hardware keeps their private L1 caches coherent with one another.
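Keeping the CPUs' private L1 caches consistent takes more than a shared L2: when one CPU writes a line, every other CPU's copy of that line must be invalidated. The toy Python model below shows the idea with the textbook MSI (Modified/Shared/Invalid) protocol for a single cache line. It is an illustration only; ARM's actual coherence protocol is more elaborate and isn't what this models.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    SHARED = "S"
    INVALID = "I"


class Cluster:
    """Toy MSI snooping model: one cache line, N private L1s, one shared L2."""

    def __init__(self, cores=4):
        self.state = [State.INVALID] * cores
        self.l1 = [None] * cores
        self.l2 = 0                           # shared L2 holds the backing copy

    def read(self, core):
        if self.state[core] is State.INVALID:
            for other, st in enumerate(self.state):
                if st is State.MODIFIED:      # owner writes back and downgrades
                    self.l2 = self.l1[other]
                    self.state[other] = State.SHARED
            self.l1[core] = self.l2
            self.state[core] = State.SHARED
        return self.l1[core]

    def write(self, core, value):
        for other in range(len(self.state)):
            if other != core:
                if self.state[other] is State.MODIFIED:
                    self.l2 = self.l1[other]  # write back before invalidating
                self.state[other] = State.INVALID
        self.l1[core] = value
        self.state[core] = State.MODIFIED


cpu = Cluster(cores=4)
cpu.read(0)
cpu.read(1)                                   # both cores now share the line
cpu.write(0, 42)                              # core 1's copy is invalidated
print(cpu.read(1))                            # 42, not the stale 0
print([s.value for s in cpu.state])           # ['S', 'S', 'I', 'I']
```

Because core 1's copy was invalidated by the write, its next read fetches the new value rather than the stale one it had cached.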

Unlike the A9, the A15 lets you combine more than one of these four-way clusters in a single chip. For now, ARM says the A15 will support two such clusters, for an eight-way processor. Realistically, the limit is probably around 32 cores. All the CPUs remain cache coherent with one another through a shared AMBA 4 interface. All the caches are fault-tolerant, too, courtesy of ECC (error checking and correction). The L1 and L2 caches will silently correct single-bit errors or squeal and complain if they detect two-bit errors. ECC is important if you’re making servers that run 24/7 and might occasionally (in fact, will probably) encounter a sporadic “soft” error in RAM.
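The “correct one bit, detect two” behavior is what an extended Hamming code (often called SEC-DED) provides. This Python sketch protects a single 4-bit value with 4 check bits. Real cache ECC typically covers wider words (64 data bits with 8 check bits is a common arrangement), and the article doesn't describe ARM's exact scheme, so treat the word width and bit layout here as assumptions.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming (SEC-DED) codeword."""
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                    # code[0]: overall parity; code[1..7]: Hamming(7,4)
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2       # makes the whole word even parity
    return code


def decode(word):
    """Return (status, nibble); status is 'ok', 'corrected' or 'double-bit error'."""
    code = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos           # XOR of set positions is 0 for a clean word
    parity_ok = sum(code) % 2 == 0
    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok:               # odd number of flips: assume exactly one
        code[syndrome] ^= 1           # syndrome 0 means the parity bit itself flipped
        status = "corrected"
    else:                             # even parity but nonzero syndrome: two flips
        return "double-bit error", None
    return status, code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3


word = encode(0b1011)
one_flip = word.copy()
one_flip[6] ^= 1
two_flips = word.copy()
two_flips[2] ^= 1
two_flips[5] ^= 1
print(decode(word))        # ('ok', 11)
print(decode(one_flip))    # ('corrected', 11)
print(decode(two_flips))   # ('double-bit error', None)
```

The overall parity bit is what separates the two cases: one flip makes it odd (so the syndrome can be trusted to locate the bad bit), while two flips leave it even with a nonzero syndrome, which can only be reported, not fixed.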

There’s still more to the Cortex-A15, too. There’s the register renaming that enables aggressive out-of-order execution. There’s the new privilege level that helps with virtualization. There’s the 40-bit physical addressing that lets an A15 system reach 1 TB of physical memory, even though each program still sees a 32-bit virtual address space. The list goes on and on.
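The 1 TB figure follows directly from the address width, reading “1 TB” as 2^40 bytes (strictly, 1 TiB). A quick check in Python, compared against the classic 32-bit limit:

```python
def addressable_bytes(bits):
    """Bytes reachable with a byte address of the given width."""
    return 1 << bits


GIB = 1 << 30
print(addressable_bytes(32) // GIB)                      # 4     (GiB)
print(addressable_bytes(40) // GIB)                      # 1024  (GiB, i.e. 1 TiB)
print(addressable_bytes(40) // addressable_bytes(32))    # 256
```

Eight extra address bits buy 2^8, or 256 times, as much reachable physical memory.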

In short, the A15 is a big-boy, grownup processor. It’s got the whole checklist of high-performance features that a MIPS 1074K, PowerPC e600, or Intel Xeon has. It’s a he-man processor buffed and ready for some heavy lifting.

What it’s not is your traditional small and light ARM processor. ARM isn’t revealing the A15’s power numbers or die area yet, but I’m willing to bet it’s big and it’s hot. You can’t run a processor this complex without burning a lot of watts. ARM can’t sprinkle any magic pixie dust on its CPUs; they’re governed by the same laws of physics as everyone else’s. The company earned its low-power reputation by designing CPUs that were less complex than anyone else’s. They weren’t magic; they were simple. With the A15, the company joins the ranks of the other high-end processor vendors, watts and all.

It’s as though ARM has reached puberty. The company has grown up and earned its place at the adult table with the other grownups. But in so doing, it’s lost some of its youthful charm. The company that wrote the book on low-power licensed processor designs has steadily outgrown the characteristics that made it so appealing. It’s filled out and become an awkward teen, not sure whether it should be playing with toys or applying for its first job. Welcome to the complex world of adulthood.  
