ARM’s Race Escalates with Cortex-A9

In military parlance, an Osprey is a propeller-driven airplane that takes off and lands vertically, like a helicopter. The Osprey tilts its wings 90 degrees, the props pull it straight up, and the wings flip back again for conventional flight. Clever engineering, but a bit ungainly to look at.

Over in the less dangerous but equally contentious microprocessor world, ARM has also hatched its own Osprey, this one officially named the ARM Cortex-A9. The new A9 will be capable of 2-GHz clock rates, an unheard-of speed for an ARM core. The previous-generation A8 was barely able to make 1 GHz (see Embedded Technology Journal, July 28, 2009, “Better, Stronger, Faster”), and even that required some silicon sleight of hand from Intrinsity. At 2 GHz, the new A9 becomes the most potent weapon from the ARM’s dealer.

How did ARM achieve that speed? You might say it was by fixing the A8, but that would be ungracious. A more charitable analysis shows that the A9 actually has a shorter pipeline than the A8, usually the antithesis of high-speed efficiency. But the A8’s pipeline was deliberately long to tolerate implementation-specific tweaks. It also allowed for some margin in ARM’s first-ever superscalar core. Now that the company (and its licensees) has some experience fabricating superscalar ARM cores, it could fine-tune the logic paths, trim some margin, upgrade the core, and still produce a faster version that’s also more capable.

Improvements include a better floating-point unit and out-of-order execution. The A8 was in-order only, even though it was dual-issue superscalar. The A9 also comes in one-, two-, and four-core variations; the customer can choose the number of cores at implementation time. Although ARM doesn’t officially see it that way, the A8 now looks like a bit of a tentative first step into the world of superscalar Cortex implementations. The company expects that some customers will still license the A8 instead of the A9, probably opening a pricing gap between them.

A Soft Pitch and a Hard Core

Like most ARM cores, the A9 will be delivered either as “soft IP” or as a tuned hard macro. The company says it’s signed 15 licensees so far, even though the final version of the IP won’t be available until late this year. The list includes the usual suspects: Texas Instruments, Ericsson, nVidia, NXP, and Toshiba, as well as a gaggle of anonymous customers. Expect the A9 to appear first in mobile devices (a traditional ARM stronghold), then in automotive “infotainment” systems and living-room electronics such as HDTVs. This last product category is traditionally MIPS territory, so the A9 is a clear shot across the rival RISC vendor’s bow.

Like most of the recent high-end ARM processors, the A9 comes with an assortment of coprocessors baked right in. Jazelle acts as a Java accelerator; TrustZone aids security; Neon speeds up graphics. These accelerators are always part of the microprocessor core and can’t be removed to save either space or power. There’s little reason to: they account for just a small fraction of the silicon real estate (much smaller than the caches, for example), and consume next to no power when they’re not being used. Your software is free to ignore them, of course.

ARM Flexes Its Biceps

So how fast is the new A9? Um, pretty fast. Benchmarking a naked processor core, sans memory and I/O, is a speculative endeavor at best. ARM claims 10,000 Dhrystone MIPS, which is a bit like saying your jet fighter can do four million furlongs per fortnight. It might be accurate for some set of circumstances but it’s still largely meaningless.

The A9 should hit high triple-digit megahertz clock rates with no problem in a modern 45nm process optimized for power efficiency. With all the knobs turned the other way, 2 GHz is doable, although power consumption will quadruple, from insignificant to merely tiny.

For what it’s worth, an x86 processor like Intel’s Atom N270 delivers about 2000 Dhrystone MIPS at 1.6 GHz, so ARM is claiming something like 5x the performance at a 25% faster clock frequency. Again, that’s a specific implementation (in this case, a low-cost netbook computer) versus a disembodied processor core… but the difference is stark nonetheless. And it’s a given that the A9 will consume less power, no matter how it’s fabricated.
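For readers who want to check the arithmetic, here is a minimal sketch of the comparison above. It uses only the figures quoted in this article and assumes, as the text implies, that ARM's 10,000 Dhrystone MIPS claim applies at the 2-GHz clock rate. The variable names are illustrative, and the per-megahertz figure is simple division, not a vendor-published number.

```python
# Figures quoted in the article. Pairing 10,000 DMIPS with 2 GHz is an
# assumption drawn from the text's "25% faster clock" comparison.
A9_DMIPS = 10_000
A9_CLOCK_MHZ = 2_000
ATOM_DMIPS = 2_000
ATOM_CLOCK_MHZ = 1_600

# Raw performance and clock ratios (A9 claim versus Atom N270).
performance_ratio = A9_DMIPS / ATOM_DMIPS        # 5x the Dhrystone score
clock_ratio = A9_CLOCK_MHZ / ATOM_CLOCK_MHZ      # 1.25, i.e. 25% faster clock

# Work done per megahertz, which factors the clock difference back out.
a9_dmips_per_mhz = A9_DMIPS / A9_CLOCK_MHZ
atom_dmips_per_mhz = ATOM_DMIPS / ATOM_CLOCK_MHZ
per_clock_advantage = a9_dmips_per_mhz / atom_dmips_per_mhz

print(f"Performance ratio:   {performance_ratio:.2f}x")
print(f"Clock ratio:         {clock_ratio:.2f}x")
print(f"A9 DMIPS/MHz:        {a9_dmips_per_mhz:.2f}")
print(f"Atom DMIPS/MHz:      {atom_dmips_per_mhz:.2f}")
print(f"Per-clock advantage: {per_clock_advantage:.2f}x")
```

On these numbers the claimed advantage is about 4x per clock cycle, which is where the 5x headline figure comes from once the 25% clock bump is multiplied back in. The same caveat applies: one side is a shipping netbook chip and the other is a disembodied core.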

Using the more reliable EEMBC CoreMark test, the A8 has about a 2x advantage over the Atom N270. In other words, it delivers about the same performance at half the clock speed. And one-fifth the power. Not a bad tradeoff, as long as you don’t need x86 compatibility.

This last bit highlights one of the enduring frustrations for microprocessor designers. No matter how fast, efficient, or clever your CPU is, you can’t escape simple human inertia. Backward compatibility is a powerful force in the universe, and no amount of engineering can overcome it. Netbook computers were – briefly – seen as a second chance for good processors to overcome Intel’s dominance of the computing market. Instead, they’re just another new niche for the x86, albeit a less profitable one. It turns out that people want PCs that behave like PCs, not $350 Linux notebooks.

When Is an ARM Ahead?

So the Cortex-A9 brings ARM well and truly into the ranks of grown-up processor vendors. Superscalar? Check. Multicore? Check. Out-of-order execution, two-level caches, and coprocessors? Check, check, and check. Even PC-like speed grades are available if you have all the right ingredients in your fabrication line. On the hardware front, ARM has just about everything that the MIPS, PowerPC, and x86 vendors have.

On the software side, ARM’s story is getting better and better. It’s already the de facto architecture for wireless and/or mobile applications, thanks to a few early design wins with European cell phones. Just as the first IBM Personal Computer Model 5150 determined Intel’s fate, those early cell phones cast the die for ARM. Most everything that followed has been beer and skittles. Since then, ARM’s legion of licensees has shipped more 32-bit processors than anyone. Just last year, 4 billion new ARM-based chips went out the door. That’s about 11 million per day, or about 8 times the volume of MIPS-based processors.
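The shipment figures are easy to sanity-check. The sketch below uses only the numbers quoted above (4 billion chips per year and the roughly 8-to-1 ratio over MIPS). The implied MIPS volume is derived from that ratio and is not a reported figure, and the 365-day year is an assumption.

```python
# Figures quoted in the article.
ARM_CHIPS_PER_YEAR = 4_000_000_000
ARM_TO_MIPS_VOLUME_RATIO = 8   # "about 8 times the volume"
DAYS_PER_YEAR = 365            # assumption: a non-leap year

# Average daily shipments of ARM-based chips.
arm_chips_per_day = ARM_CHIPS_PER_YEAR / DAYS_PER_YEAR

# MIPS volume implied by the quoted ratio (derived, not reported).
implied_mips_per_year = ARM_CHIPS_PER_YEAR / ARM_TO_MIPS_VOLUME_RATIO

print(f"ARM chips per day:       {arm_chips_per_day / 1e6:.1f} million")
print(f"Implied MIPS chips/year: {implied_mips_per_year / 1e6:.0f} million")
```

The daily figure rounds to the 11 million quoted in the text, and the ratio implies something on the order of 500 million MIPS-based chips per year.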

Is it fair? Doesn’t matter. That’s how real product engineering works. You build the best product you can and you wait for customers to make their typically uninformed and irrational decisions. For the time being, ARM and x86 dominate much of the landscape, with other architectures flourishing in their own little niches. It’s a good time for designers choosing a processor and a good time for a few lucky processor vendors. With the Cortex-A9, ARM looks poised to continue its vertical takeoff.
