ARMed and Dangerous

Are you in or are you out?

If you’re out, this is for you. If you’re in, it’s a review.

It’s an ARM core decoder of sorts.

You see, whenever a company like ARM or Intel generates a universe of its own, two things happen. One is that it carries a long legacy, courtesy of its long history. And, as things change, or as the roadmap undergoes strategic alterations, what might have been simple starts to become complex. The burden of acknowledging the past weighs on decisions for the future. If you weren’t an insider, if you weren’t watching all the moves and trying to understand them, you could end up lost and confused.

The other thing that happens is that “insiders” (most of whom don’t work for the company) start to wiggle their way even further inside, learning, if possible, the new codes for new products and carefully keeping a mental inventory of which chip is being built on which architecture for which target market. By using code names and shorthand, insiders solidify their insider status by making things more opaque to outsiders. At the same time, they carry themselves with a breezy, “Oh, of course, everyone clearly knows that…” bearing while discussing things that few people really know.

And so outsiders may become more cautious about outing themselves as outsiders, since no one really wants to admit to the things they don’t know that everyone else clearly knows.

Well, today we hoist a drink to the outsiders. And the best way to describe something opaque to outsiders is for one outsider to attempt to figure it out and then communicate it. Because we outsiders – yes, I include myself as an outsider – don’t assume that certain things are obvious, mostly because they’re not. So we risk asking the stupid questions that everyone really wants to ask. Or we try to. (And occasionally the questions are indeed stupid, and I have to blame my fuzzy brain rather than fuzzy concepts… but mostly not.)

So this is my attempt to make sense out of the ARM processor line. If you’ve ever thought that the ARM offering can be confusing, you’re right: it is. There are codes and numbers that aren’t well understood, even within the company. And things that you would think would be natural boundaries (like fundamental architecture changes and major product family changes) turn out not to coincide. I’ve even attempted a graphic representation to account for some thousand words I don’t want to have to type. But even so, as you can tell by looking at the axes, it only works “sort of.”

Let’s review the most obvious bit first: ARM makes microprocessor IP cores licensed for use in systems-on-chip (SoCs) or as stand-alone processor chips. Most bear the ARM name. Others, like the XScale, were licensed and rebranded by other companies. We won’t focus on the latter; the ARM stuff contains enough twists and turns for today.

First of all, there was a major strategic product rethink somewhere along the line. This resulted in what is probably the clearest demarcation in the product line, which I refer to herein as The Great Divide, although it’s not really as helpful as you might hope. At some point, the company changed from “simply” numbering their families sequentially to calling everything Cortex, but dividing those into three subgroups. More on that in a minute.

Going monotonic… sort of

The older devices are simply referred to as “Classic Processors,” and there are a few of them still available for licensing. Most of them have been retired. And some never seem to have existed.

ARM started as Acorn Computers, and their first success was with an education-focused machine called the BBC Micro. For their next trick, they wanted to address business markets, but they didn’t like what they saw in the processor world at the time. This was the early 80s, and the RISC concept was hitting the scene at Berkeley. After playing with the concepts, the Acorn RISC Machine (ARM) was born, starting with the ARM1.

This started the original naming trend, which continued from ARM2 through ARM11. Except that there are a few members MIA. I’ve seen no evidence of an ARM4 or ARM5; I also thought ARM8 to be in that category until I saw a section on it in a textbook. So efforts to expunge it from history were only partly successful. Presumably these missing links are projects that were started and never went anywhere, giving way instead to a subsequent generation.

But all is not as simple as a monotonically increasing generation number. Behind the ARM cores was an architecture or specification, and this architecture was revised over the years. The version is indicated by a “v” number, like “ARM v1” (or “ARMv1”). Exactly when it ticked over to a new number was something of a judgment call: for example, according to ARM, v6 ushered in a formalized memory structure, security extensions, and single-instruction-multiple-data instructions (SIMD; basically, vector instructions); v7 brought about the concept of “profiles” (more on that in a minute) and the advanced SIMD instructions they call NEON technology.

But here’s the catch: the architecture is independent of the actual core design or micro-architecture. So, in fact, when transitioning families (say, from ARM6 to ARM7), they would keep the architecture constant and change just the micro-architecture, lengthening the pipeline, for example. Then, with the next member of that family (say, the ARM7TDMI; we’ll get to the extra letters in a minute), they’d change the architecture. So the architecture version numbers do not align with core numbers, and they don’t necessarily change when the core numbers change.
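To make the mismatch concrete, here is a small Python sketch that pairs a handful of classic cores with their architecture versions. The pairings come from commonly cited public ARM documentation rather than from the chart in this article, so treat them as illustrative assumptions; the names `CORE_ARCHITECTURE` and `architecture_changed` are made up for the example.

```python
# Illustrative core -> architecture pairings (assumed from public ARM
# documentation, not copied from the chart in this article).
CORE_ARCHITECTURE = {
    "ARM6": "v3",
    "ARM7": "v3",
    "ARM7TDMI": "v4T",
    "ARM9TDMI": "v4T",
    "ARM926EJ-S": "v5TEJ",
    "ARM1136J-S": "v6",
}


def architecture_changed(core_a, core_b):
    """Return True if the two cores implement different architecture versions."""
    return CORE_ARCHITECTURE[core_a] != CORE_ARCHITECTURE[core_b]


# New family number, same architecture:
print(architecture_changed("ARM6", "ARM7"))
# Same family number, new architecture:
print(architecture_changed("ARM7", "ARM7TDMI"))
# New family number again, yet the architecture stays put:
print(architecture_changed("ARM7TDMI", "ARM9TDMI"))
```

The point of the lookup is only that the core number and the “v” number are two separate keys; neither can be derived from the other.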

Each core in the chart is labeled and color-coded with its associated architecture version. Note that all cores after The Great Divide use v7. Actually, that’s not quite true. That would be way too clean. More on that in a minute.

So, now that we’ve muddied things up with the architecture, let’s go back to the cores. Each core has a different set of features, and it’s pretty hard to define in general how things evolved. You can look at each core and see what’s different from other cores (at least where the information still exists), but there seems little regularity.

One element that does change relatively faithfully between generations is the length of the execution pipeline. ARM7 and earlier used a three-stage pipeline; thereafter, the ARM9 used 5, the ARM10 used 6, and the ARM11 uses 8 stages. Today, the Cortex-A8 (which we haven’t talked about yet) uses 13 stages.
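For anyone who prefers the progression as data, here is a short Python sketch using only the stage counts quoted above. The names `PIPELINE_STAGES` and `stage_increases` are hypothetical, invented for this illustration.

```python
# Pipeline depths exactly as quoted in the paragraph above.
PIPELINE_STAGES = [
    ("ARM7", 3),
    ("ARM9", 5),
    ("ARM10", 6),
    ("ARM11", 8),
    ("Cortex-A8", 13),
]


def stage_increases(table):
    """Return (core, extra_stages) pairs, each relative to the previous entry."""
    return [
        (core, stages - prev_stages)
        for (_, prev_stages), (core, stages) in zip(table, table[1:])
    ]


for core, extra in stage_increases(PIPELINE_STAGES):
    print(f"{core}: +{extra} stages over its predecessor")
```

Laid out this way, the jump to the Cortex-A8 stands out as the largest single step in the list.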

It’s not enough that the features change between families; various specific features are also called out on different versions. The selection of what to call out and what not to call out and how the letters work seem kind of random.

One of the variants has to do with the instruction set. You might not expect this to change much between products (even if you would expect it to grow over time, the concept of “reduced” in RISC being relative). But today there is not one, nor two, but three instruction sets. There’s the main ARM instruction set, which consists of 32-bit instructions. Then, with the ARM7TDMI, a 16-bit instruction set called Thumb was inaugurated (the “T” in “TDMI”); it covers a subset of what the full ARM set can do.

Then, with the ARM1156, they further split the difference between the full ARM instruction set and Thumb and added the Thumb 2 instruction set. It adds some 32-bit instructions to the otherwise 16-bit Thumb instruction set.

As to the letters, you’ve seen that “T” indicates “Thumb”; here is the complete list, as far as I am aware, with thanks to Charlene Marini at ARM for tracking some of these down (as well as helping to fill in other blanks). This is also on the chart.
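As a rough aid, the following Python sketch splits a classic core name into its number and suffix letters and looks each letter up. Only the “T” meaning is confirmed in this article; the other entries in `SUFFIX_MEANINGS` reflect my understanding of public ARM documentation and should be read as assumptions, not as the list ARM supplied. The function name `decode_core_name` and the pattern are likewise invented for the example.

```python
import re

# Only "T" is confirmed by the article; the rest are assumptions drawn
# from public ARM documentation and may be incomplete.
SUFFIX_MEANINGS = {
    "T2": "Thumb-2 instruction set",
    "T": "Thumb instruction set",
    "D": "JTAG debug support",
    "M": "enhanced (long) multiplier",
    "I": "EmbeddedICE debug macrocell",
    "E": "enhanced DSP instructions",
    "J": "Jazelle Java acceleration",
    "F": "vector floating-point unit",
    "Z": "TrustZone security extensions",
    "S": "synthesizable design",
}

# "ARM", a core number, optional feature letters, optional "-S" style suffix.
NAME_PATTERN = re.compile(r"^ARM(\d+)((?:T2|[A-Z])*)(?:-([A-Z]+))?$")


def decode_core_name(name):
    """Return (core_number, [feature descriptions]) for a classic core name."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a classic ARM core name: {name!r}")
    number, letters, hyphen_suffix = match.groups()
    # "T2" must be tried before single letters so it is not split in two.
    tokens = re.findall(r"T2|[A-Z]", letters) + list(hyphen_suffix or "")
    features = [SUFFIX_MEANINGS.get(token, "unknown") for token in tokens]
    return number, features


for core in ("ARM7TDMI", "ARM926EJ-S", "ARM1156T2-S"):
    print(core, decode_core_name(core))
```

Note that the sketch leaves the core number as an opaque string: it makes no attempt to interpret digits such as the “26” in ARM926.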

There are even numbers accompanying some of the core numbers – ARM926, for example – what does the 26 mean? As of finalizing this article, ARM hadn’t found the answer to this yet…

For the record, the ARM7TDMI seems to be the first core that really caught on big, putting ARM solidly on the map. It’s still available today, although it’s not recommended for new designs.

Triplets are born

So that takes us up to The Great Divide. At this point, a fundamental shift happened in the company strategy. Rather than doing various combinations of features for different kinds of applications, they created three “profiles” under the v7 architecture. The simplest is referred to as the v7-M profile, for “microcontroller.” The middle family is referred to as the v7-R profile, for “real-time.” And the largest use the v7-A profile, for “application.” Take all three together, flip it and reverse it, and you get ARM. Ta-dahhhh!

These family members are all referred to as Cortex, with Cortex-M, Cortex-R, and Cortex-A sub-families. And here architecture does align with family and sub-family. (Well, almost. But we’ll get to that.)

The microcontroller cores are intended to be simple and minimal. In general, they’re smaller than some of the older cores.

The real-time cores were inspired by the needs of disk drives in particular, but, in reality, they are needed by any system that can fail if timing isn’t met. They have a completely separate architecture with a shorter pipeline and trace capabilities for those systems where you don’t have the luxury of stopping them while you debug a problem.

The application profile is characterized fundamentally by the ability to support an OS; that is, it has a memory management unit (MMU). These are full-on superscalar cores meant for heavy lifting.

Each of the respective family members is numbered, with higher numbers meaning higher capabilities. There is no relationship between numbers in different families. A Cortex-M4 has nothing to do with a Cortex-R4; the fact that they share a “4” means nothing. There are gaps in the numbering scheme simply to allow room for possible future core versions that may nestle in between, for instance, the Cortex-A9 and the Cortex-A15.
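The numbering rule can be expressed in a few lines of Python. This is a sketch of the rule of thumb as stated here (numbers are comparable only within one sub-family), not an official ARM scheme; `parse_cortex` and `compare_capability` are hypothetical names.

```python
import re

CORTEX_PATTERN = re.compile(r"^Cortex-([MRA])(\d+)$")


def parse_cortex(name):
    """Split a name like 'Cortex-M4' into (profile_letter, number)."""
    match = CORTEX_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a Cortex core name: {name!r}")
    return match.group(1), int(match.group(2))


def compare_capability(name_a, name_b):
    """Return -1, 0, or 1 within one sub-family; None across sub-families."""
    profile_a, number_a = parse_cortex(name_a)
    profile_b, number_b = parse_cortex(name_b)
    if profile_a != profile_b:
        return None  # a shared digit across families means nothing
    return (number_a > number_b) - (number_a < number_b)


print(compare_capability("Cortex-A9", "Cortex-A15"))
print(compare_capability("Cortex-M4", "Cortex-R4"))
```

Returning `None` for cross-family comparisons is a deliberate choice: it forces the caller to notice that the question has no answer rather than quietly comparing the two 4s.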

Fewer letter-additions are needed here because the various features that used to be explicitly called out in the core name are now implied by the architecture used. This is particularly true with respect to the instruction set. All of them support the Thumb and Thumb 2 set; the -R and -A profiles also support the full 32-bit ARM instruction set. The -A family further supports cache maintenance and NEON instructions.
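Here is that paragraph as a lookup table, in a minimal Python sketch. It encodes only what is stated above and nothing finer-grained; `PROFILE_INSTRUCTION_SETS` and `supports` are names made up for the illustration.

```python
# Instruction sets every v7 profile supports, per the paragraph above.
BASE_SETS = {"Thumb", "Thumb-2"}

# Each profile adds to the one before it.
PROFILE_INSTRUCTION_SETS = {
    "M": BASE_SETS,
    "R": BASE_SETS | {"ARM"},
    "A": BASE_SETS | {"ARM", "NEON"},
}


def supports(profile, instruction_set):
    """Return True if the given v7 profile implies the instruction set."""
    return instruction_set in PROFILE_INSTRUCTION_SETS[profile]


for profile in ("M", "R", "A"):
    print(profile, sorted(PROFILE_INSTRUCTION_SETS[profile]))
```

Because the profile alone determines the answer, a name like Cortex-A8 needs no “T” or similar suffix to say what it runs.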

The Cortex-M1 is unique in that it was created specifically for implementation on FPGAs, one of the earlier instances of a synthesizable ARM core (something many used to think would never happen, since too many secrets can be viewed in the RTL). Actel (now Microsemi) was the first to use this; ARM also shows performance numbers for implementation on various Altera and Xilinx families.

Playing it safe

Less visible is a set of cores intended for use in applications where security is critical. Called SecurCore, this family really seems to be something of a hodge-podge, drawing from old and new families and blurring otherwise clear lines. The smallest device, for example, the SC000, is based on the Cortex-M0, and it uses an M profile of the v6 architecture, not the v7 architecture. Gah! This is the one example I’ve referred to repeatedly before as the annoying exception.

The middle device is derived from ARM7, and the largest device is based on the Cortex-M3 (happily, also using the v7-M profile).

There are a million other features and capabilities that I haven’t even touched on. And, while they may all be worthy of discussion, they do nothing to help with the big-picture understanding of what’s going on (and actually make things seem even muddier). So, to any of you insiders who think you may have lost your special status: if you really, really understand all the nuances, you’re still miles ahead of us now-slightly-less-outsiders.

So we best leave it here, hoping that my attempt has been somewhat fruitful, between the picture and the prose. As for me, there’s no way I could redraw the graphic by heart. But I at least can recite the basics, and I know that the reason the rest of it seems fuzzy is because it is fuzzy, not solely because my brain is fuzzy.
