Unpicking the Codes

We did consider “ARM for Dummies” as the title of this guide to the arcane, but ultimately logical, world of ARM architectures and processors. Since none of you are dummies, and we were uncertain about the copyright position of “… for Dummies,” we decided against it. Instead, here is a simplified map through architectures, implementations, and jargon.

ARM-based chips ship in the millions each week: the ARM website claims that 90 ARM processors are shipped every second! Over 200 semiconductor companies are shipping products with ARM processors in them. (Makes Intel Inside seem just a little underwhelming?) About 98 percent of all mobile phones use at least one ARM core on their motherboards. The list goes on and on. The products shipping are, in the main, from seven product families: ARM7, ARM9, ARM9E, ARM10E, ARM11, Cortex, and SecurCore.

ARM, in its documentation, distinguishes between architecture, processor, and device. The architecture is the underlying definition of processor behaviour. The processor is an implementation of that architecture and is the IP that ARM licenses. The device is the implementation in silicon of the processor, with other hardware elements, by a semiconductor company, and there are many thousands of these. There is no simple mapping from an ARM architecture to a series of ARM processors.

The architecture

The original architectures, ARMv1 and ARMv2 (ARM first stood for Acorn RISC Machine and later for Advanced RISC Machine), date from the early 1980s and are among the first (if not the first) commercial implementations of the 32-bit RISC (Reduced Instruction Set Computer) approach developed at UC Berkeley. The architecture was designed initially for a personal computer from Acorn, but as it evolved, ARM began to license the design to other users, who began to use it not just in desktop computers but also in embedded applications.

Successive generations of architecture added extra functions, such as a Memory Management Unit (MMU), integrated graphics and IO processing, processor cache, and so on. Different architectures have added pipelines, first three-stage and then five-stage.

The ARMv3 was the original architecture for ARM7 processors, but it was replaced by ARMv4 and quickly extended to the ARMv4T.

In ARM nomenclature, T usually stands for Thumb, a 16-bit instruction set extension to the 32-bit architecture. This was introduced to allow the 32-bit architecture to take advantage of cheaper 8- and 16-bit memory and memory buses. It gives better code density and smaller die size. Each 16-bit instruction expands into a 32-bit instruction on execution.

ARMv4 and ARMv4T were also the architectures for the original ARM9 processor family. This family was extended when the ARMv5 architecture was introduced in 1999. ARMv5TE had the Thumb instruction set, but it also included enhanced DSP support (hence the suffix E. See, it is all building up). A year later came the ARMv5TEJ, with Jazelle technology to support Java bytecode in hardware. This provides a faster implementation of Java than using a Java virtual machine and still allows an OS and other applications to run on a core. This architecture also saw the introduction of a Vector Floating Point (VFP) unit as a co-processor option.

ARMv6 was introduced for the ARM11 processor family. It was designed, amongst other detailed upgrades, to work better in multiprocessing environments. It also added media instructions, for audio and video processing, including instructions for SIMD (Single Instruction Multiple Data), where running one instruction carries out a simultaneous operation on a range of data. ARMv6 also saw the introduction of Thumb-2, a superset of Thumb’s 16-bit instructions that operates in mixed (16-bit and 32-bit) mode to improve performance. TrustZone technology, a way of supporting security applications such as secured PIN entry, secured Near Field Communication (NFC) channels, Digital Rights Management (DRM), and so on, was introduced with ARMv6.
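To make the SIMD idea concrete, here is a minimal Python sketch of a packed-byte addition: one call adds four independent 8-bit values held in a single 32-bit word, which is the general style of operation the ARMv6 media instructions perform in hardware. The function name `packed_add_u8` is made up for this illustration, and the sketch models only the lane-wise arithmetic, not any flags or the actual instruction encoding.

```python
def packed_add_u8(a, b):
    """Add four unsigned 8-bit lanes packed into 32-bit words.

    Each lane wraps on overflow and never carries into its neighbour,
    which is what distinguishes this from an ordinary 32-bit add.
    """
    result = 0
    for lane in range(4):
        shift = lane * 8
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (lane_sum & 0xFF) << shift
    return result


print(hex(packed_add_u8(0x01020304, 0x10203040)))  # 0x11223344
print(hex(packed_add_u8(0x000000FF, 0x00000001)))  # 0x0 (lane 0 wraps, lane 1 untouched)
```

An ordinary 32-bit add of the second pair would give 0x100, because the carry would spill into the next byte. Keeping the lanes separate is what lets one instruction process several audio or pixel samples at once.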

The latest architecture is ARMv7, the architecture for much of the Cortex family. It has built-in Thumb-2 instructions, a new version of the VFP unit, NEON (an advanced version of the multimedia instructions), and dynamic compiler support.
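The architecture-to-family pairings mentioned in this section can be collected into a small lookup table, which shows why there is no simple one-to-one mapping. This is an illustrative sketch only: the table is transcribed from the text above, "ARM9E" stands in for the extended ARM9 family, and the names `ARCH_TO_FAMILIES` and `architectures_for` are invented for the example rather than taken from any ARM tool.

```python
# Architecture -> processor families, transcribed from the article text.
# Illustrative only: not a complete or official ARM listing.
ARCH_TO_FAMILIES = {
    "ARMv3": ["ARM7"],
    "ARMv4": ["ARM7", "ARM9"],
    "ARMv4T": ["ARM7", "ARM9"],
    "ARMv5TE": ["ARM9E"],
    "ARMv6": ["ARM11"],
    "ARMv7": ["Cortex"],
}


def architectures_for(family):
    """List every architecture in the table that the given family has used."""
    return [arch for arch, families in ARCH_TO_FAMILIES.items() if family in families]


print(architectures_for("ARM7"))   # ['ARMv3', 'ARMv4', 'ARMv4T']
print(architectures_for("ARM11"))  # ['ARMv6']
```

One family can span several architecture versions, and one architecture can underpin several families, so the family number alone does not tell you which instruction set features a core supports.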

Other bits and pieces

In addition to the architectures, ARM has developed other elements that a silicon designer can use in building a device. The first group includes debug and trace options. The Embedded Trace Macrocell (ETM) has been a feature of the ARM processors since ARM7. If this is implemented in the design, then it is possible to capture information about the state of the processor both before and after a specific event, providing detailed debug information. It can be complemented by an Embedded Trace Buffer, dedicated memory to store the trace information. Once implemented, the ETM can be configured through software.

In later architectures, an enhanced ETM, CoreSight, is available, which can be used to provide debug information for the whole of an SoC using single or multiple ARM processors.

The catch is that these features have to be implemented by the device designer. In an effort to keep device pricing to a minimum, designers may leave out the ETM, implement only part of its feature set, or limit the memory for buffering. All these actions can, indeed, keep cost down, but they make the end products more difficult to debug.
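The effect of a small trace buffer can be illustrated with a software analogy. The sketch below is not ETM code and uses no ARM API; it simply keeps a rolling history of events, stops at a trigger, and then records a few more. The function name `capture_around_trigger` and its parameters are hypothetical.

```python
from collections import deque
from itertools import islice


def capture_around_trigger(events, is_trigger, pre_depth, post_depth):
    """Return (before, trigger, after) around the first triggering event.

    pre_depth plays the role of the trace buffer size: only that many
    events before the trigger survive. Returns None if nothing triggers.
    """
    history = deque(maxlen=pre_depth)  # oldest entries fall off when full
    stream = iter(events)
    for event in stream:
        if is_trigger(event):
            after = list(islice(stream, post_depth))
            return list(history), event, after
        history.append(event)
    return None


# Ten numbered events, with the "bug" at event 6.
print(capture_around_trigger(range(10), lambda e: e == 6, 3, 2))
# ([3, 4, 5], 6, [7, 8])
print(capture_around_trigger(range(10), lambda e: e == 6, 1, 2))
# ([5], 6, [7, 8])
```

With the smaller buffer, only one event of history is left when the trigger fires. That is the practical cost of a device that skimps on trace memory: less context for working out how the processor got into the failing state.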

Another important feature that ARM has developed is the AMBA protocol. This was, for a while, called the Advanced Microcontroller Bus Architecture, but ARM now uses just the acronym, as its use has spread across the SoC. Now at AMBA3, it exists in a variety of flavours, including the System Bus (ASB), the Peripheral Bus (APB), the High Performance Bus (AHB), and the Extensible Interface (AXI).

Processors

All through electronics, we seem to get more and more complicated, and with ARM, starting from a single processor, we now have a complicated matrix of processors, which can be sliced in many different ways.

Let’s look at the available families first:

They start with ARM7, with three members (although, to confuse matters, some of the latest Cortex designs are based on the similarly named ARMv7 architecture). The ARM7 processors are now positioned as low-power, low-cost parts for things such as MP3 players, entry-level wireless handsets, and pagers. It is unlikely there will be new product introductions using the ARM7, as implementers are more likely to use the Cortex-M3 and Cortex-M0.

ARM9 has two sub-families, ARM9 and ARM9E. The last ARM9 left is the hard macro cell ARM922T. There are still three ARM9Es in the catalogue, all synthesisable and all with DSP functionality. The ARM926EJ-S has Jazelle technology for Java, the ARM968E-S is the lowest power, smallest member of the family, designed for deeply-embedded real-time applications, and the ARM946E-S fits between the two. Both the ARM926EJ-S and the ARM946E-S are also available as hard cores.

There is a single ARM10 still listed. This is the ARM1026EJ-S with Jazelle again and aimed at high performance SoCs.

Until the introduction of the Cortex family, the ARM11 was the flagship. And today it still has a wide range, divided into four groups.

ARM1176JZ-S and ARM1176JZF-S are applications processors for consumer and wireless; the ARM1156T2-S and ARM1156T2F-S are high performance processors for automotive, data storage, imaging, and embedded control; and the ARM1136J-S and ARM1136JF-S processors are for network infrastructure, consumer applications, and automotive infotainment. As before, J versions feature Jazelle for Java, T2 versions have the Thumb-2 instruction set, and F versions have the VFP unit.
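The suffix letters explained so far can be decoded mechanically. The following Python sketch covers only the letters this article describes (T, T2, E, J, and F); any other letter, such as the Z in ARM1176JZ-S, is skipped. The helper `decode_suffixes` and the table `SUFFIX_MEANINGS` are invented for the example and do not come from any ARM tool.

```python
import re

# Suffix meanings as described in this article; other letters are ignored.
SUFFIX_MEANINGS = {
    "T2": "Thumb-2 instruction set",
    "T": "Thumb instruction set",
    "E": "enhanced DSP instructions",
    "J": "Jazelle (Java bytecode support)",
    "F": "VFP (Vector Floating Point) unit",
}


def decode_suffixes(name):
    """Return the features implied by the suffix letters of a classic ARM core name."""
    match = re.fullmatch(r"ARM(\d+)([A-Z0-9]*)(-S)?", name)
    if match is None:
        raise ValueError("not a classic ARM core name: %r" % name)
    letters = match.group(2)
    features = []
    i = 0
    while i < len(letters):
        if letters.startswith("T2", i):  # check the two-character suffix first
            features.append(SUFFIX_MEANINGS["T2"])
            i += 2
            continue
        meaning = SUFFIX_MEANINGS.get(letters[i])
        if meaning is not None:
            features.append(meaning)
        i += 1
    return features


for core in ("ARM926EJ-S", "ARM1156T2F-S", "ARM1176JZF-S"):
    print(core, "->", decode_suffixes(core))
```

Cortex names such as Cortex-A8 do not follow this scheme at all, so the sketch deliberately rejects them with an error.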

The fourth part of the ARM11 family is the ARM11 MPCore Multiprocessor. This is a synthesizable multiprocessor, based on the ARM11 micro-architecture, and it can be configured to contain between one and four processors delivering up to 2600 Dhrystone MIPS of performance.

OK, it’s all falling into place: we are beginning to get an understanding of the logic behind the naming. And then ARM decides to shake up the system with the Cortex family.

As we said earlier, the new ARMv7 architecture is the basis on which most of the Cortex family is implemented, but even here there is a joker in the pack. There are three series, designed for specific broad markets. The ARM Cortex-A Series is for complex OS and user applications. The Cortex-A8 is a single processor, while the Cortex-A9 is available either as a single core or with up to four processors in a multiprocessor configuration. They are both, as a car-dealer would say, fully loaded with a wide range of pipelines, memory options, L1 and L2 cache: everything, in short, for high performance computing.

The ARM Cortex-R Series is for real-time systems with, so far, the Cortex-R4 and the Cortex-R4F (floating-point) processors. Again, ARM is stressing performance and security for these processors.

Finally, there is the ARM Cortex-M Series, billed as deeply embedded processors optimized for cost-sensitive applications. They run only Thumb-family instructions (Thumb-2 in the case of the Cortex-M3), and two of them, the Cortex-M0 and the Cortex-M1, are based on the ARMv6-M architecture. (That’s the joker: the Cortex-M3 and the other Cortex series are ARMv7.) This series includes the Cortex-M3, the Cortex-M1, and the recently announced Cortex-M0. The Cortex-M3 is aimed at microcontroller applications, the Cortex-M1 is designed specifically for FPGA use, and the Cortex-M0 is very small and low power and might be regarded as an 8-bit killer. Certainly ARM is driving it hard into 16-bit applications.

It is not unreasonable to expect even more cores within the Cortex approach.

ARM also breaks its analysis of processors into Application Processors, those normally expected to run a general-purpose operating system, and Embedded Processors, those more normally used for real-time and likely to run an RTOS.

And then we have some extra cores. SecurCore uses a range of security techniques, including things like randomising the layout, special development flows, special debug technologies, and so on. The SC100 is aimed at smart card applications with only 35K gates and is based on the ARM7TDMI. The SC200 is based on ARM9 and includes Jazelle for Java applications, and the SC300 is based on Cortex-M3.

Finally there is the MALI-VE multi-standard video engine: a co-processor that was completed with the purchase of Sweden-based Logipard AB early in 2009.

What does it all mean?

ARM doesn’t make devices. And here is where life gets even more complicated. The decisions the device manufacturer makes when synthesizing the ARM IP can have profound effects on how suitable the device will be for your application. Many manufacturers have long lists of devices, aimed in some cases at very explicit markets, such as motor control, or automotive infotainment, often supported with reference designs and development kits. This is fine, and they can be very aggressive in pricing these competitively. But kick the tyres a bit. What debugging elements have they incorporated in the hardware? How many pins are dedicated to Trace? How many hardware breakpoints are built-in? Is it worth saving a few cents per device if a bug is going to be difficult to eliminate and could cause you to miss the sweet spot in the market? Of course, in an ideal world, the product will not need debugging. But we live and work in a real world.

Given the huge number of devices, there is a correspondingly large ecosystem surrounding the cores: over 500 companies are ARM partners. Development tools from a wide range of third parties address specific manufacturers’ implementations, and familiarity with a specific tool chain may be another factor in your buying choice.

In the end, perhaps all this decoding is irrelevant. What you buy for your next project is not going to be decided by which ARM core is inside, but by how well a specific device matches your needs and your existing development infrastructure.