Divide (and Conquer) by Zero

ARM’s Cortex-M0+ Proves Less Is More, More or Less

Just when you think ARM can’t sink any lower, they do this.

I mean that in a good way. ARM – effectively the world’s largest microprocessor company – has come out with yet another variation on its ubiquitous CPU architectural theme. This time it’s called the Cortex-M0+ because, well, M0 was already taken.

Yes, the company has officially run out of numbers.

The em-zero-plus is a lot like the em-zero, of course, but different. It’s got a plus sign in its name, for one. And it’s got some features that the M0 doesn’t have. Your first impulse might be to conclude that the M0+ is a replacement for the M0 – that was certainly my first impression – but it’s not. The official ARM party line is that the M0 and the M0+ will happily coexist side by side, and that one does not necessarily replace the other.

I’m not convinced. The more I look at the M0+, the more I think it’s really just the M0 done right. After all, it’s faster than the M0 but uses less energy and fits in the same silicon area. So on quick-and-dirty technical criteria the M0+ is better all around. The only disadvantage I can see is that the M0+ is more expensive to license than the M0, but that’s purely because ARM’s marketing department decreed it so. In short, the only advantage the M0 has is an artificial one.

Here’s the background: ARM has approximately a zillion different related CPU designs, all broken up into three big families: the A-series, the R-series, and the M-series. (Get it? A.R.M.! See what ARM’s marketing people did there?) The A-series has all the fast, high-end processors while the M-series is all the small, cheap, low-end stuff. Bryon Moyer described all this in his August 22, 2011 article.

Anyway, the lowest of the low-end M-series chips used to be the M3, but that was superseded (undercut?) by the even lower-end M1. (There is no M2; at least, not yet.) The M1 was ultimately undermined by the M0, which obviously marked the absolute bottom of the range. The M0 was the simplest, cheapest, smallest, and most power-efficient thing you could make and still call it an ARM processor. Anything simpler than that and you’re looking at an 8-bit MCU or a twisted rubber band. Little 32-bit processors just don’t come any simpler. 

Or so we thought.

Strategically, the entry-level M0 was designed to lure away 8-bit and 16-bit users and tempt them into joining the ARM camp. “Join us…” they beckoned. One by one, weary MCU programmers were assimilated into the ARM horde. Once there, they could enjoy the benefits of ARM’s developer ecosystem and, more importantly, start the long climb up the ARM product ladder. Young M0 programmers today might become fat and lucrative A-series programmers in the future. It was a cunning plan.

If the M0 was the bait on the hook, it worked well. But no sooner had ARM cast its line into the teeming waters of the embedded marketplace than the company started to rethink its strategy. “Perhaps the M0 isn’t simple enough,” they wondered. “Maybe we could make it even more tempting and snag a few more fish.”

Thus was the M0+ begotten. This is probably what the original M0 should have been all along, but it’s too late to rename that one M1, or M0.5. Despite the incremental suffix, the M0+ sits lower down the product tree than the M0 does, in the sense that it uses even less power. On the other hand, the M0+ also delivers slightly better performance than the M0, so maybe it’s the one that should be renamed M0.5. Is your head hurting yet?

Both the M0 and the M0+ run the same instruction set, so they’re binary compatible with one another and upward compatible with the rest of the M-series product line. They’re technically 32-bit processors, in the sense that they have 32-bit internal registers and 32-bit precision on arithmetic operations, yet both execute 16-bit opcodes designed to conserve memory space. The M0+ has no cache to speak of, so it fetches and executes directly from whatever RAM or ROM you give it, à la basic MCU chips of yesteryear. There’s also no option for an FPU, so floating-point math is out of the question unless you like doing it all in software and waiting a week.

What really separates the M0+ from the M0 is its two-stage pipeline. What’s that, you say? Don’t all processors require a 3-stage pipe at minimum? Well, yes and no. CPU Architecture 101 teaches us that all processors must fetch, decode, and execute instructions, hence the traditionally minimalist 3-stage pipeline. Faster processors (i.e., almost anything) often have many more stages than that for the purpose of finely subdividing each of those tasks into smaller, but faster, circuits. But if you’re willing to go the other way, you can also collapse the holy trinity into fewer than three stages, which is what the M0+ does. The first stage fetches instructions and begins to decode them, while the second/last stage finishes decoding and executes the appropriate operation.
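
One textbook reason a shorter pipe can pay off is that there is less work in flight to throw away whenever a branch is taken. The sketch below is an idealized classroom model, not ARM's published timing: the function name, the instruction count, and the branch count are all made up for illustration, and it assumes one instruction issued per cycle with no memory stalls.

```python
def ideal_cycles(instructions: int, stages: int, taken_branches: int = 0) -> int:
    """Cycle count for a textbook in-order pipeline (hypothetical model).

    The first instruction needs `stages` cycles to pass through the pipe,
    each later one finishes one cycle after its predecessor, and every
    taken branch is assumed to discard the (stages - 1) younger
    instructions already in flight.
    """
    if instructions <= 0:
        return 0
    fill = stages - 1  # cycles spent filling (or refilling) the pipe
    return fill + instructions + taken_branches * fill


if __name__ == "__main__":
    # Illustrative workload: 100 instructions, 20 of them taken branches.
    for stages in (3, 2):
        cycles = ideal_cycles(100, stages, taken_branches=20)
        print(f"{stages}-stage pipeline: {cycles} cycles")
```

With these invented inputs the 3-stage pipe needs 142 cycles and the 2-stage pipe needs 121. Real cores have other effects the model ignores, so treat the numbers as a shape, not a benchmark.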

Bizarrely, there’s no clock-frequency penalty for this dumbed-down pipeline. In fact, it gets faster. That’s right, kids: the M0+ actually runs faster than the M0 it supposedly doesn’t supersede. The exact clock speed depends on your silicon-manufacturing technology, chip layout, phase of the moon, and a dozen other factors, but all things being equal, the M0+ will clock about 10–15% faster than an M0. Go figure.

That’s not all. The M0+ is also more efficient per clock than the M0. It delivers 1.77 CoreMarks per MHz, versus 1.62 for the M0, an improvement of almost 10%. And it gets weirder. The M0+ is also more power-efficient, by about 30%, which is a big deal when you’re counting microamps and joules. 
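
For the back-of-the-envelope crowd, here is how the quoted numbers stack up. The CoreMark/MHz figures and the 10–15% clock range come from the text above; the assumption that the two gains simply multiply (same process, same memory system, nothing else in the way) is mine, and the function names are just for illustration.

```python
# Figures quoted in the article; everything else is derived arithmetic.
M0_COREMARK_PER_MHZ = 1.62
M0PLUS_COREMARK_PER_MHZ = 1.77
CLOCK_GAIN_RANGE = (0.10, 0.15)  # M0+ clock advantage, all things being equal


def per_clock_gain(old: float, new: float) -> float:
    """Fractional improvement in work done per clock cycle."""
    return new / old - 1.0


def combined_gain(per_clock: float, clock: float) -> float:
    """Throughput gain, assuming per-clock and clock-rate gains multiply."""
    return (1.0 + per_clock) * (1.0 + clock) - 1.0


if __name__ == "__main__":
    ipc = per_clock_gain(M0_COREMARK_PER_MHZ, M0PLUS_COREMARK_PER_MHZ)
    print(f"per-clock gain: {ipc:.1%}")
    for clock in CLOCK_GAIN_RANGE:
        print(f"clock +{clock:.0%} -> throughput +{combined_gain(ipc, clock):.1%}")
```

Under that multiplying assumption, the per-clock gain works out to about 9.3%, and the combined throughput gain lands at roughly 20% to 26% across the quoted clock range.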

So let’s see… shorter pipeline, faster clock rate, lower power consumption, same approximate gate count, and same silicon area. Seems to me like the M0+ is what the M0 should have been all along. It also suggests that the latter design isn’t long for this world. I mean, would you buy one?

You might, actually, because ARM is keeping the original M0 attractive through carefully managed pricing. The company has priced the M0 below the M0+ precisely because there’s no other way to keep both products alive at the same time. ARM never publishes its price list, but figure on saving tens of thousands of dollars on your licensing agreement for the former over the latter. That savings could pay for a few hours of therapy as you try to wrap your head around ARM’s new divide-by-zero error. 

3 thoughts on “Divide (and Conquer) by Zero”

  1. minor correction: The M1 is a special FPGA variant of the M0, binary compatible but completely differently implemented (or so they say).
    That’s where they wasted the numbers. They should have called the M1 “M1-for-fpga” or whatever and called the M0 M1, then they could have named the new M0+ simply M0.

  2. I’m wondering where they go now that they’ve counted down to zero. Will they continue into negative territory with the “m minus one?” Will they get imaginary with the “M-i”? Will they go completely irrational with the “M-e”?
