feature article

Divide (and Conquer) by Zero

ARM’s Cortex-M0+ Proves Less Is More, More or Less

Just when you think ARM can’t sink any lower, they do this.

I mean that in a good way. ARM – effectively the world’s largest microprocessor company – has come out with yet another variation on its ubiquitous CPU architectural theme. This time it’s called the Cortex-M0+ because, well, M0 was already taken.

Yes, the company has officially run out of numbers.

The em-zero-plus is a lot like the em-zero, of course, but different. It’s got a plus sign in its name, for one. And it’s got some features that the M0 doesn’t have. Your first impulse might be to conclude that the M0+ is a replacement for the M0 – that was certainly my first impression – but it’s not. The official ARM party line is that the M0 and the M0+ will happily coexist side by side, and that one does not necessarily replace the other.

I’m not convinced. The more I look at the M0+, the more I think it’s really just the M0 done right. After all, it’s faster than the M0 but uses less energy and fits in the same silicon area. So on quick-and-dirty technical criteria the M0+ is better all around. The only disadvantage I can see is that the M0+ is more expensive to license than the M0, but that’s purely because ARM’s marketing department decreed it so. In short, the only advantage the M0 has is an artificial one.

Here’s the background: ARM has approximately a zillion different related CPU designs, all broken up into three big families: the A-series, the R-series, and the M-series. (Get it? A.R.M.! See what ARM’s marketing people did there?) The A-series has all the fast, high-end processors while the M-series is all the small, cheap, low-end stuff. Bryon Moyer described all this in his August 22, 2011 article.

Anyway, the lowest of the low-end M-series chips used to be the M3, but that was superseded (undercut?) by the even lower-end M1. (There is no M2; at least, not yet.) The M1 was ultimately undermined by the M0, which obviously marked the absolute bottom of the range. The M0 was the simplest, cheapest, smallest, and most power-efficient thing you could make and still call it an ARM processor. Anything simpler than that and you’re looking at an 8-bit MCU or a twisted rubber band. Little 32-bit processors just don’t come any simpler. 

Or so we thought.

Strategically, the entry-level M0 was designed to lure away 8-bit and 16-bit users and tempt them into joining the ARM camp. “Join us…” they beckoned. One by one, weary MCU programmers were assimilated into the ARM horde. Once there, they could enjoy the benefits of ARM’s developer ecosystem and, more importantly, start the long climb up the ARM product ladder. Young M0 programmers today might become fat and lucrative A-series programmers in the future. It was a cunning plan.

If the M0 was the bait on the hook, it worked well. But no sooner had ARM cast its line into the teeming waters of the embedded marketplace than the company started to rethink its strategy. “Perhaps the M0 isn’t simple enough,” they wondered. “Maybe we could make it even more tempting and snag a few more fish.”

Thus was the M0+ begotten. This is probably what the original M0 should have been all along, but it’s too late to rename that one M1, or M0.5. Despite the incremental suffix, the M0+ sits lower down the product tree than the M0 does, in the sense that it uses even less power. On the other hand, the M0+ also delivers slightly better performance than the M0, so maybe it’s the one that should be renamed M0.5. Is your head hurting yet?

Both the M0 and the M0+ run the same instruction set, so they’re binary compatible with one another as well as with the rest of the M-series product line. They’re technically 32-bit processors, in the sense that they have 32-bit internal registers and 32-bit precision on arithmetic operations, yet both execute 16-bit opcodes designed to conserve memory space. The M0+ has no cache to speak of, so it fetches and executes directly from whatever RAM or ROM you give it, à la basic MCU chips of yesteryear. There’s also no option for an FPU, so floating-point math is out of the question unless you like doing it all in software and waiting a week.

What really separates the M0+ from the M0 is its two-stage pipeline. What’s that, you say? Don’t all processors require a 3-stage pipe at minimum? Well, yes and no. CPU Architecture 101 teaches us that all processors must fetch, decode, and execute instructions, hence the traditionally minimalist 3-stage pipeline. Faster processors (i.e., almost anything) often have many more stages than that for the purpose of finely subdividing each of those tasks into smaller, but faster, circuits. But if you’re willing to go the other way, you can also collapse the holy trinity into fewer than three stages, which is what the M0+ does. The first stage fetches instructions and begins to decode them, while the second/last stage finishes decoding and executes the appropriate operation.
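To see why a shorter pipe can help rather than hurt, here is a back-of-the-envelope sketch in Python. It is a toy model, not ARM's published timing: it assumes an idealized in-order pipeline that retires one instruction per cycle once full and pays a refill penalty of (stages - 1) cycles on every taken branch. The function name and the instruction mix (1,000 instructions, 150 taken branches) are made up for illustration.

```python
def total_cycles(instructions, taken_branches, pipeline_stages):
    """Cycle count for a toy in-order pipeline (an assumption, not ARM data).

    - (stages - 1) cycles to fill the pipe at startup
    - 1 cycle per instruction once the pipe is full
    - (stages - 1) extra cycles to refill after each taken branch
    """
    refill = pipeline_stages - 1
    return refill + instructions + taken_branches * refill


# Hypothetical workload: numbers chosen only for illustration.
INSTRUCTIONS = 1000
TAKEN_BRANCHES = 150

three_stage = total_cycles(INSTRUCTIONS, TAKEN_BRANCHES, pipeline_stages=3)
two_stage = total_cycles(INSTRUCTIONS, TAKEN_BRANCHES, pipeline_stages=2)

print("3-stage pipe:", three_stage, "cycles")
print("2-stage pipe:", two_stage, "cycles")
```

In this simplified model the two-stage pipe wastes one cycle per taken branch instead of two, so branch-heavy control code (the bread and butter of little MCUs) finishes in fewer clocks. Real cores have other effects, such as memory wait states, that this sketch ignores.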

Bizarrely, there’s no clock-frequency penalty for this dumbed-down pipeline. In fact, it gets faster. That’s right, kids: the M0+ actually runs faster than the M0 it supposedly doesn’t supersede. The exact clock speed depends on your silicon-manufacturing technology, chip layout, phase of the moon, and a dozen other factors, but all things being equal, the M0+ will clock about 10–15% faster than an M0. Go figure.

That’s not all. The M0+ is also more efficient per clock than the M0. It delivers 1.77 CoreMarks per MHz, versus 1.62 for the M0, an improvement of a little over 9%. And it gets weirder. The M0+ is also more power-efficient, by about 30%, which is a big deal when you’re counting microamps and joules.
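The clock-speed and per-clock numbers compound, so it is worth doing the arithmetic. The sketch below uses only the figures quoted above (1.62 and 1.77 CoreMarks per MHz, and a 10–15% clock advantage); the function and variable names are mine, and it assumes the two gains are independent and simply multiply, which is a simplification.

```python
# CoreMark/MHz figures quoted in the article.
M0_COREMARK_PER_MHZ = 1.62
M0PLUS_COREMARK_PER_MHZ = 1.77


def relative_gain(new, old):
    """Fractional improvement of `new` over `old` (0.10 means 10%)."""
    return new / old - 1.0


def combined_speedup(clock_gain, per_clock_gain):
    """Overall throughput gain, assuming the two gains multiply."""
    return (1.0 + clock_gain) * (1.0 + per_clock_gain) - 1.0


per_clock_gain = relative_gain(M0PLUS_COREMARK_PER_MHZ, M0_COREMARK_PER_MHZ)

# Low and high ends of the 10-15% clock advantage quoted in the article.
low = combined_speedup(0.10, per_clock_gain)
high = combined_speedup(0.15, per_clock_gain)

print(f"per-clock gain: {per_clock_gain:.1%}")
print(f"combined gain:  {low:.1%} to {high:.1%}")
```

Under those assumptions the per-clock gain works out to roughly 9.3%, and the combined throughput advantage lands somewhere around 20% to 26%, all things being equal.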

So let’s see… shorter pipeline, faster clock rate, lower power consumption, same approximate gate count, and same silicon area. Seems to me like the M0+ is what the M0 should have been all along. It also suggests that the latter design isn’t long for this world. I mean, would you buy one?

You might, actually, because ARM is keeping the original M0 attractive through carefully managed pricing. The company has priced the M0 below the M0+ precisely because there’s no other way to keep both products alive at the same time. ARM never publishes its price list, but figure on saving tens of thousands of dollars on your licensing agreement for the former over the latter. Those savings could pay for a few hours of therapy as you try to wrap your head around ARM’s new divide-by-zero error.

3 thoughts on “Divide (and Conquer) by Zero”

  1. minor correction: The M1 is a special FPGA variant of the M0, binary compatible but completely differently implemented (or so they say).
    That’s where they wasted the numbers. They should have called the M1 “M1-for-fpga” or whatever and called the M0 M1, then they could have named the new M0+ simpler M0.

  2. I’m wondering where they go now that they’ve counted down to zero. Will they continue into negative territory with the “m minus one?” Will they get imaginary with the “M-i”? Will they go completely irrational with the “M-e”?
