
Divide (and Conquer) by Zero

ARM’s Cortex-M0+ Proves Less Is More, More or Less

Just when you think ARM can’t sink any lower, they do this.

I mean that in a good way. ARM – effectively the world’s largest microprocessor company – has come out with yet another variation on its ubiquitous CPU architectural theme. This time it’s called the Cortex-M0+ because, well, M0 was already taken.

Yes, the company has officially run out of numbers.

The em-zero-plus is a lot like the em-zero, of course, but different. It’s got a plus sign in its name, for one. And it’s got some features that the M0 doesn’t have. Your first impulse might be to conclude that the M0+ is a replacement for the M0 – that was certainly my first impression – but it’s not. The official ARM party line is that the M0 and the M0+ will happily coexist side by side, and that one does not necessarily replace the other.

I’m not convinced. The more I look at the M0+, the more I think it’s really just the M0 done right. After all, it’s faster than the M0 but uses less energy and fits in the same silicon area. So on quick-and-dirty technical criteria the M0+ is better all around. The only disadvantage I can see is that the M0+ is more expensive to license than the M0, but that’s purely because ARM’s marketing department decreed it so. In short, the only advantage the M0 has is an artificial one.

Here’s the background: ARM has approximately a zillion different related CPU designs, all broken up into three big families: the A-series, the R-series, and the M-series. (Get it? A.R.M.! See what ARM’s marketing people did there?) The A-series has all the fast, high-end processors while the M-series is all the small, cheap, low-end stuff. Bryon Moyer described all this in his August 22, 2011 article.

Anyway, the lowest of the low-end M-series chips used to be the M3, but that was superseded (undercut?) by the even lower-end M1. (There is no M2; at least, not yet.) The M1 was ultimately undermined by the M0, which obviously marked the absolute bottom of the range. The M0 was the simplest, cheapest, smallest, and most power-efficient thing you could make and still call it an ARM processor. Anything simpler than that and you’re looking at an 8-bit MCU or a twisted rubber band. Little 32-bit processors just don’t come any simpler. 

Or so we thought.

Strategically, the entry-level M0 was designed to lure away 8-bit and 16-bit users and tempt them into joining the ARM camp. “Join us…” they beckoned. One by one, weary MCU programmers were assimilated into the ARM horde. Once there, they could enjoy the benefits of ARM’s developer ecosystem and, more importantly, start the long climb up the ARM product ladder. Young M0 programmers today might become fat and lucrative A-series programmers in the future. It was a cunning plan.

If the M0 was the bait on the hook, it worked well. But no sooner had ARM cast its line into the teeming waters of the embedded marketplace than the company started to rethink its strategy. “Perhaps the M0 isn’t simple enough,” they wondered. “Maybe we could make it even more tempting and snag a few more fish.”

Thus was the M0+ begotten. This is probably what the original M0 should have been all along, but it’s too late to rename that one M1, or M0.5. Despite the incremental suffix, the M0+ sits lower down the product tree than the M0 does, in the sense that it uses even less power. On the other hand, the M0+ also delivers slightly better performance than the M0, so maybe it’s the one that should be renamed M0.5. Is your head hurting yet?

Both the M0 and the M0+ run the same instruction set, so they’re binary compatible with one another as well as with the rest of the M-series product line. They’re technically 32-bit processors, in the sense that they have 32-bit internal registers and 32-bit precision on arithmetic operations, yet both execute 16-bit opcodes designed to conserve memory space. The M0+ has no cache to speak of, so it fetches and executes directly from whatever RAM or ROM you give it, à la basic MCU chips of yesteryear. There’s also no option for an FPU, so floating-point math is out of the question unless you like doing it all in software and waiting a week.
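
If you do need fractional math on an FPU-less part like this, the usual workaround is fixed-point arithmetic. Here’s a minimal sketch in plain C; it’s my own illustration, not anything from ARM’s libraries, and the compiler flags in the comment assume the GNU arm-none-eabi-gcc toolchain.

    /* Q16.16 fixed point: 16 integer bits, 16 fractional bits.
       Illustrative only; compile with something like:
       arm-none-eabi-gcc -mcpu=cortex-m0plus -mthumb -O2 fixmul.c */
    #include <stdint.h>

    typedef int32_t q16_16;
    #define Q16_ONE (1 << 16)

    /* Multiply two Q16.16 values, using a 64-bit intermediate so the
       full product doesn't overflow before we shift it back down. */
    static inline q16_16 q16_mul(q16_16 a, q16_16 b)
    {
        return (q16_16)(((int64_t)a * (int64_t)b) >> 16);
    }

    int main(void)
    {
        q16_16 x = (q16_16)(2.5  * Q16_ONE);  /* folded to constants at compile time */
        q16_16 y = (q16_16)(1.25 * Q16_ONE);
        volatile q16_16 z = q16_mul(x, y);    /* 3.125, i.e. 0x00032000 */
        (void)z;
        return 0;
    }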

What really separates the M0+ from the M0 is its two-stage pipeline. What’s that, you say? Don’t all processors require a 3-stage pipe at minimum? Well, yes and no. CPU Architecture 101 teaches us that all processors must fetch, decode, and execute instructions, hence the traditionally minimalist 3-stage pipeline. Faster processors (i.e., almost anything) often have many more stages than that for the purpose of finely subdividing each of those tasks into smaller, but faster, circuits. But if you’re willing to go the other way, you can also collapse the holy trinity into fewer than three stages, which is what the M0+ does. The first stage fetches instructions and begins to decode them, while the second/last stage finishes decoding and executes the appropriate operation.
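
To make that overlap concrete, here’s a toy cycle-by-cycle model of four instructions flowing through a two-stage pipe. It’s my own sketch, not ARM’s actual design, and it assumes the happy case: no stalls, no branches, single-cycle memory.

    /* Toy two-stage pipeline: stage 1 = fetch + partial decode,
       stage 2 = finish decode + execute. Happy path only. */
    #include <stdio.h>

    int main(void)
    {
        const char *insn[] = { "I0", "I1", "I2", "I3" };
        const int n = 4;

        /* Instruction k enters stage 1 in cycle k and retires from stage 2
           in cycle k+1, so n instructions finish in n+1 cycles; a classic
           3-stage pipe would need n+2. */
        for (int cycle = 0; cycle <= n; cycle++) {
            printf("cycle %d:", cycle);
            if (cycle < n)
                printf("  stage1 %s", insn[cycle]);
            if (cycle >= 1)
                printf("  stage2 %s", insn[cycle - 1]);
            printf("\n");
        }
        return 0;
    }

The shorter pipe also means a taken branch throws away less in-flight work, which helps explain the cycle-level efficiency gains below.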

Bizarrely, there’s no clock-frequency penalty for this dumbed-down pipeline. In fact, it gets faster. That’s right, kids: the M0+ actually runs faster than the M0 it supposedly doesn’t supersede. The exact clock speed depends on your silicon-manufacturing technology, chip layout, phase of the moon, and a dozen other factors, but all things being equal, the M0+ will run at clock speeds about 10–15% higher than an M0’s. Go figure.

That’s not all. The M0+ is also more efficient per clock than the M0. It delivers 1.77 CoreMarks per MHz, versus 1.62 for the M0, an improvement of almost 10%. And it gets weirder. The M0+ is also more power-efficient, by about 30%, which is a big deal when you’re counting microamps and joules. 
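
Do the back-of-the-envelope arithmetic (mine, not ARM’s): call the clock bump 12% and the per-clock gain 9%, and 1.12 × 1.09 ≈ 1.22. That’s roughly 20% more raw CoreMark throughput from the same gates and the same silicon area, while drawing less power.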

So let’s see… shorter pipeline, faster clock rate, lower power consumption, same approximate gate count, and same silicon area. Seems to me like the M0+ is what the M0 should have been all along. It also suggests that the original M0 isn’t long for this world. I mean, would you buy one?

You might, actually, because ARM is keeping the original M0 attractive through carefully managed pricing. The company has priced the M0 below the M0+ precisely because there’s no other way to keep both products alive at the same time. ARM never publishes its price list, but figure on an M0 license costing tens of thousands of dollars less than an M0+ license. That savings could pay for a few hours of therapy as you try to wrap your head around ARM’s new divide-by-zero error.

3 thoughts on “Divide (and Conquer) by Zero”

  1. Minor correction: the M1 is a special FPGA variant of the M0, binary compatible but implemented completely differently (or so they say).
    That’s where they wasted the numbers. They should have called the M1 “M1-for-FPGA” or whatever and called the M0 the M1; then they could have named the new, simpler M0+ the M0.

  2. I’m wondering where they go now that they’ve counted down to zero. Will they continue into negative territory with the “m minus one?” Will they get imaginary with the “M-i”? Will they go completely irrational with the “M-e”?
