
Divide (and Conquer) by Zero

ARM’s Cortex-M0+ Proves Less Is More, More or Less

Just when you think ARM can’t sink any lower, they do this.

I mean that in a good way. ARM – effectively the world’s largest microprocessor company – has come out with yet another variation on its ubiquitous CPU architectural theme. This time it’s called the Cortex-M0+ because, well, M0 was already taken.

Yes, the company has officially run out of numbers.

The em-zero-plus is a lot like the em-zero, of course, but different. It’s got a plus sign in its name, for one. And it’s got some features that the M0 doesn’t have. Your first impulse might be to conclude that the M0+ is a replacement for the M0 – that was certainly my first impression – but it’s not. The official ARM party line is that the M0 and the M0+ will happily coexist side by side, and that one does not necessarily replace the other.

I’m not convinced. The more I look at the M0+, the more I think it’s really just the M0 done right. After all, it’s faster than the M0 but uses less energy and fits in the same silicon area. So on quick-and-dirty technical criteria the M0+ is better all around. The only disadvantage I can see is that the M0+ is more expensive to license than the M0, but that’s purely because ARM’s marketing department decreed it so. In short, the only advantage the M0 has is an artificial one.

Here’s the background: ARM has approximately a zillion different related CPU designs, all broken up into three big families: the A-series, the R-series, and the M-series. (Get it? A.R.M.! See what ARM’s marketing people did there?) The A-series has all the fast, high-end processors while the M-series is all the small, cheap, low-end stuff. Bryon Moyer described all this in his August 22, 2011 article.

Anyway, the lowest of the low-end M-series chips used to be the M3, but that was superseded (undercut?) by the even lower-end M1. (There is no M2; at least, not yet.) The M1 was ultimately undermined by the M0, which obviously marked the absolute bottom of the range. The M0 was the simplest, cheapest, smallest, and most power-efficient thing you could make and still call it an ARM processor. Anything simpler than that and you’re looking at an 8-bit MCU or a twisted rubber band. Little 32-bit processors just don’t come any simpler. 

Or so we thought.

Strategically, the entry-level M0 was designed to lure away 8-bit and 16-bit users and tempt them into joining the ARM camp. “Join us…” they beckoned. One by one, weary MCU programmers were assimilated into the ARM horde. Once there, they could enjoy the benefits of ARM’s developer ecosystem and, more importantly, start the long climb up the ARM product ladder. Young M0 programmers today might become fat and lucrative A-series programmers in the future. It was a cunning plan.

If the M0 was the bait on the hook, it worked well. But no sooner had ARM cast its line into the teeming waters of the embedded marketplace than the company started to rethink its strategy. “Perhaps the M0 isn’t simple enough,” they wondered. “Maybe we could make it even more tempting and snag a few more fish.”

Thus was the M0+ begotten. This is probably what the original M0 should have been all along, but it’s too late to rename that one M1, or M0.5. Despite the incremental suffix, the M0+ sits lower down the product tree than the M0 does, in the sense that it uses even less power. On the other hand, the M0+ also delivers slightly better performance than the M0, so maybe it’s the one that should be renamed M0.5. Is your head hurting yet?

Both the M0 and the M0+ run the same instruction set, so they’re binary compatible with one another as well as with the rest of the M-series product line. They’re technically 32-bit processors, in the sense that they have 32-bit internal registers and 32-bit precision on arithmetic operations, yet both execute 16-bit opcodes designed to conserve memory space. The M0+ has no cache to speak of, so it fetches and executes directly from whatever RAM or ROM you give it, à la basic MCU chips of yesteryear. There’s also no option for an FPU, so floating-point math is out of the question unless you like doing it all in software and waiting a week.

What really separates the M0+ from the M0 is its two-stage pipeline. What’s that, you say? Don’t all processors require a 3-stage pipe at minimum? Well, yes and no. CPU Architecture 101 teaches us that all processors must fetch, decode, and execute instructions, hence the traditionally minimalist 3-stage pipeline. Faster processors (i.e., almost anything) often have many more stages than that for the purpose of finely subdividing each of those tasks into smaller, but faster, circuits. But if you’re willing to go the other way, you can also collapse the holy trinity into fewer than three stages, which is what the M0+ does. The first stage fetches instructions and begins to decode them, while the second/last stage finishes decoding and executes the appropriate operation.
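To see why a shorter pipe can help rather than hurt, here is a back-of-the-envelope model in Python. It is a textbook idealization, not ARM's published timing: it assumes one instruction issues per cycle, that a taken branch resolves in the last stage and throws away everything queued up behind it, and that nothing else ever stalls. The function name `pipeline_cycles` and the workload numbers are made up for illustration.

```python
def pipeline_cycles(instructions: int, taken_branches: int, depth: int) -> int:
    """Cycles for an idealized in-order pipeline (textbook model, not ARM data)."""
    if instructions <= 0:
        return 0
    fill = depth - 1                      # cycles before the first instruction completes
    flush = (depth - 1) * taken_branches  # each taken branch discards the stages behind it
    return fill + instructions + flush


# Hypothetical workload: 1,000 instructions, 150 of them taken branches.
three_stage = pipeline_cycles(1000, 150, depth=3)
two_stage = pipeline_cycles(1000, 150, depth=2)
print(three_stage, two_stage)
```

Under these assumptions the three-stage pipe needs 1,302 cycles and the two-stage pipe 1,151. The gap comes entirely from branches: the fewer stages there are to refill, the less each one costs.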

Bizarrely, there’s no clock-frequency penalty for this dumbed-down pipeline. In fact, it gets faster. That’s right, kids: the M0+ actually runs faster than the M0 it supposedly doesn’t supersede. The exact clock speed depends on your silicon-manufacturing technology, chip layout, phase of the moon, and a dozen other factors, but all things being equal, the M0+ will clock about 10–15% faster than an M0. Go figure.

That’s not all. The M0+ is also more efficient per clock than the M0. It delivers 1.77 CoreMarks per MHz, versus 1.62 for the M0, an improvement of about 9%. And it gets weirder. The M0+ is also more power-efficient, by about 30%, which is a big deal when you’re counting microamps and joules.
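The arithmetic is easy to check. The short Python sketch below uses only the figures quoted in this article (1.77 and 1.62 CoreMarks per MHz, and a 10–15% clock advantage). The variable names are mine, and multiplying the two gains together assumes they are independent, which real silicon may not honor.

```python
M0_COREMARK_PER_MHZ = 1.62
M0PLUS_COREMARK_PER_MHZ = 1.77

# Per-clock efficiency gain of the M0+ over the M0.
per_clock_gain = M0PLUS_COREMARK_PER_MHZ / M0_COREMARK_PER_MHZ - 1.0

# Assumption: per-clock and clock-speed gains simply multiply.
combined_low = (1.0 + per_clock_gain) * 1.10 - 1.0
combined_high = (1.0 + per_clock_gain) * 1.15 - 1.0

print(f"per-clock gain: {per_clock_gain:.1%}")
print(f"combined throughput gain: {combined_low:.1%} to {combined_high:.1%}")
```

That works out to roughly 9% more work per clock and, if the clock-speed estimate holds, somewhere around 20–26% more work per second.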

So let’s see… shorter pipeline, faster clock rate, lower power consumption, same approximate gate count, and same silicon area. Seems to me like the M0+ is what the M0 should have been all along. It also suggests that the latter design isn’t long for this world. I mean, would you buy one?

You might, actually, because ARM is keeping the original M0 attractive through carefully managed pricing. The company has priced the M0 below the M0+ precisely because there’s no other way to keep both products alive at the same time. ARM never publishes its price list, but figure on saving tens of thousands of dollars on your licensing agreement for the former over the latter. Those savings could pay for a few hours of therapy as you try to wrap your head around ARM’s new divide-by-zero error.

3 thoughts on “Divide (and Conquer) by Zero”

  1. minor correction: The M1 is a special FPGA variant of the M0, binary compatible but completely differently implemented (or so they say).
    That’s where they wasted the numbers. They should have called the M1 “M1-for-fpga” or whatever and called the M0 M1, then they could have named the new M0+ simpler M0.

  2. I’m wondering where they go now that they’ve counted down to zero. Will they continue into negative territory with the “m minus one?” Will they get imaginary with the “M-i”? Will they go completely irrational with the “M-e”?
