feature article

Divide (and Conquer) by Zero

ARM’s Cortex-M0+ Proves Less Is More, More or Less

Just when you think ARM can’t sink any lower, they do this.

I mean that in a good way. ARM – effectively the world’s largest microprocessor company – has come out with yet another variation on its ubiquitous CPU architectural theme. This time it’s called the Cortex-M0+ because, well, M0 was already taken.

Yes, the company has officially run out of numbers.

The em-zero-plus is a lot like the em-zero, of course, but different. It’s got a plus sign in its name, for one. And it’s got some features that the M0 doesn’t have. Your first impulse might be to conclude that the M0+ is a replacement for the M0 – that was certainly my first impression – but it’s not. The official ARM party line is that the M0 and the M0+ will happily coexist side by side, and that one does not necessarily replace the other.

I’m not convinced. The more I look at the M0+, the more I think it’s really just the M0 done right. After all, it’s faster than the M0 but uses less energy and fits in the same silicon area. So on quick-and-dirty technical criteria the M0+ is better all around. The only disadvantage I can see is that the M0+ is more expensive to license than the M0, but that’s purely because ARM’s marketing department decreed it so. In short, the only advantage the M0 has is an artificial one.

Here’s the background: ARM has approximately a zillion different related CPU designs, all broken up into three big families: the A-series, the R-series, and the M-series. (Get it? A.R.M.! See what ARM’s marketing people did there?) The A-series has all the fast, high-end processors while the M-series is all the small, cheap, low-end stuff. Bryon Moyer described all this in his August 22, 2011 article.

Anyway, the lowest of the low-end M-series chips used to be the M3, but that was superseded (undercut?) by the even lower-end M1. (There is no M2; at least, not yet.) The M1 was ultimately undermined by the M0, which obviously marked the absolute bottom of the range. The M0 was the simplest, cheapest, smallest, and most power-efficient thing you could make and still call it an ARM processor. Anything simpler than that and you’re looking at an 8-bit MCU or a twisted rubber band. Little 32-bit processors just don’t come any simpler. 

Or so we thought.

Strategically, the entry-level M0 was designed to lure away 8-bit and 16-bit users and tempt them into joining the ARM camp. “Join us…” they beckoned. One by one, weary MCU programmers were assimilated into the ARM horde. Once there, they could enjoy the benefits of ARM’s developer ecosystem and, more importantly, start the long climb up the ARM product ladder. Young M0 programmers today might become fat and lucrative A-series programmers in the future. It was a cunning plan.

If the M0 was the bait on the hook, it worked well. But no sooner had ARM cast its line into the teeming waters of the embedded marketplace than the company started to rethink its strategy. “Perhaps the M0 isn’t simple enough,” they wondered. “Maybe we could make it even more tempting and snag a few more fish.”

Thus was the M0+ begotten. This is probably what the original M0 should have been all along, but it’s too late to rename that one M1, or M0.5. Despite the incremental suffix, the M0+ sits lower down the product tree than the M0 does, in the sense that it uses even less power. On the other hand, the M0+ also delivers slightly better performance than the M0, so maybe it’s the one that should be renamed M0.5. Is your head hurting yet?

Both the M0 and the M0+ run the same instruction set, so they’re binary compatible with one another, and their code also runs unmodified on the bigger cores in the M-series product line. They’re technically 32-bit processors, in the sense that they have 32-bit internal registers and 32-bit precision on arithmetic operations, yet both execute almost entirely 16-bit opcodes designed to conserve memory space. The M0+ has no cache to speak of, so it fetches and executes directly from whatever RAM or ROM you give it, à la basic MCU chips of yesteryear. There’s also no option for an FPU, so floating-point math is out of the question unless you like doing it all in software and waiting a week.

What really separates the M0+ from the M0 is its two-stage pipeline. What’s that, you say? Don’t all processors require a three-stage pipe at minimum? Well, yes and no. CPU Architecture 101 teaches us that all processors must fetch, decode, and execute instructions, hence the traditionally minimalist three-stage pipeline. Faster processors (i.e., almost anything) often have many more stages than that, so that each of those tasks can be finely subdivided into smaller, but faster, circuits. But if you’re willing to go the other way, you can also collapse the holy trinity into fewer than three stages, which is what the M0+ does. The first stage fetches instructions and begins to decode them, while the second (and last) stage finishes decoding and executes the appropriate operation.
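To see why a shorter pipe can help per clock, here is a deliberately crude sketch in Python. It is a toy model, not ARM’s actual microarchitecture: it assumes one instruction issues per clock and that a taken branch is resolved in the final stage, so everything queued up behind it gets thrown away. The function name and the workload numbers are made up for illustration.

```python
def cycles(n_instructions, taken_branches, stages):
    """Cycle count for a toy in-order pipeline, one instruction issued per clock.

    Assumption: a taken branch is resolved in the last stage, so the
    (stages - 1) younger instructions behind it are discarded.
    """
    fill = stages - 1                      # clocks before the first instruction completes
    flush = taken_branches * (stages - 1)  # clocks lost refilling after taken branches
    return fill + n_instructions + flush


# Hypothetical workload: 1,000 instructions, 150 of them taken branches.
three_stage = cycles(1000, 150, stages=3)
two_stage = cycles(1000, 150, stages=2)
print(three_stage, two_stage)
```

Under those assumptions, every taken branch costs one fewer wasted clock in the two-stage pipe, so branch-heavy code benefits the most. Real silicon has more going on than this, so treat the numbers as a cartoon rather than a datasheet.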

Bizarrely, there’s no clock-frequency penalty for this dumbed-down pipeline. In fact, it gets faster. That’s right, kids: the M0+ actually runs faster than the M0 it supposedly doesn’t supersede. The exact clock speed depends on your silicon-manufacturing technology, chip layout, phase of the moon, and a dozen other factors, but all things being equal, the M0+ will clock about 10–15% faster than an M0. Go figure.

That’s not all. The M0+ is also more efficient per clock than the M0. It delivers 1.77 CoreMarks per MHz, versus 1.62 for the M0, an improvement of almost 10%. And it gets weirder. The M0+ is also more power-efficient, by about 30%, which is a big deal when you’re counting microamps and joules. 
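For the arithmetically inclined, the per-clock and clock-speed gains multiply. The sketch below uses only the CoreMark figures and the 10–15% clock range quoted above; the assumption that the two gains are independent and simply compound is mine, not ARM’s, and the function name is invented for the example.

```python
COREMARK_PER_MHZ_M0 = 1.62       # figure quoted in the article
COREMARK_PER_MHZ_M0PLUS = 1.77   # figure quoted in the article


def combined_speedup(clock_gain):
    """Throughput ratio of M0+ to M0, assuming the per-clock gain and the
    clock-frequency gain are independent and multiply."""
    per_clock_gain = COREMARK_PER_MHZ_M0PLUS / COREMARK_PER_MHZ_M0
    return per_clock_gain * (1.0 + clock_gain)


low = combined_speedup(0.10)   # pessimistic end of the 10-15% clock range
high = combined_speedup(0.15)  # optimistic end of the 10-15% clock range
print(f"per-clock gain: {COREMARK_PER_MHZ_M0PLUS / COREMARK_PER_MHZ_M0 - 1:.1%}")
print(f"combined gain: {low - 1:.1%} to {high - 1:.1%}")
```

On those assumptions, the 9-and-a-bit percent per-clock improvement compounds with the faster clock to give roughly 20% to 26% more CoreMark throughput, before you even get to the power savings.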

So let’s see… shorter pipeline, faster clock rate, lower power consumption, same approximate gate count, and same silicon area. Seems to me like the M0+ is what the M0 should have been all along. It also suggests that the latter design isn’t long for this world. I mean, would you buy one?

You might, actually, because ARM is keeping the original M0 attractive through carefully managed pricing. The company has priced the M0 below the M0+ precisely because there’s no other way to keep both products alive at the same time. ARM never publishes its price list, but figure on saving tens of thousands of dollars on your licensing agreement for the former over the latter. That savings could pay for a few hours of therapy as you try to wrap your head around ARM’s new divide-by-zero error. 

3 thoughts on “Divide (and Conquer) by Zero”

  1. minor correction: The M1 is a special FPGA variant of the M0, binary compatible but completely differently implemented (or so they say).
    That’s where they wasted the numbers. They should have called the M1 “M1-for-fpga” or whatever and called the M0 M1, then they could have named the new M0+ simpler M0.

  2. I’m wondering where they go now that they’ve counted down to zero. Will they continue into negative territory with the “m minus one?” Will they get imaginary with the “M-i”? Will they go completely irrational with the “M-e”?
