ARM Dips Toe Into Configurability Pool

Cortex-M33 Adds Support for User-Created Instructions… Sort Of.

“My ghast was flabbered.” – Anthony Grayling

User-configurable microprocessors are a subject near and dear to me, so I was excited to hear that ARM, a company renowned for its iron-fisted control over its CPU architecture, was loosening its grip and allowing users to create their own custom instructions. Could it be? Had the company really joined the ranks of the user-configurable army pioneered by Tensilica, ARC, RISC-V, and others? 

Well… no, not really. 

While ARM will, for the first time, allow its licensees to create their own instructions to extend the processor’s functionality, it’s a far cry from the fully user-configurable CPUs we’ve seen from other companies. In fact, it’s about the least you could do and still be able to check off the “user-configurable” box on a developer’s wish list. 

Still, it’s a step in the right direction and an indication that “ARM is changing,” in the words of Thomas Ensergueix, the company’s Senior Director for Embedded, Automotive, and IoT Business. The company wants to see its nearly ubiquitous processors make inroads into edge devices that often have unusual, domain-specific tasks to accomplish. Sensor aggregation and motor control, for instance, aren’t well suited to generic CPU architectures. Customers often wind up using a DSP or an FPGA, or they design custom hardware – including user-defined CPUs. ARM wants to elbow into that business. 

ARM has supported coprocessors for a long time, but that’s not the same thing as new instructions. A coprocessor like an FPU or a DSP (also from ARM) operates alongside, and somewhat independently of, the main Cortex CPU. Coprocessors have the advantage of being able to run in parallel with the main CPU so that long-latency instructions (a floating-point divide, for example) don’t hang up the main processing pipeline. But they also have the disadvantage of operating at arm’s length, so to speak. Operands must be explicitly moved into and out of the coprocessor over the CPU bus, which takes time and burns energy. They’re like big peripherals that consume data and instructions. 

In contrast to all that, new user-created instructions will be part of the processor core and execute as part of its CPU instruction stream. New instructions have direct access to the CPU’s register file and don’t need to be spoon-fed operands like coprocessors do. This should permit faster and more efficient instructions that also use less energy. New instructions can be arbitrarily complex. Sounds like a win. 

But there are several limitations. 

For one, this newfound freedom applies only to the Cortex-M33. No other ARM processors support this new feature, nor will it be retroactively added to any of them. Even Cortex-M33 doesn’t really support it yet; that upgrade comes next year. Looking ahead, future Cortex-M designs will support user-defined instructions, but only Cortex-M designs. Although future Cortex-A or R-series processors might get this ability, I wouldn’t bet on it. 

Second, your user-defined extensions can’t access memory. Or control registers. They can access the processor’s main data registers (r0 through r12), which means you can do arithmetic or logic operations on register data, but not much else. You could implement your own bit-twiddle instructions, but only if they don’t need to touch memory. You can’t do flow control (no if/then/else), nor can you tweak the processor’s control registers or secure areas. Massage register data all you want, but ARM draws the line there. 
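
To make that restriction concrete, here is a minimal C sketch of the kind of operation that fits inside those limits. It is a software model, not ARM’s interface: the name `cx_hamming` and the choice of a Hamming-distance operation are assumptions made purely for illustration. What matters is the shape of the function: two 32-bit register values in, one 32-bit value out, with no pointers, no loads or stores, and no branches.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of a HYPOTHETICAL register-only custom instruction.
 * It computes the Hamming distance between two register values, i.e. the
 * number of bit positions in which rn and rm differ. The name and the
 * operation are illustrative assumptions, not anything defined by ARM. */
uint32_t cx_hamming(uint32_t rn, uint32_t rm)
{
    uint32_t x = rn ^ rm;                              /* 1 bits mark differences   */
    x = x - ((x >> 1) & 0x55555555u);                  /* counts per 2-bit group    */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  /* counts per 4-bit group    */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  /* counts per byte           */
    x = (x * 0x01010101u) >> 24;                       /* sum the four byte counts  */
    assert(x <= 32u);                                  /* at most 32 differing bits */
    return x;
}
```

In software this costs a handful of ALU operations per call; as a single instruction it would still read only two data registers and write one, which is exactly the box ARM has drawn.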

On the plus side, whatever you do create is your property; ARM has no rights to the IP, and you could theoretically sell your handiwork to others if you want to. ARM hopes, somewhat optimistically, that third-party design houses might do a nice side business hot rodding Cortex-M processors with various custom ISA upgrades. That’s a fun idea, although history suggests it won’t happen. 

You’re also on your own for software development. None of ARM’s programming tools support user-defined instructions – obviously – nor do any third-party tools. Any instructions you create will have to be hand-coded using assembler or intrinsics. Same goes for debugging: nobody’s debugger will understand what your new FUBAR operation does or how it should behave, and most will treat it as either an illegal operation or as a coprocessor operation. (User-defined instructions overlay coprocessor functions in the opcode map.) Planned updates will allow the tools to tolerate user-created instructions, but they still won’t understand them. 

How many Cortex-M33 licensees will start creating their own instructions? Probably not very many. That’s not a slam against ARM’s implementation; it’s just the reality of user-configurable processors in general. Everybody likes the idea of souping up their processor and adding their own secret sauce, but few developers actually do it. Paradoxically, even the processors that are known for being user-configurable, such as ARC (from Synopsys) and Tensilica (from Cadence), are overwhelmingly used in their default “factory configuration.” Like an SUV that’s never taken off-road, it sounds like a great idea right up until it’s time to leave the beaten path. 

Which is a shame, because customizable processors have real, tangible advantages. It’s not all marketing glitter. A motor-control loop can benefit hugely from just one or two custom instructions (sine or cosine, for example) that aren’t part of the standard Cortex-M instruction set. Cryptography applications, sensor data acquisition, network filtering, and dozens of other obscure corners of the embedded world can all benefit from processors that are tweaked to handle their unique data types or oddball arithmetic operations. Tenfold performance improvements are not unheard of. Just eliminating the back-and-forth latency of the coprocessor interface can make a difference. 
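
For a sense of what such an instruction would replace, here is a rough C sketch of a fixed-point sine, the sort of routine a motor-control loop might otherwise run in software on every pass. Everything here is an assumption for illustration: the name `sin_q15`, the 16-bit binary-angle input (65536 steps per full turn), the Q15 output, and the crude parabolic approximation, which is off by several percent in mid-quadrant. A real design would pick its own format and accuracy.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL fixed-point sine, approximated by one parabola per half turn.
 * angle: binary angle, 0..65535 maps onto 0..360 degrees.
 * Returns a Q15 value in the range -32767..32767. */
int16_t sin_q15(uint16_t angle)
{
    uint32_t p = angle & 0x7FFFu;           /* position within the half turn     */
    uint32_t y = (p * (32768u - p)) >> 13;  /* parabola, peaks at p = 16384      */
    if (y > 32767u) {
        y = 32767u;                         /* clamp the peak into the Q15 range */
    }
    assert(y <= 32767u);
    /* The second half turn mirrors the first with the sign flipped. */
    return (angle & 0x8000u) ? (int16_t)-(int32_t)y : (int16_t)y;
}
```

Like the register-only instructions described earlier, this touches no memory: one value in, one value out. Collapsing the mask, multiply, shift, clamp, and sign fix-up into a single operation is where the claimed speedups would come from.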

And yet… ARM’s move feels like it’s only a very small step in that direction. On one hand, it’s a big philosophical change for a company that has always avoided branches in the family tree, mostly to ensure that (almost) all ARM processors are (mostly) binary compatible with one another. It’s a guiding principle that’s served the company well. Diverting from that path is kind of a big deal. 

There’s also the substantial engineering work that went into this thing. Designing a CPU that can tolerate tacked-on third-party hardware is no easy feat. ARM’s cores are pretty tightly optimized, but there’s no telling how sloppy a customer’s additions might be. Plus, ARM will take on a whole new support burden fielding calls from DIY processor architects tampering with its products. 

But this also feels like a bit of PR handwaving, a move intended to blunt the appeal of RISC-V, Tensilica, ARC, and other modular or customizable processors. Cortex-M33 is nowhere near as adjustable as those others, despite coming to the market 20 years later. But now ARM can credibly join the conversation. For designers who think they want user-configurability – whether they really use it or not – Cortex-M33 can now make the short list. 

One thought on “ARM Dips Toe Into Configurability Pool”

  1. It’s like the guy who invented universal solvent — could not package it to ship or sell.
    What is really needed is certainly not at the level of diddling registers.
    The overhead of moving data between memory and registers is a big part of the problem.
    Since the data comes from somewhere outside memory, the sensible thing is to at least do some processing as it moves through the input path.
    Putting raw data into memory just so it can be read from memory for processing is dumb.
