
ARM Dips Toe Into Configurability Pool

Cortex-M33 Adds Support for User-Created Instructions… Sort Of.

“My ghast was flabbered.” – Anthony Grayling

User-configurable microprocessors are a subject near and dear to me, so I was excited to hear that ARM, a company renowned for its iron-fisted control over its CPU architecture, was loosening its grip and allowing users to create their own custom instructions. Could it be? Had the company really joined the ranks of the user-configurable army pioneered by Tensilica, ARC, RISC-V, and others? 

Well… no, not really. 

While ARM will, for the first time, allow its licensees to create their own instructions to extend the processor’s functionality, it’s a far cry from the fully user-configurable CPUs we’ve seen from other companies. In fact, it’s about the least you could do and still be able to check off the “user-configurable” box on a developer’s wish list. 

Still, it’s a step in the right direction and an indication that “ARM is changing,” in the words of Thomas Ensergueix, the company’s Senior Director for Embedded, Automotive, and IoT Business. The company wants to see its nearly ubiquitous processors make inroads into edge devices that often have unusual, domain-specific tasks to accomplish. Sensor aggregation and motor control, for instance, aren’t well suited to generic CPU architectures. Customers often wind up using a DSP or an FPGA, or they design custom hardware – including user-defined CPUs. ARM wants to elbow into that business.

ARM has supported coprocessors for a long time, but that’s not the same thing as new instructions. A coprocessor like an FPU or a DSP (also from ARM) operates alongside and somewhat independently of the main Cortex CPU. Coprocessors have the advantage of being able to run in parallel with the main CPU so that long-latency instructions (a floating-point divide, for example) don’t hang up the main processing pipeline. But they also have the disadvantage of operating at arm’s length, so to speak. Operands must be explicitly moved into and out of the coprocessor over the CPU bus, which takes time and burns energy. They’re like big peripherals that consume data and instructions.

In contrast to all that, new user-created instructions will be part of the processor core and execute as part of its CPU instruction stream. New instructions have direct access to the CPU’s register file and don’t need to be spoon-fed operands like coprocessors do. This should permit faster and more efficient instructions that also use less energy. New instructions can be arbitrarily complex. Sounds like a win. 

But there are several limitations. 

For one, this newfound freedom applies only to the Cortex-M33. No other ARM processors support this new feature, nor will it be retroactively added to any of them. Even Cortex-M33 doesn’t really support it yet; that upgrade comes next year. Looking ahead, future Cortex-M designs will support user-defined instructions, but only Cortex-M designs. Although future Cortex-A or R-series processors might get this ability, I wouldn’t bet on it. 

Second, your user-defined extensions can’t access memory. Or control registers. They can access the processor’s main data registers (r0 through r12), which means you can do arithmetic or logic operations on register data, but not much else. You could implement your own bit-twiddle instructions, but only if they don’t need to touch memory. You can’t do flow control (no if/then/else), nor can you tweak the processor’s control registers or secure areas. Massage register data all you want, but ARM draws the line there. 
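To make the register-only restriction concrete, here is a minimal sketch in portable C. It models a hypothetical custom instruction (a 32-bit population count, chosen purely for illustration and not something ARM has announced) as a pure function: one register-sized value in, one register-sized value out, no pointers, no side effects. The names `cx_popcount_model` and `count_set_bits` are invented for this example. Everything that touches memory or makes decisions stays in ordinary code around it.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of a HYPOTHETICAL custom instruction: population count.
 * It consumes one register-sized operand and produces one register-sized
 * result. It takes no pointers and changes no state, which mirrors the
 * "registers only, no memory, no flow control" restriction. */
static inline uint32_t cx_popcount_model(uint32_t rn)
{
    rn = rn - ((rn >> 1) & 0x55555555u);                  /* 2-bit sums  */
    rn = (rn & 0x33333333u) + ((rn >> 2) & 0x33333333u);  /* 4-bit sums  */
    rn = (rn + (rn >> 4)) & 0x0F0F0F0Fu;                  /* 8-bit sums  */
    return (rn * 0x01010101u) >> 24;                      /* add 4 bytes */
}

/* The loads and the loop are standard instructions. Only the arithmetic
 * step in the middle is a candidate for a custom operation. */
uint32_t count_set_bits(const uint32_t *words, uint32_t n)
{
    assert(words != 0 || n == 0);
    uint32_t total = 0;
    for (uint32_t i = 0; i < n; i++) {
        total += cx_popcount_model(words[i]);
    }
    return total;
}
```

If an operation can’t be written in this shape, with values in and a value out, it probably isn’t a candidate for this kind of extension.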

On the plus side, whatever you do create is your property; ARM has no rights to the IP, and you could theoretically sell your handiwork to others if you want to. ARM hopes, somewhat optimistically, that third-party design houses might do a nice side business hot rodding Cortex-M processors with various custom ISA upgrades. That’s a fun idea, although history suggests it won’t happen. 

You’re also on your own for software development. None of ARM’s programming tools support user-defined instructions – obviously – nor do any third-party tools. Any instructions you create will have to be hand-coded using assembler or intrinsics. Same goes for debugging: nobody’s debugger will understand what your new FUBAR operation does or how it should behave, and most will treat it as either an illegal operation or as a coprocessor operation. (User-defined instructions overlay coprocessor functions in the opcode map.) Planned updates will allow the tools to tolerate user-created instructions, but they still won’t understand them. 
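One common way to live with that tooling gap is to hide each hand-coded instruction behind a small wrapper function with a plain-C fallback. The sketch below assumes a hypothetical build flag, `HAVE_CUSTOM_ABSDIFF`, and a hypothetical operation, `absdiff_u32`; neither comes from ARM. On the real target the wrapper would be supplied by your own assembly routine. Everywhere else, including host-side unit tests, the fallback keeps the code buildable and debuggable.

```c
#include <assert.h>
#include <stdint.h>

/* HAVE_CUSTOM_ABSDIFF is a HYPOTHETICAL build flag for this sketch. */
#if defined(HAVE_CUSTOM_ABSDIFF)
/* Target build: the definition lives elsewhere, for example in a
 * hand-written assembly file that emits the custom opcode. */
uint32_t absdiff_u32(uint32_t a, uint32_t b);
#else
/* Fallback: the same operation in plain C, so tools that know nothing
 * about the custom instruction still see ordinary code. */
static inline uint32_t absdiff_u32(uint32_t a, uint32_t b)
{
    return (a > b) ? (a - b) : (b - a);
}
#endif

/* Callers use the wrapper and never mention the opcode directly. */
uint32_t sad_u32(const uint32_t *x, const uint32_t *y, uint32_t n)
{
    assert((x != 0 && y != 0) || n == 0);
    uint32_t sum = 0;
    for (uint32_t i = 0; i < n; i++) {
        sum += absdiff_u32(x[i], y[i]);
    }
    return sum;
}
```

The fallback also doubles as a reference model: run the same test vectors against both builds and you have at least a basic check that your new instruction does what you think it does.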

How many Cortex-M33 licensees will start creating their own instructions? Probably not very many. That’s not a slam against ARM’s implementation; it’s just the reality of user-configurable processors in general. Everybody likes the idea of souping up their processor and adding their own secret sauce, but few developers actually do it. Paradoxically, even the processors that are known for being user-configurable, such as ARC (from Synopsys) and Tensilica (from Cadence), are overwhelmingly used in their default “factory configuration.” Like an SUV that’s never taken off-road, it sounds like a great idea right up until it’s time to leave the beaten path.

Which is a shame, because customizable processors have real, tangible advantages. It’s not all marketing glitter. A motor-control loop can benefit hugely from just one or two custom instructions (sine or cosine, for example) that aren’t part of the standard Cortex-M instruction set. Cryptography applications, sensor data acquisition, network filtering, and dozens of other obscure corners of the embedded world can all benefit from processors that are tweaked to handle their unique data types or oddball arithmetic operations. Tenfold performance improvements are not unheard of. Just eliminating the back-and-forth latency of the coprocessor interface can make a difference. 
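For a sense of what a sine instruction might compute, here is a rough software model. It is a sketch under stated assumptions, not anyone’s actual hardware: the angle is a 16-bit “binary angle” (0 to 65535 covers one full turn), the result is Q15 fixed point, and the math is a crude parabolic approximation that is exact at the zero crossings and peaks but off by several percent in between. The function names are invented for this example. A real design would likely use a more accurate method, but the shape is the point: register in, register out, no memory access.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of a HYPOTHETICAL sine instruction.
 * angle:  binary angle, 0..65535 maps to 0..2*pi (assumed format)
 * return: approximate sin(angle) in Q15, range -32767..32767 */
int16_t sin_q15_model(uint16_t angle)
{
    uint32_t p = angle & 0x7FFFu;           /* position within the half turn */
    uint32_t t = (p * (32768u - p)) >> 13;  /* parabola, peaks at p = 16384  */
    if (t > 32767u) {
        t = 32767u;                         /* clamp the peak into Q15 range */
    }
    assert(t <= 32767u);
    /* The second half of the turn mirrors the first with the sign flipped. */
    return (angle & 0x8000u) ? (int16_t)(-(int32_t)t) : (int16_t)t;
}

/* Cosine is sine advanced by a quarter turn; uint16_t wraparound handles
 * the modulo for free. */
int16_t cos_q15_model(uint16_t angle)
{
    return sin_q15_model((uint16_t)(angle + 16384u));
}
```

In a motor-control loop this function would be called every iteration, which is why collapsing it into a single instruction is attractive.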

And yet… ARM’s move feels like it’s only a very small step in that direction. On one hand, it’s a big philosophical change for a company that has always avoided branches in the family tree, mostly to ensure that (almost) all ARM processors are (mostly) binary compatible with one another. It’s a guiding principle that’s served the company well. Diverging from that path is kind of a big deal.

There’s also the substantial engineering work that went into this thing. Designing a CPU that can tolerate tacked-on third-party hardware is no easy feat. ARM’s cores are pretty tightly optimized, but there’s no telling how sloppy a customer’s additions might be. Plus, ARM will take on a whole new support burden fielding calls from DIY processor architects tampering with its products. 

But this also feels like a bit of PR handwaving, a move intended to blunt the appeal of RISC-V, Tensilica, ARC, and other modular or customizable processors. Cortex-M33 is nowhere near as adjustable as those others, despite coming to the market 20 years later. But now ARM can credibly join the conversation. For designers who think they want user-configurability – whether they really use it or not – Cortex-M33 can now make the short list. 

One thought on “ARM Dips Toe Into Configurability Pool”

  1. It’s like the guy who invented universal solvent — could not package it to ship or sell.
    What is really needed is certainly not at the level of diddling registers.
    The overhead of moving data between memory and registers is a big part of the problem.
    Since the data comes from somewhere outside memory, the sensible thing is to at least do some processing as it moves through the input path.
    Putting raw data into memory just so it can be read from memory for processing is dumb.
