feature article
Subscribe Now

ARM Dips Toe Into Configurability Pool

Cortex-M33 Adds Support for User-Created Instructions… Sort Of.

“My ghast was flabbered.” – Anthony Grayling

User-configurable microprocessors are a subject near and dear to me, so I was excited to hear that ARM, a company renowned for its iron-fisted control over its CPU architecture, was loosening its grip and allowing users to create their own custom instructions. Could it be? Had the company really joined the ranks of the user-configurable army pioneered by Tensilica, ARC, RISC-V, and others? 

Well… no, not really. 

While ARM will, for the first time, allow its licensees to create their own instructions to extend the processor’s functionality, it’s a far cry from the fully user-configurable CPUs we’ve seen from other companies. In fact, it’s about the least you could do and still be able to check off the “user-configurable” box on a developer’s wish list. 

Still, it’s a step in the right direction and an indication that, “ARM is changing,” in the words of Thomas Ensergueix, the company’s Senior Director for Embedded, Automotive, and IoT Business. The company wants to see its nearly ubiquitous processors make inroads into edge devices that often have unusual, domain-specific tasks to accomplish. Sensor aggregation and motor control, for instance, aren’t well suited to generic CPU architectures. Customers often wind up using a DSP or an FPGA, or they design custom hardware – including user-defined CPUs. ARM wants to elbow into that business. 

ARM has supported coprocessors for a long time, but that’s not the same thing as new instructions. A coprocessor like an FPU or a DSP (also from ARM) operates alongside and somewhat independent of the main Cortex CPU. Coprocessors have the advantage of being able to run in parallel with the main CPU so that long-latency instructions (a floating-point divide, for example) don’t hang up the main processing pipeline. But they also have the disadvantage of operating at arm’s length, so to speak. Operands must be explicitly moved into and out of the coprocessor over the CPU bus, which takes time and burns energy. They’re like big peripherals that consume data and instructions. 

In contrast to all that, new user-created instructions will be part of the processor core and execute as part of its CPU instruction stream. New instructions have direct access to the CPU’s register file and don’t need to be spoon-fed operands like coprocessors do. This should permit faster and more efficient instructions that also use less energy. New instructions can be arbitrarily complex. Sounds like a win. 

But there are several limitations. 

For one, this newfound freedom applies only to the Cortex-M33. No other ARM processors support this new feature, nor will it be retroactively added to any of them. Even Cortex-M33 doesn’t really support it yet; that upgrade comes next year. Looking ahead, future Cortex-M designs will support user-defined instructions, but only Cortex-M designs. Although future Cortex-A or R-series processors might get this ability, I wouldn’t bet on it. 

Second, your user-defined extensions can’t access memory. Or control registers. They can access the processor’s main data registers (r0 through r12), which means you can do arithmetic or logic operations on register data, but not much else. You could implement your own bit-twiddle instructions, but only if they don’t need to touch memory. You can’t do flow control (no if/then/else), nor can you tweak the processor’s control registers or secure areas. Massage register data all you want, but ARM draws the line there. 

On the plus side, whatever you do create is your property; ARM has no rights to the IP, and you could theoretically sell your handiwork to others if you want to. ARM hopes, somewhat optimistically, that third-party design houses might do a nice side business hot rodding Cortex-M processors with various custom ISA upgrades. That’s a fun idea, although history suggests it won’t happen. 

You’re also on your own for software development. None of ARM’s programming tools support user-defined instructions – obviously – nor do any third-party tools. Any instructions you create will have to be hand-coded using assembler or intrinsics. Same goes for debugging: nobody’s debugger will understand what your new FUBAR operation does or how it should behave, and most will treat it as either an illegal operation or as a coprocessor operation. (User-defined instructions overlay coprocessor functions in the opcode map.) Planned updates will allow the tools to tolerate user-created instructions, but they still won’t understand them. 

How many Cortex-M33 licensees will start creating their own instructions? Probably not very many. That’s not a slam against ARM’s implementation; it’s just the the reality of user-configurable processors in general. Everybody likes the idea of souping up their processor and adding their own secret sauce, but few developers actually do it. Paradoxically, even the processors that are known for being user-configurable, such as ARC (from Synopsys) and Tensilica (from Cadence), are overwhelmingly used in their default “factory configuration.” Like an SUV that’s never taken off-road, it sounds like a great idea right up until it’s time to leave the beaten path. 

Which is a shame, because customizable processors have real, tangible advantages. It’s not all marketing glitter. A motor-control loop can benefit hugely from just one or two custom instructions (sine or cosine, for example) that aren’t part of the standard Cortex-M instruction set. Cryptography applications, sensor data acquisition, network filtering, and dozens of other obscure corners of the embedded world can all benefit from processors that are tweaked to handle their unique data types or oddball arithmetic operations. Tenfold performance improvements are not unheard of. Just eliminating the back-and-forth latency of the coprocessor interface can make a difference. 

And yet… ARM’s move feels like it’s only a very small step in that direction. On one hand, it’s a big philosophical change for a company that has always avoided branches in the family tree, mostly to ensure that (almost) all ARM processors are (mostly) binary compatible with one another. It’s a guiding principle that’s served the company well. Diverting from that path is kind of a big deal. 

There’s also the substantial engineering work that went into this thing. Designing a CPU that can tolerate tacked-on third-party hardware is no easy feat. ARM’s cores are pretty tightly optimized, but there’s no telling how sloppy a customer’s additions might be. Plus, ARM will take on a whole new support burden fielding calls from DIY processor architects tampering with its products. 

But this also feels like a bit of PR handwaving, a move intended to blunt the appeal of RISC-V, Tensilica, ARC, and other modular or customizable processors. Cortex-M33 is nowhere near as adjustable as those others, despite coming to the market 20 years later. But now ARM can credibly join the conversation. For designers who think they want user-configurability – whether they really use it or not – Cortex-M33 can now make the short list. 

One thought on “ARM Dips Toe Into Configurability Pool”

  1. It’s like the guy who invented universal solvent — could not package it to ship or sell.
    What is really needed is certainly not at the level of diddling registers.
    The overhead of moving data between memory and registers is a big part of the problem.
    Since the data comes from somewhere outside memory, the sensible thing is to at least do some processing as it moves through the input path.
    Putting raw data into memory just so it can be read from memory for processing is dumb.

Leave a Reply

featured blogs
Jul 20, 2024
If you are looking for great technology-related reads, here are some offerings that I cannot recommend highly enough....

featured video

How NV5, NVIDIA, and Cadence Collaboration Optimizes Data Center Efficiency, Performance, and Reliability

Sponsored by Cadence Design Systems

Deploying data centers with AI high-density workloads and ensuring they are capable for anticipated power trends requires insight. Creating a digital twin using the Cadence Reality Digital Twin Platform helped plan the deployment of current workloads and future-proof the investment. Learn about the collaboration between NV5, NVIDIA, and Cadence to optimize data center efficiency, performance, and reliability. 

Click here for more information about Cadence Data Center Solutions

featured chalk talk

Electromagnetic Compatibility (EMC) Gasket Design Considerations
Electromagnetic interference can cause a variety of costly issues and can be avoided with a robust EMI shielding solution. In this episode of Chalk Talk, Amelia Dalton chats with Sam Robinson from TE Connectivity about the role that EMC gaskets play in EMI shielding, how compression can affect EMI shielding, and how TE Connectivity can help you solve your EMI shielding needs in your next design.
Aug 30, 2023
38,613 views