ARM Floats Helium for Cortex-M

New Instruction-Set Extensions Aid DSP, ML Performance

“I’m sorry, but neon just doesn’t look good on anybody!” — Tiffani Thiessen

First there was Neon, now there’s Helium. ARM has pulled the wraps off a package of DSP and machine-learning extensions for its low-end Cortex-M processors, sort of like Neon but not. Whereas Neon added DSP features to the Cortex-A family, Helium adds similar, but different, features to Cortex-M. So, Helium is lighter than Neon.

Atomic weights aside, Helium will add a substantial boost to the performance and capabilities of future Cortex-M processors. We won’t see any of those chips for quite a while, but compiler writers and RTOS vendors can get started now, if they like.

ARM won’t say when the first Helium-enabled core will be released or what it will be called. What ARM would say is that the first actual silicon containing Helium is about two years away, which puts the release of the first Helium-enabled core about 6–9 months out.

With chips so far away, and no new cores announced, why release the technical specifications for Helium now? Software. The changes are substantial enough that every RTOS, compiler, and middleware vendor will need to update their support before Helium hits the street. A two-year head start should be enough.

Helium will almost certainly be part of every future Cortex-M core from here on out, although it will be a user-selectable option. That is, ARM will include it in your next Cortex-M license, but you don’t have to enable it. Neon works the same way: it’s included, but implementation is optional.

Significantly, Helium is not compatible with Neon, at either the binary or the source-code level. Even though they implement similar DSP and vector features, they do so in different ways. You won’t be able to simply port code from a Cortex-A with Neon to a Cortex-M with Helium. You’ll need to rewrite, not just recompile.

This is a real departure for ARM, a company that has historically been a stickler for unquestioned family-wide compatibility. With just a few exceptions (mostly from the company’s younger days), all ARM processors have been software compatible with their siblings, ancestors, and offspring. Helium is the first truly new and incompatible extension in a long time.

The company is acutely aware of this and promises to provide readymade DSP and vector libraries to help customers jump the gap from Neon to Helium. Programmers who are already familiar with Neon will have a leg up on those who’ve never used ARM’s DSP extensions before, but it won’t be a cakewalk.  

Helium, like Neon, is a huge package of features intended to bring signal-processing and vector-processing capability to an otherwise generic ARM processor. As such, it includes more than 150 new instructions, along with new internal registers and new data types. The idea is to displace separate DSP cores from Ceva, DSP Group, and others, and to bring that technology on-chip, with a single processor core and a single instruction set. Not incidentally, it also keeps the customer’s licensing revenue within ARM.

The DSP features include some new low-overhead loop instructions that eliminate the first and last instructions in a typical control loop, so that the CPU spends more time looping and less time deciding whether it’s supposed to keep looping. These are routine for a “real” DSP but are new to the Cortex-M family. There’s even a “loop tail” instruction to handle those awkward cases where the number of data elements isn’t evenly divisible by two or four.
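
To make the loop-tail idea concrete, here is a minimal sketch in plain Python. It is a behavioral model, not Helium code: the four-lane width and the function name `scale_with_tail` are assumptions for illustration. The point is that the final pass simply runs with fewer active lanes, so no separate scalar cleanup loop is needed.

```python
LANES = 4  # assumed vector width: four elements per pass


def scale_with_tail(data, factor, lanes=LANES):
    """Multiply every element by `factor`, one 'vector' per loop pass.

    Returns the scaled list and the number of loop passes taken.
    """
    out = list(data)
    remaining = len(data)
    base = 0
    passes = 0
    while remaining > 0:
        # On the last pass, fewer than `lanes` elements may be left;
        # only those lanes are active, the rest are masked off.
        active = min(lanes, remaining)
        for lane in range(active):
            out[base + lane] = data[base + lane] * factor
        base += lanes
        remaining -= lanes
        passes += 1
    return out, passes


# Six elements do not divide evenly by four: the second pass uses two lanes.
scaled, passes = scale_with_tail([1, 2, 3, 4, 5, 6], 2)
```

In this model, six elements take two passes: a full one with four active lanes and a tail with two.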

Scatter/gather addressing modes are also included, and these help the processor walk through memory with programmable strides, another common DSP feature. Branch hints (i.e., branch-probably and branch-probably-not) are another ISA tweak for accelerating DSP loops.
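
A rough way to picture scatter/gather is as a load or store that takes a list of offsets instead of one contiguous address. The sketch below models that in plain Python; `gather`, `scatter`, and the interleaved stereo buffer are hypothetical names and data chosen for illustration, not anything from ARM's specification.

```python
def gather(memory, base, offsets):
    """Load one element per lane from memory[base + offset]."""
    return [memory[base + off] for off in offsets]


def scatter(memory, base, offsets, values):
    """Store one element per lane to memory[base + offset]."""
    for off, val in zip(offsets, values):
        memory[base + off] = val


# Interleaved stereo samples: L0, R0, L1, R1, ...
samples = [10, -10, 20, -20, 30, -30, 40, -40]

# A stride of 2 picks out one channel in a single "vector load".
stride_2 = [0, 2, 4, 6]
left = gather(samples, 0, stride_2)   # left channel
right = gather(samples, 1, stride_2)  # right channel

# Write the halved left channel back to its strided positions.
scatter(samples, 0, stride_2, [x // 2 for x in left])
```

The same offset list serves both directions, which is why strided walks such as de-interleaving audio channels fit this addressing style well.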

Helium also adds saturating arithmetic as well as signed and unsigned rounding modes. The number and variety of data types has also changed, with Helium supporting 8, 16, 32, and (with a little work) 128-bit fixed-point data. On the floating-point side, there are the usual single-precision (32-bit) and double-precision (64-bit) data types, with a new half-precision (16-bit) data type debuting with Helium. This last type is expected to be useful for voice-activation features, where audio fidelity isn’t important, but lag and throughput are. A Cortex-M with Helium can process twice as many samples of half-precision data compared to single-precision data, an important feature for a relatively slow and cheap microcontroller.
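
For readers who haven't met saturating or rounding arithmetic, a small Python model may help. It assumes 16-bit signed (Q15-style) samples, and the helper names are made up for this sketch. Saturation clamps an overflowing result at the largest representable value instead of letting it wrap around, and a rounding shift adds half of the discarded range before shifting rather than truncating.

```python
Q15_MIN, Q15_MAX = -32768, 32767  # range of a signed 16-bit value


def sat_add_q15(a, b):
    """Add two 16-bit values, clamping the result on overflow."""
    return max(Q15_MIN, min(Q15_MAX, a + b))


def wrap_add_q15(a, b):
    """Add two 16-bit values with ordinary two's-complement wraparound."""
    s = (a + b) & 0xFFFF
    return s - 0x10000 if s >= 0x8000 else s


def rounding_shift_right(x, n):
    """Shift right by n bits (n >= 1), rounding to nearest instead of truncating."""
    return (x + (1 << (n - 1))) >> n


loud = sat_add_q15(30000, 10000)      # clamps at 32767
glitch = wrap_add_q15(30000, 10000)   # wraps to a large negative value
rounded = rounding_shift_right(7, 2)  # 7/4 = 1.75 rounds to 2
truncated = 7 >> 2                    # plain shift gives 1
```

In audio terms, the saturated sum clips while the wrapped sum flips sign, which is the kind of artifact DSP code usually wants to avoid.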

Vector operations can broadside two or four elements through the ALU at once, giving future Cortex-M devices some lightweight SIMD credentials. New lane-predication features allow the CPU to conditionally process some elements in a SIMD package while ignoring others, a time- and code-saving trick that even Neon doesn’t offer. To save space and energy, Helium reuses the Cortex-M’s normal FPU registers as its vector registers – just one of the reasons Neon code isn’t transportable to Helium.
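
Lane predication is easier to see than to describe. The following is a plain-Python model, with the hypothetical helper `predicated_op` standing in for a predicated vector instruction: a per-lane true/false mask decides which elements the operation touches, and the other lanes pass through unchanged. It assumes a four-element vector purely for illustration.

```python
def predicated_op(vec, predicate, op):
    """Apply `op` only in lanes where the predicate is true.

    Lanes with a false predicate keep their original value.
    """
    return [op(x) if active else x for x, active in zip(vec, predicate)]


vec = [3, -1, 4, -5]

# Build the mask with a per-lane comparison, then negate only those lanes.
is_negative = [x < 0 for x in vec]
rectified = predicated_op(vec, is_negative, lambda x: -x)
```

Here the mask is `[False, True, False, True]`, so only the second and fourth lanes are negated. Without predication, the same job takes either a branch per element or extra select-and-merge instructions.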

Although Helium is the showpiece component of the new ARMv8.1-M architecture specification, there’s more to the spec than Helium. Also included are some debug tweaks, updates to the MPU (memory protection unit), changes to TrustZone, and RAS (reliability, availability, serviceability) extensions. Interestingly, ARM also says the ARMv8.1-M specification has been “cleaned up, with regard to unpredictable cases.” Any new Helium-compatible processor core will come bundled with the other new v8.1-M enhancements, although not all v8.1-M cores will have Helium.

On one hand, Helium is a useful and obvious addition to the Cortex-M family. ARM saw that many of its licensees were strapping a DSP alongside their microcontroller and took the obvious step of offering an in-house equivalent. The company had already done most of the groundwork with Neon; it just needed a lighter-weight version for Cortex-M.

On the other hand, the fact that Helium is so different from Neon means there’s little software compatibility between the two, making the upgrade/downgrade story a tougher sell. Previously, anyone using a low-end Cortex-M could easily upgrade to the bigger, faster Cortex-R or Cortex-A families without too much stress. Integer code was upgradeable. Even floating-point code was upgradeable. But now, the all-important DSP code is not upgradeable. And DSP code is notoriously difficult to rewrite, as it’s often time-sensitive and performance-critical. Any decent toolchain can recompile integer control code, but porting motion-control loops or DSP filters to a new processor is another matter. Helium may have the ARM brand name written on the side, but it’s unfamiliar hardware underneath.

On the third hand, programmers will learn Helium’s tricks and quirks soon enough, helped along by ARM’s promised software libraries and a bevy of third-party development tools. This is ARM we’re talking about, after all. Only the most popular CPU architecture in the world. DSP/ML/vector operations are important for “node” devices at the edges of our everyday networks, so adding Helium (or something very much like it) was inevitable. Now that it’s here – in specification form, anyway – we know what we have to work with.

3 thoughts on “ARM Floats Helium for Cortex-M”

  1. It’s interesting that the headline features of lane predication and using the vectorized loop to deal with odd-length loop tails are shared with SVE and the RISC-V Vector extension, making MVE more similar to those than it is to NEON.

    It’s also interesting that the fixed 128 byte vector register file of MVE (the FP register file repurposed) is identical to the minimum configuration of the RISC-V Vector extension, but RVV also covers all the ground covered by SVE (and more … SVE maxes out at 2048 bit vector registers), but using a single instruction set and programming model at all processor sizes and for both 32 bit and 64 bit processors.

  2. Complex processors are hard to take advantage of without good programming models, many excellent efforts have died due to lack of usability, I suspect this will be one of those. There’s also no indication that ARM know anything more about this market than anyone else, and it seems likely the open-source guys will come up with something as good.

    Xilinx are probably in a similar boat trying to use SystemC for their DSP/AI effort.

    Can’t teach old dogs new tricks, and ARM is a very old dog.