
ARM Cortex-A76AE Reliably Stays in Lock Step

Processor for Autonomous Vehicles Learns a Few Safety Tricks

“A man with a watch always knows what time it is. A man with two is never certain.” – Segal’s Law

Another day, another new ARM processor. These guys are like Taco Bell. They keep rearranging the same three or four ingredients to create a surprising assortment of different products. A pipeline here, a new cache there, and you’ve got a whole menu of options. There’s something for everyone, even if it does all taste the same.

One new ingredient in ARM’s kitchen, though, is a reliability feature called split-lock. It’s aimed at autonomous vehicles (self-driving cars), but it could also be relevant to robotics, aerospace, and other high-reliability applications. It’s not an entirely new concept, but it does add a bit of spice to an otherwise familiar bill of fare.

You can’t tell much by just looking at the menu, so I ducked behind the counter and talked directly to the cooks in the kitchen. Specifically, I spent some quality time with members of ARM’s CPU design team, who were candid, friendly, and helpful. It was a refreshing change from the usual corporate briefing and death by PowerPoint.

The processor in question is the Cortex-A76AE, and it’s all-new… but it’s not. As the name suggests, it’s based on the Cortex-A76, which we covered during its announcement in June. The -A76 is ARM’s current top of the line: a superscalar, out-of-order, 64-bit beast with a projected 3.0-GHz maximum clock frequency. The new -A76AE is all of that plus “AE,” which stands for “automotive enhancements.” Despite the name, the enhancements aren’t really specific to cars, although that’s clearly the sexiest market niche for now. Who would want to hear about a processor for industrial automation?

There are a number of goodies rolled up into that AE moniker, some more subtle than others. For example, ARM provides licensees of the -A76AE with documentation and testing info that will help with their eventual ISO 26262 certification. You can’t usually certify IP itself, but, if you’re clever, you can remove roadblocks to your customers’ ultimate certification once they put it in an SoC. Imagination Technologies did a similar thing a year ago with its MIPS I6500-F processor.

ARM’s design team also certified themselves, in a way, as they were designing the -A76AE. As with ISO 9001 certification, you can show that your internal processes are documented, understood, and adhered to, and that helps both you and your customers clear certain regulatory hurdles. ARM was looking ahead with the -A76AE, knowing that its customers would be facing a regulatory decathlon.

The most interesting hardware component of the AE, though, is the split-lock feature. This allows two CPU cores to run in lockstep, each one double-checking the other on a cycle-by-cycle basis. If one CPU somehow disagrees with the other, the -A76AE flags a lockstep fault. The idea is to run safety-critical code on two CPUs in lockstep for maximum reliability. One CPU core might conceivably suffer a random failure, but two CPUs failing in the same way at the same time is statistically unlikely.
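The cycle-by-cycle comparison can be pictured with a toy Python model. This is a sketch under our own assumptions, not ARM’s implementation. Two identical cores execute the same deterministic step, and a comparator checks their outputs every cycle. An injected bit flip in one core’s output then triggers a fault. The `Core` class, its accumulate operation, `run_lockstep`, and the `LockstepFault` name are all hypothetical stand-ins.

```python
from dataclasses import dataclass


class LockstepFault(Exception):
    """Hypothetical stand-in for the hardware's lockstep fault signal."""


@dataclass
class Core:
    """Toy core whose entire behavior is a 32-bit running sum."""
    acc: int = 0

    def step(self, value: int) -> int:
        self.acc = (self.acc + value) & 0xFFFFFFFF
        return self.acc


def run_lockstep(inputs, upset_cycle=None):
    """Feed identical inputs to two cores and compare their outputs every cycle.

    If upset_cycle is given, bit 3 of core B's output is flipped on that cycle
    to mimic a random hardware fault in one of the two cores.
    """
    core_a, core_b = Core(), Core()
    results = []
    for cycle, value in enumerate(inputs):
        out_a = core_a.step(value)
        out_b = core_b.step(value)
        if cycle == upset_cycle:
            out_b ^= 1 << 3
        if out_a != out_b:
            raise LockstepFault(f"cycle {cycle}: {out_a:#x} != {out_b:#x}")
        results.append(out_a)
    return results


print(run_lockstep([1, 2, 3]))          # [1, 3, 6]
try:
    run_lockstep([1, 2, 3], upset_cycle=1)
except LockstepFault as fault:
    print("lockstep fault:", fault)     # lockstep fault: cycle 1: 0x3 != 0xb
```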

Neither CPU in this twinning arrangement is master or slave. They both run the same code at the same speed, and both are treated equally. Most of the time (in fact, all the time under normal circumstances), both CPUs will produce identical results. Their behavior will be indistinguishable. That’s what you want.

However, running in lockstep obviously means giving up half of your processing resources. A dual-core -A76AE running in lockstep is only as fast as a single-core version, and a four-core configuration will perform like a dual-core system – but with the added insurance of backup hardware.

That’s your call, and enough high-reliability customers are interested in that configuration to make it worth ARM’s time to produce a special version of its top-flight CPU for just that purpose.

You don’t have to run the -A76AE in lockstep if you don’t want to. It’s an optional feature, and if it’s disabled, you essentially have a standard Cortex-A76 with better documentation. Lockstep has to be enabled/disabled at reset, so it’s not something you’d switch on or off on the fly. Which makes sense. This isn’t like toggling between 16/32-bit modes on an x86 processor. You either really want to use lockstep or you don’t.
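One way to picture a reset-time setting is as a configuration input that the hardware samples only when it comes out of reset, much like a strap pin. The Python sketch below models that assumption. `SplitLockCluster`, `request_mode`, and `reset` are hypothetical names, not ARM signals or registers.

```python
class SplitLockCluster:
    """Toy model: the split/lock mode is sampled only as the cluster leaves reset."""

    def __init__(self, lock_mode: bool) -> None:
        self._requested = lock_mode    # value on the (hypothetical) config input
        self.lock_mode = lock_mode     # mode the cluster is actually running in

    def request_mode(self, lock_mode: bool) -> None:
        # Changing the input at run time has no immediate effect...
        self._requested = lock_mode

    def reset(self) -> None:
        # ...the new value is picked up only on the next reset.
        self.lock_mode = self._requested


cluster = SplitLockCluster(lock_mode=True)
cluster.request_mode(False)
print(cluster.lock_mode)   # True: still running in lockstep
cluster.reset()
print(cluster.lock_mode)   # False: now running in split mode
```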

If you don’t, that’s the “split” part of ARM’s split-lock feature description. Split mode is simply what we used to call normal operation. Like “World War I,” it’s a retronym: a new name for an old thing, coined only after something newer comes along and makes the original name ambiguous. Running in split mode, as opposed to lock mode, means that all the CPU cores in your -A76AE implementation run independently, as usual. There’s almost no difference between an -A76AE running in split (normal) mode and a standard -A76.

The “almost” prompts a small footnote. There is obviously some additional hardware within the -A76AE to implement the lock-mode safety checking, and, like most gratuitous hardware, it exacts a small performance penalty. The design team told me that an -A76AE runs about 5% slower than a standard -A76, all things being equal. Part of that minor performance hit comes from a slightly slower maximum clock frequency, and part from some slight differences in IPC (instructions per clock cycle). Running the processor in split mode (i.e., with lockstep disabled) claws back a small portion of that difference – perhaps a few percent – but not all of it. Thus, an -A76AE can never be as fast as a standard -A76. But if you’re really worried about that last 5% of theoretical maximum performance, you’re probably using the wrong processor or writing really bad code.
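For a fixed instruction count, performance scales with clock frequency times IPC, so small penalties in each multiply together. No breakdown of the ~5% between clock and IPC is given. The 3%/2% split below is purely hypothetical and serves only to show the arithmetic.

```python
def relative_performance(clock_ratio: float, ipc_ratio: float) -> float:
    """Performance for a fixed instruction count scales with frequency * IPC."""
    return clock_ratio * ipc_ratio


# Hypothetical split: 3% lower maximum clock, 2% lower IPC.
perf = relative_performance(0.97, 0.98)
print(f"relative performance {perf:.4f}, about {(1 - perf) * 100:.1f}% slower")
# relative performance 0.9506, about 4.9% slower
```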

Twinning processors for reliability isn’t an entirely new concept. It’s not even new to ARM; earlier Cortex-R real-time processors, such as the Cortex-R5, offered lockstep configurations. NASA’s Space Shuttle ganged six computers together, with four cross-checking one another, a fifth for backup, and one cold spare. Big fault-tolerant systems have used similar techniques for decades. Plenty of designers have also crafted their own solutions, usually with standalone processors and external hardware with lots of registers and comparators. ARM has made the process a whole lot easier. Just toggle a few configuration bits at bootup and stand back.
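A lockstep pair can only detect a disagreement. It can’t say which core is right, which is Segal’s Law in silicon. Systems with three or more redundant computers can instead vote, so the odd one out gets overruled. The generic majority voter below illustrates that idea. It is not a model of the Shuttle’s actual redundancy-management software, and `majority_vote` is a hypothetical helper.

```python
from collections import Counter


def majority_vote(outputs):
    """Return (agreed_value, indices_of_dissenters), or raise if no strict majority."""
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise RuntimeError("disagreement detected, but no majority to resolve it")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


print(majority_vote([42, 42, 41, 42]))  # (42, [2]): computer 2 is outvoted
try:
    majority_vote([42, 41])             # two-way lockstep: detect, can't resolve
except RuntimeError as err:
    print(err)
```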

How does it work? An extreme approach might have been to scatter comparators all throughout the processor’s pipeline, checking every register, data bus signal, and address line. That’s overkill, according to the designers I spoke with, and it’s unnecessary. There’s no need to burden the entire microarchitecture. All you really care about is what happens outside the CPU core. It’s enough to compare “external” bus signals (external to the CPU core; internal to the SoC). The -A76AE monitors the interfaces between the CPU core, its caches, and its coherent buses. If it detects anything untoward, it raises a lockstep fault. What you do from that point is up to you.
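The sketch below is again a toy Python model built on our own assumptions. It compares only the traffic each core drives onto its cache and bus interface and never inspects internal registers. An upset in a scratch register that never leaves the core goes unflagged, while one that corrupts store data is caught on the cycle it appears. `BusWrite`, `run_core`, and `first_mismatch` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusWrite:
    """A simplified transaction at the core/cache boundary (hypothetical)."""
    address: int
    data: int


def run_core(values, upset=None):
    """Toy core: sums each pair into r1, keeps a scratch product in r2, stores r1.

    `upset` is an optional (cycle, register_name) pair that flips bit 0 of that
    register, standing in for a soft error inside the core.
    """
    trace = []
    for cycle, (x, y) in enumerate(values):
        regs = {"r1": x + y, "r2": x * y}  # r2 is computed but never leaves the core
        if upset is not None and upset[0] == cycle:
            regs[upset[1]] ^= 1
        trace.append(BusWrite(address=0x1000 + 4 * cycle, data=regs["r1"]))
    return trace


def first_mismatch(trace_a, trace_b):
    """Compare interface traffic only; return the first differing cycle, else None."""
    for cycle, (ta, tb) in enumerate(zip(trace_a, trace_b)):
        if ta != tb:
            return cycle
    if len(trace_a) != len(trace_b):
        return min(len(trace_a), len(trace_b))  # one core issued extra traffic
    return None


work = [(1, 2), (3, 4), (5, 6)]
golden = run_core(work)
print(first_mismatch(golden, run_core(work, upset=(1, "r2"))))  # None: never reaches the bus
print(first_mismatch(golden, run_core(work, upset=(1, "r1"))))  # 1: corrupted store is caught
```

By this reasoning, an internal glitch matters only once it reaches the core’s boundary, and that is exactly where the comparison catches it.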

What the -A76AE doesn’t require is any extra software. There’s no software component at all to the lockstep feature; it’s all done in hardware. The only code you’ll need to write is the handler in case of a fault. Apart from a few configuration bits, the entire process is software-invisible.

Like all recent ARM processors, the -A76AE is designed to be used in either homogeneous processor groupings (i.e., all CPU cores the same) or heterogeneous arrangements, preferably a “big.LITTLE” pairing with Cortex-A55. Additionally, you can choose to implement one, two, or four CPU cores per “cluster,” which is a specific ARM-defined grouping. You can have as many clusters as you want within your SoC. For redundancy to work, you’d obviously want an even number of CPUs per cluster (two or four). Only CPUs within a cluster can be paired for lockstep operation; you can’t pair a CPU from one cluster with one from another cluster. Also, with a four-core cluster, you must enable lockstep for both pairs. In other words, lockstep is a cluster-level feature. You can’t enable it for half the cluster but not the other half.
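The cluster rules are easy to express as a configuration check. This Python sketch encodes them as described: one, two, or four cores per cluster, lockstep all-or-nothing per cluster, and pairs never spanning clusters. It then reports how many independent CPUs software would see. `logical_cpus` and `soc_logical_cpus` are hypothetical helpers, not part of any ARM tool.

```python
VALID_CLUSTER_SIZES = (1, 2, 4)


def logical_cpus(cores: int, lockstep: bool) -> int:
    """Independent CPUs one cluster exposes, enforcing the split-lock rules."""
    if cores not in VALID_CLUSTER_SIZES:
        raise ValueError(f"a cluster has 1, 2, or 4 cores, not {cores}")
    if not lockstep:
        return cores              # split mode: every core runs on its own
    if cores == 1:
        raise ValueError("lockstep needs a partner core in the same cluster")
    return cores // 2             # lock mode: every core in the cluster is paired


def soc_logical_cpus(clusters) -> int:
    """Sum over (cores, lockstep) tuples; each cluster is checked on its own."""
    return sum(logical_cpus(cores, lockstep) for cores, lockstep in clusters)


# A four-core cluster in lockstep plus a two-core cluster in split mode.
print(soc_logical_cpus([(4, True), (2, False)]))   # 4
```

Because the lockstep flag is a single value per cluster, a half-locked cluster simply can’t be expressed, which mirrors the cluster-level rule. Pairing across clusters is likewise impossible, since each cluster is evaluated independently.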

The changes between the standard Cortex-A76 and the -A76AE seem minor, but they were planned far in advance. The lockstep feature wasn’t just grafted on at the last minute, and the certification requirements had to be in place early on, before work on the CPU core even began. So, even though the standard -A76 doesn’t make use of any of that extra work, its design was informed by the requirements of its younger twin. ARM’s design team had to avoid shooting themselves in the foot with the -A76 so they’d be in a good position when it came time to certify the -A76AE. “The overall design process would’ve looked substantially different,” had they not planned for redundancy from the outset, one engineer told me.

It’s gratifying to know that CPU cycles – and silicon transistors in general – are so plentiful that we can casually throw half of them away by duplicating an entire CPU and its caches. There was a time when a multimillion-transistor processor was an incredible (and incredibly expensive) achievement. Now we double them up and fold them over. At the same time, it’s nice to know that future autonomous vehicles and other high-reliability systems will have that kind of redundancy built in. Hey, Cortex-A76AE, looks like you’ve got my back.
