
Two Cores When One Won’t Do

Synopsys Announces Dual-Core Module for ASIL-D

Do you trust your processor?

Yeah, you’re right; that’s not a fair question. If the question is reworded as, “Will your processor always give the correct result?” then the obvious comeback is, “Correct according to what?” If there’s a bug in the software, then the processor will give the correct – but not the desired – result.

So let’s assume good software. Now will the processor always give the correct – and desired – response?

Well, what if there’s a bug in the hardware? Of course, many of you reading this may well be deep in the throes of making sure that’s not going to be the case on your processor. As with software, it’s hard to guarantee that hardware has zero bugs. But, unlike software, great gobs of money and effort are expended on taking the bug count asymptotically close to zero.

So if we assume a good job has been done on verifying the processor, then can we (please) trust our processor?

Yes. Well… maybe. What are you doing with it? If you’re liking a dog picture on social media, then sure. But if my life depends on it? If it runs my car and is the main thing determining whether or not I become a roadside casualty, then… maybe not so much.

Even if the processor design team had truly discovered and resolved all issues, some of those issues aren’t binary. In particular, performance issues are verified to some level of confidence. It’s never 100%. Yeah, you can embrace 6σ, but what if some unlikely condition out at 7σ occurs?

Then there are uncontrollables like alpha particles. Or silicon wear-out, or walking-wounded issues that manifest later. Some of these may be temporary, some fatal.

So the most tech-naïve of us knows that we can’t count 100% on simple technology all the time, and we make allowances when that web page doesn’t come up the first time or when our call drops.

Running the Critical Parts of the Car

There’s an old set of jokes about what it would be like if cars were run by Windows. Things like, every now and then having to pull over to the side of the road, shut the engine down, and restart it – for no particular reason. That’s all funny – until you realize that upcoming self-driving cars are going to feature technology, nominally of the same sort that occasionally features a blue screen (whether or not branded by Microsoft).

So if we can’t 100% guarantee outcomes for so-called safety-critical operations – circuits in planes and trains and automobiles and medical devices and nuclear power plants – then how can we trust that those circuits won’t be our undoing?

In the automotive world, the ISO 26262 standard lays out expectations for different sorts of functions according to how likely a hazardous situation is to arise, how much control the driver has when it does, and what the consequences of failure would be. Functions are given ASIL (Automotive Safety Integrity Level) ratings: A (of least concern) to D (stuff better work or people could die).
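For the curious, here is a rough sketch of how those three factors combine into a rating. ISO 26262 calls them severity (S1 to S3), exposure (E1 to E4), and controllability (C1 to C3). The "add up the class numbers" rule below is a widely used shorthand for the standard's lookup table, not the normative text, and the function name `asil` is made up for illustration.

```python
# Shorthand for the ISO 26262 ASIL determination table (an assumption-laden
# sketch, not the normative text): add the class numbers and look up the sum.
ASIL_BY_SUM = {7: "A", 8: "B", 9: "C", 10: "D"}


def asil(severity: int, exposure: int, controllability: int) -> str:
    """Return "QM", "A", "B", "C" or "D" for classes S1-S3, E1-E4, C1-C3.

    "QM" means ordinary quality management is considered sufficient.
    Class 0 (for example S0, no injuries) is deliberately not handled here.
    """
    if not (1 <= severity <= 3):
        raise ValueError("severity must be 1..3")
    if not (1 <= exposure <= 4):
        raise ValueError("exposure must be 1..4")
    if not (1 <= controllability <= 3):
        raise ValueError("controllability must be 1..3")
    return ASIL_BY_SUM.get(severity + exposure + controllability, "QM")


# Worst case on all three axes lands at ASIL-D.
print(asil(severity=3, exposure=4, controllability=3))
```

Only the combination of the highest severity, the highest exposure, and the least controllability reaches D, which is why that level is reserved for the functions that truly must not fail.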

So, out at that ASIL-D level, what do you do?

This concern has long been a factor in the mil/aero industries, where planes need to stay aloft and munitions must not deviate from their trajectories. One of the solutions there is referred to as “triple modular redundancy” (TMR). This idea, oversimplified, makes the assumption that, by tripling up the computing at critical nodes, if one processor has an issue (low probability if designed well), then the other two are even less likely to have the same issue at the same time. So in the event that the three processors don’t all agree, a two-out-of-three vote settles the argument. Democracy in action!
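The two-out-of-three vote is simple enough to sketch in a few lines. This is a software illustration only (real TMR voters are hardware), and the function name `vote` and its return shape are assumptions made for the example.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def vote(results: Sequence[Hashable]) -> Tuple[Optional[Hashable], List[int]]:
    """Majority-vote over exactly three module outputs.

    Returns (winner, dissenters): the value at least two modules agree on,
    plus the indices of any module that disagreed. If all three differ,
    there is no majority and the winner is None.
    """
    if len(results) != 3:
        raise ValueError("TMR needs exactly three results")
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        return None, [0, 1, 2]
    return value, [i for i, r in enumerate(results) if r != value]


# Module 2 glitched; the other two outvote it.
print(vote([42, 42, 41]))
```

Note what the third module buys you: the system can keep going with the majority answer. With only two modules, a disagreement tells you something is wrong but not which one to believe.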

This works – for a price. In that market, prices are indeed higher to support this extra cost burden (and many others). The same can’t be said, however, for the automotive market. Lives are still at stake, but shaving costs is critical. In this case, there’s a different way of handling processor failure. It still involves redundancy, but less than TMR.

The automotive approach is to use two instead of three processors. And, instead of three processors without hierarchy, the dual-core approach has a main processor and a shadow processor that acts as a double-check. Synopsys has announced a dual-core module targeting ASIL-D applications, referring to their instances in a circuit as “safety islands.”

[Figure: Synopsys ASIL D Ready dual-core lockstep processor IP diagram]

(Image courtesy Synopsys)

The idea here is that the main core has primacy, but it’s got this shadow core looking over its shoulder. If the shadow doesn’t agree with a result that the main core produces, it alerts. What happens then depends on the application; think of it as throwing an exception, and the code has to determine the error handler. Except that, this being hardware, there are several options for manifesting a (hopefully) graceful exit from the state of grace.

When such a disagreement occurs, a two-bit error signal is set – and remains set until specifically reset. The state of the cores is also frozen for forensic or debug purposes. For recovery, you get three options: reset the core; interrupt the core; or send a message to a host processor. Synopsys sees the first two as most likely, since trust in the main core is now compromised (even though it’s theoretically possible that it could be the shadow core that glitched).
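A toy model may help make that behavior concrete: the error latches, the state is frozen at the moment of disagreement, and one of three recovery actions is chosen. Everything named here (`SafetyMonitor`, `Recovery`, and especially the `MISMATCH` bit pattern) is hypothetical; the article doesn't describe the actual encoding of the two-bit signal.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional, Tuple


class Recovery(Enum):
    RESET_CORE = "reset the core"
    INTERRUPT_CORE = "interrupt the core"
    NOTIFY_HOST = "send a message to a host processor"


NO_ERROR = 0b00
MISMATCH = 0b01  # hypothetical encoding of the two-bit error signal


@dataclass
class SafetyMonitor:
    policy: Recovery                      # which recovery option the system uses
    error: int = NO_ERROR                 # sticky: stays set until clear()
    frozen_state: Optional[Tuple[Any, Any]] = None

    def check(self, main_state: Any, shadow_state: Any) -> Optional[Recovery]:
        """Compare one cycle's state; return a recovery action on a new mismatch."""
        if self.error != NO_ERROR:
            return None                   # already latched; snapshot is preserved
        if main_state == shadow_state:
            return None
        self.error = MISMATCH
        self.frozen_state = (main_state, shadow_state)  # kept for forensics
        return self.policy

    def clear(self) -> None:
        """Explicit reset of the latched error, as the article describes."""
        self.error = NO_ERROR
        self.frozen_state = None


monitor = SafetyMonitor(policy=Recovery.RESET_CORE)
print(monitor.check(main_state=0x1234, shadow_state=0x1235))
```

The latch is the important part: a later cycle in which the cores happen to agree again must not quietly erase the evidence.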

Simple in Principle, But…

So far, so good. But… what happens if some event occurs – a power glitch, an alpha particle, whatever – that affects both processors? As circuits get smaller, even localized events start to affect more circuitry at the same time. If that happens, the main core might generate an incorrect result – and the shadow core, still reeling from the same event, might go along with it. Not a good thing at 70 mph.

So the module includes a notion called “time diversity” – the shadow core does what the main core does, only one or two clock cycles later. (The specific number of cycles is programmable.) This makes it much less likely that something affecting the main core will affect the shadow core equally.

This is done with a FIFO in the safety monitor; the main core’s inputs and results are pushed into the FIFO so that they can be compared at a (slightly) later time with the shadow core’s outcome. This comparison is done for each clock cycle.
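Here is a minimal sketch of that delayed comparison, assuming each trace entry stands in for whatever the monitor compares at one step. The function name `run_lockstep` and the trace representation are invented for the example; only the idea of a FIFO plus a programmable delay comes from the description above.

```python
from collections import deque
from typing import List, Sequence, Tuple


def run_lockstep(main_trace: Sequence[int],
                 shadow_trace: Sequence[int],
                 delay: int = 2) -> List[Tuple[int, int]]:
    """Compare a main core against a shadow core running `delay` cycles behind.

    main_trace[i] is what the main core produced for step i (at cycle i).
    shadow_trace[i] is what the shadow produced for the same step, which it
    reaches at cycle i + delay. Returns (cycle, step) for each mismatch.
    """
    if delay < 1:
        raise ValueError("delay must be at least one cycle")
    if len(main_trace) != len(shadow_trace):
        raise ValueError("both cores must execute the same number of steps")

    fifo = deque()          # holds main-core values until the shadow catches up
    mismatches = []
    steps = len(main_trace)
    for cycle in range(steps + delay):
        if cycle < steps:
            fifo.append(main_trace[cycle])
        step = cycle - delay            # the step the shadow finishes this cycle
        if step >= 0:
            expected = fifo.popleft()
            if expected != shadow_trace[step]:
                mismatches.append((cycle, step))
    return mismatches


# The cores disagree on step 2; with a two-cycle delay it is caught at cycle 4.
print(run_lockstep([1, 2, 3, 4], [1, 2, 9, 4], delay=2))
```

The FIFO never holds more than `delay` entries, so the cost of time diversity is a small amount of buffering plus a detection latency of that many cycles.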

Which raises a new question: what is a “result”? Some instructions take more than one cycle to complete; what’s the intermediate result? Some instructions perform a calculation, in which case there is a specific result. But others might store data into memory – what exactly is the result there? Do you then go test whether the data truly ended up in memory? Does the shadow core do a test-and-store if the to-be-stored values disagree?

There are a couple of pieces to the answers. First, you can’t have results with definitions that vary according to the application; that’s just crazy-making. Instead, there’s some subset of the internal state that gets compared. That then works for each clock cycle, regardless of the specific instruction.

The other piece is that the shadow core can read from memory, but it can’t write to it. It’s not there to “do” anything; it simply supervises, tattling when there’s an issue.
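Both pieces can be illustrated in a few lines: compare a fixed subset of state every cycle, and give the shadow a view of memory that it can read but not write. The field names in `COMPARED_FIELDS` are purely hypothetical (the article doesn't say which internal state is compared), as are the helper names.

```python
from types import MappingProxyType
from typing import Any, Dict, Mapping, Tuple

# Hypothetical subset of internal state that the monitor compares each cycle.
COMPARED_FIELDS = ("pc", "regs")


def signature(state: Mapping[str, Any]) -> Tuple[Any, ...]:
    """Reduce a core's state to the compared subset, whatever the instruction."""
    return tuple(state[name] for name in COMPARED_FIELDS)


def shadow_view(memory: Dict[int, int]) -> Mapping[int, int]:
    """Give the shadow core a read-only window onto the shared memory."""
    return MappingProxyType(memory)


memory = {0x10: 7}
view = shadow_view(memory)

main = {"pc": 0x40, "regs": (1, 2, 3), "debug_counter": 10}
shadow = {"pc": 0x40, "regs": (1, 2, 3), "debug_counter": 12}

print(signature(main) == signature(shadow))  # fields outside the subset are ignored
print(view[0x10])                            # reads work; view[0x10] = 8 raises TypeError
```

Because the comparison looks at the same fields on every cycle, multi-cycle instructions and stores need no special cases: the monitor never has to know what a "result" means for any particular instruction.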

Synopsys says that dual-core processors of this kind aren’t a new thing, but most sit in a higher performance class. They say that their ARC-based dual-core module – intended specifically for ASIL-D usage – is the first one in the microcontroller range.

All of this effort so that, when you’re cruising down the coast, hair blowing all over, magical tunes blaring from your speakers, and your car doing all the work automatically, you won’t have to think about your processors. You’ll just trust them.

More info:

Synopsys ARC Safety-Island IP
