When Reliability Analysis Meets Silicon Lifecycle Management

I’ve recently been chatting with folks from Synopsys and Concertio, and now my head is so full of “stuff” regarding things like hyper-convergent chip design, reliability analysis, real-time performance optimization, and silicon lifecycle management that I don’t know whether I’m coming or going.

I’ve told this tale before (and I’ll doubtless tell it again), but when I worked on my first digital ASIC design circa 1980 at International Computers Limited (ICL) in Manchester, England, the only design tools I had at my disposal were pencils, paper, data books, and my poor old noggin. Of course, the ASICs in question were implemented at the 5-micron technology node and boasted only a couple of hundred equivalent logic gates, but — still and all — they could be tricky little rascals if you didn’t keep a beady eye on them.

Functional verification involved the rest of the team looking at your gate- and register-level schematics and asking pointed questions like “What’s this bit do?” and “Why did you do it that way?” Once you’d answered these questions to their satisfaction, you moved on to timing verification. This involved identifying any critical paths by eye, and then calculating and summing the delays associated with those paths by hand (no one I knew owned so much as a humble 4-function electronic calculator).
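For anyone who never had the pleasure of doing this with a pencil, here is a minimal sketch of the sum-the-delays exercise in Python. The gate names, the delay values, and the critical_path helper are all invented for illustration; they are not taken from any real ICL design or tool.

```python
from functools import lru_cache

# Hypothetical netlist: gate name -> (delay in ns, list of fan-in gates).
# Names and delay values are made up purely for illustration.
NETLIST = {
    "in_a": (0.0, []),
    "in_b": (0.0, []),
    "nand1": (4.0, ["in_a", "in_b"]),
    "inv1": (2.5, ["in_b"]),
    "nor1": (5.0, ["nand1", "inv1"]),
    "out": (1.0, ["nor1"]),
}


def critical_path(netlist, sink):
    """Return (total_delay, path) for the slowest path ending at `sink`."""

    @lru_cache(maxsize=None)
    def worst(node):
        delay, fanin = netlist[node]
        if not fanin:
            return delay, (node,)
        # Follow the fan-in whose own worst-case arrival time is largest.
        arrival, path = max(worst(src) for src in fanin)
        return arrival + delay, path + (node,)

    return worst(sink)


total, path = critical_path(NETLIST, "out")
print(f"{total} ns via {' -> '.join(path)}")
```

With these made-up numbers, the slowest route runs through nand1 and nor1 for a total of 10.0 ns, which is exactly the sort of sum we used to work out longhand.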

Later, when I moved to a company called Cirrus Designs, and then on to its sister company called Cirrus Computers, both of which were eventually acquired by the American test equipment company GenRad, I got to work with HILO-2, which was one of the early digital logic simulators. In fact, HILO-2 was the first true register transfer level (RTL) simulator, although the term “RTL” had not yet been coined in those days of yore. This little scamp was extremely sophisticated for its time because it had three aspects to it: a logic simulator that could run using minimum (min), typical (typ), or maximum (max) delays; a fault simulator that could also run using min, typ, or max delays; and a dynamic timing simulator that could run using min-typ, typ-max, or min-max delay pairs.

I was particularly impressed with the dynamic timing simulator. Where the regular simulators would transition from 0 to 1 to 0 using the selected delay mode, the dynamic timing simulator would transition from 0 to “gone high don’t know when” to 1 to “gone low don’t know when” to 0, where the “don’t know when” states encompassed the relevant delay pair.
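To make the “don’t know when” idea a little more concrete, here is a small Python sketch of a single buffer whose delay lies somewhere between a minimum and a maximum. This is only my illustration of the concept and not how HILO-2 was implemented; the function name, the state labels, and the numbers are all assumptions, and the sketch assumes input edges are spaced further apart than the maximum delay.

```python
def dynamic_transitions(input_edges, d_min, d_max):
    """Model one buffer whose delay lies somewhere in [d_min, d_max].

    input_edges: list of (time, new_value) pairs, with new_value 0 or 1.
    Returns output events as (time, state), where state is 0, 1,
    "rising?" (gone high, don't know when) or
    "falling?" (gone low, don't know when).
    Assumes successive input edges are more than d_max apart.
    """
    events = []
    for t, value in input_edges:
        unknown = "rising?" if value == 1 else "falling?"
        events.append((t + d_min, unknown))  # earliest the output may change
        events.append((t + d_max, value))    # by now the output has settled
    return events


# An input that goes high at t=10 and low at t=30, through a buffer
# with a hypothetical min-max delay pair of 2 and 5 time units.
for event in dynamic_transitions([(10, 1), (30, 0)], d_min=2, d_max=5):
    print(event)
```

The output is uncertain between t=12 and t=15 on the way up, and between t=32 and t=35 on the way down. A regular simulator running in a single delay mode would collapse each of those windows to one instant.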

Although all of this took place only 30 to 40 years ago as I pen these words, I could never have envisaged the sorts of tools available to the design engineers of today. Of course, today’s design engineers need awesomely powerful and incredibly sophisticated tools because they are working with hyper-convergent designs and design flows. By hyper-convergent design, we are talking about things like a single die featuring a diverse set of analog, digital, and mixed-signal components, or a single package containing multiple dice that are potentially implemented on different process nodes. All of this involves larger and more complex circuits running at higher frequencies, the reduced margins and increased parasitics associated with advanced process nodes, and the need for faster, higher-capacity, and more accurate simulators.

Meanwhile, a hyper-convergent design flow is based on a common data model that is shared by all of the tools in the flow. Having a common data model facilitates the sharing of information between different phases of the design, thereby reducing design iterations and time-to-market (TTM).

Reliability Analysis

Do you recall my recent column here on EE Journal regarding the book A Hands-On Guide to Designing Embedded Systems by Adam Taylor, Dan Binnun, and Saket Srivastava? If so, you may remember my saying: “Since Adam is one of the few people I know to successfully (and intentionally) have his FPGA-based designs launched into space, there’s a huge amount of material on the topic of reliability, including how to perform worst-case analysis and how to evaluate the reliability of the system.”

The reason I mention this here is that one of the people I’ve been video-chatting with is Anand Thiruvengadam, who is director of product management at Synopsys. Anand’s mission was to heighten my understanding of Synopsys’s new PrimeSim Reliability Analysis solution.

Anand started by telling me that PrimeSim Reliability Analysis features a unified workflow of proven, foundry-certified reliability analysis technologies; provides faster time-to-reliability compliance for mission-critical applications; offers full lifecycle coverage that is compliant with standards such as ISO 26262; and is tightly integrated with PrimeSim Continuum and the PrimeWave Design Environment. I tried to maintain what I hoped appeared to be a knowledgeable expression on my face, but I fear Anand was not fooled, so he quickly showed me the following diagram:

PrimeSim Reliability Analysis offers a unified workflow of proven technologies for full lifecycle reliability verification (Image source: Synopsys).

Ah, now everything is clear. PrimeSim Circuit Check (CCK) offers programmable static analog and digital circuit checks with full-chip verification in a matter of minutes. PrimeSim Custom Fault improves test coverage, reduces defect escapes, verifies safety with ISO 26262 compliance, and accelerates silicon failure analysis. PrimeSim AVA provides fast design marginality analysis using machine learning (ML) running on the fly to capture 100X to 1,000X fewer samples while offering results whose accuracy is within 1% of PrimeSim HSPICE. PrimeSim SPRES provides power/ground integrity analysis that augments EM and IR analysis, that can be deployed both early in the design cycle and later for signoff, and that provides fast static analysis, handling 1M+ element networks in minutes. Signoff EMIR analysis, which is used to ensure electro-thermal reliability, is provided by the combination of StarRC (high-capacity power and ground optimization), PrimeSim EMIR (high-performance, foundry-certified EMIR analysis), and Custom Compiler (advanced “what-if” analysis and debug). Last, but certainly not least, PrimeSim MOSRA means we can ensure long operating lifetimes with high-performance device aging analysis. Phew!

Silicon Lifecycle Management

Just when I thought things couldn’t get any more interesting… they did. This is the point where we switch gears slightly to consider the topic of the Synopsys integrated Silicon Lifecycle Management (SLM) platform called SiliconMAX, which helps us to improve silicon operational metrics at every phase of a device’s lifecycle through the intelligent analysis of ongoing silicon measurement. (The reason I said “switch gears slightly” is that Reliability Analysis and Silicon Lifecycle Management go hand-in-hand.)

The general SLM approach can be summarized as shown below. We start with a variety of monitors and sensors that are intelligently embedded throughout each chip, and we use these little ragamuffins to generate a rich data set that feeds analytical engines that enable optimizations at every stage in each device’s lifecycle — in-design (using sensor-based silicon aware optimization), in-ramp (using product ramp and accurate failure analysis), in-production (using volume test and quality management), and in-field (using predictive maintenance and optimized performance).

General SLM approach (Image source: Synopsys)

In June 2020, Synopsys announced it had acquired Qualtera, a fast-growing provider of collaborative high-performance, big data analytics for semiconductor test and manufacturing. In November 2020, it announced that it had acquired Moortec, a leading provider of in-chip monitoring technology specializing in process, voltage and temperature (PVT) sensors.

As an aside, by some strange quirk of fate, I wrote a column on Moortec just a few months before the Synopsys acquisition took place (see Distributed On-Chip Temperature Sensors Improve Performance and Reliability). This isn’t the first time this has happened to me. In one case, a startup company for whom I created the web content was acquired before they’d had the time to pay me (the new parent company paid me later). Of course, I’m not saying that my columns are so powerfully presented that any company I write about is certain to be acquired (but I’m not not saying it, either).

All of this brings us to a company called Concertio, which developed an innovative AI-powered performance optimization tool whereby an SLM agent continuously monitors the interactions between operating applications in the field and the underlying system environment.

Well, wouldn’t you just know it? On 1 November 2021, Synopsys announced that it was enriching its SLM solution with real-time, in-field optimization technologies by acquiring Concertio. Happily, I got to chat with Steve Pateras, the Senior Director of Marketing for Test Products in the Design Group at Synopsys.

What we are talking about here is adding instrumentation without affecting performance or the design flow. In addition to things like temperature and voltage, this instrumentation will even be able to measure path delays in silicon chips. The resulting data is amassed not only “now,” but over time. Using this information, the agent’s optimization engine can adapt and reconfigure the system dynamically — modifying application settings, operating system settings, and network settings — resulting in a self-tuning system that’s always optimized for its current usage. These agents can be deployed in tiny devices, capable devices, and powerful devices to provide dynamic optimization-as-a-service at any scale.
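As a purely illustrative sketch of what “self-tuning” can mean, the following Python snippet runs a simple measure-adjust-repeat loop over a single setting. To be clear, this is not Concertio’s or Synopsys’s algorithm, which I have not seen. The tune function, the greedy hill-climbing strategy, and the simulated throughput curve are all assumptions of mine, with the simulated curve standing in for a real in-field measurement.

```python
def tune(measure, setting, low, high, rounds=20):
    """Greedy hill climb over one integer setting.

    measure: callable returning a score for a setting (higher is better).
    Returns (best_setting, best_score).
    """
    best_score = measure(setting)
    for _ in range(rounds):
        # Try the neighbouring settings that stay within the allowed range.
        candidates = [s for s in (setting - 1, setting + 1) if low <= s <= high]
        score, candidate = max((measure(s), s) for s in candidates)
        if score <= best_score:
            break  # no neighbour improves on the current setting
        setting, best_score = candidate, score
    return setting, best_score


def simulated_throughput(threads):
    # Hypothetical stand-in for a live measurement; peaks at 6 threads.
    return 100 - (threads - 6) ** 2


print(tune(simulated_throughput, setting=1, low=1, high=16))
```

Starting from one thread, the loop climbs to six threads and stops there. A real agent would presumably juggle many settings at once and keep re-measuring as the workload changes, but the basic shape is the same: measure, adjust, repeat.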

With the addition of Concertio, Synopsys’s SLM capabilities now extend beyond design and test into the in-field optimization realm, providing a way to perform real-time analysis and optimization of software running on systems such as the high-performance processors powering data centers or the compute engines in automotive applications.

I don’t know about you, but my head is currently buzzing with ideas. Using the technologies discussed in this column, we can now design ultra-reliable systems and then monitor and fine-tune their performance in the field. We truly do live in an age of wonders. What say you? What are your thoughts on all of this?
