SoC Design Verification and Chip Debug with AI

Robert Ruiz, Taruna Reddy

Mar 22, 2023 / 5 min read

These days, the question is less about what AI can do and more about what it can’t do. From talk-of-the-town chatbots like ChatGPT to self-driving cars, AI is becoming pervasive in our everyday lives. Even industries where it was perhaps an unforeseen fit, like chip design, are benefiting from greater intelligence.

What if one of the most laborious, time-consuming steps in developing a chip could get a jolt of intelligence for faster first-time-right silicon? Imagine the possibilities of integrating AI into the chip verification and debug phase, especially as chips are only becoming more complex.

The end goal, of course, is to reach your verification coverage targets faster and, ultimately, find more bugs. A digital design can operate in a vast number of states, and it’s virtually impossible to analyze that state space manually and come away with enough actionable insights to make a difference.

But if AI can step in and lend a hand, verification engineers can then focus on fixing the bugs found. Just think about how this can benefit your silicon designs.

Could Days-Long Regression Runs Be a Thing of the Past?

Chip design complexity is already growing by leaps and bounds, and the semiconductor industry is facing a slew of high-profile challenges. From the march to angstroms to multi-die integration and rapid node migration, there has never been a greater need to find innovative solutions while raising engineering productivity. Most SoCs, however, require a costly respin, largely due to logic and functional issues. Because of this, there can never be enough SoC verification, yet cost and time-to-market pressures prohibit an endless verification and debug loop.

The verification process kicks off once the RTL for a chip design is in place and the design’s state space has been defined. Chip verification engineers need to exercise this space to ensure that the final SoC design will work. The goal of coverage closure is to confirm that the entire design functions as it is supposed to.

There are three main challenges for coverage closure:

  • Planning for coverage, as it is challenging to know what to write in the coverage definition for the testbench (what types of coverage groups are needed, where the gaps are, what still needs to be written, etc.; see the merging sketch after this list). Getting this right is essential to ensure that 100% coverage really means you have found all the bugs.
  • Closing coverage, as it is difficult to know which tests contribute the most to coverage. You might run the same test 1,000 times only to reach 50% coverage, and as you approach 100%, closing the last few percentage points can take weeks. Targeted tests are key here, but they are very labor-intensive to develop.
  • Stimulus development and root-cause analysis, as you may find that the stimulus never exercises a particular configuration, so a coverage target (or a bug hiding behind it) is never reached. Perhaps the stimulus was simply written in a way that cannot hit the coverage target at all.
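To make the gap question concrete, here is a minimal Python sketch of merging per-test functional coverage and listing the bins no test has hit. The bin names, test names, and data structures are invented for illustration; real coverage data would come from your simulator’s coverage database rather than hard-coded dictionaries.

```python
from typing import Dict, Set

# Hypothetical per-test coverage results: test name -> set of coverage bins hit.
# In practice this would be exported from the simulator's coverage database.
per_test_coverage: Dict[str, Set[str]] = {
    "smoke_basic":      {"fifo.empty", "fifo.full", "pkt.len_small"},
    "random_traffic_1": {"fifo.empty", "pkt.len_small", "pkt.len_medium"},
    "random_traffic_2": {"fifo.empty", "pkt.len_medium"},
}

# The coverage model: every bin the verification plan says must be hit.
coverage_model: Set[str] = {
    "fifo.empty", "fifo.full", "fifo.overflow",
    "pkt.len_small", "pkt.len_medium", "pkt.len_large",
}

# Merge coverage across all tests and report the holes.
merged = set().union(*per_test_coverage.values())
holes = sorted(coverage_model - merged)

print(f"Merged coverage: {len(merged)}/{len(coverage_model)} bins "
      f"({100.0 * len(merged) / len(coverage_model):.1f}%)")
print("Uncovered bins:", holes)
```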

In a traditional chip verification cycle, verification engineers set a target and run their regression environment. As part of the process, they set up testbenches that generate random stimulus to see how the design responds. It’s not uncommon to have 10,000 to 15,000 tests for a given design, and the verification team usually doesn’t have a sense of the ROI of each test. Regressions can run for days, taking up valuable compute resources.
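Assuming you can export which bins each test hits and how long it runs, a rough sense of per-test ROI can come from greedily picking whichever test adds the most new coverage per simulation hour. The sketch below is a simple greedy heuristic with made-up test names, bins, and runtimes, not a description of any production tool.

```python
from typing import Dict, Set, List, Tuple

def rank_tests_by_roi(coverage: Dict[str, Set[str]],
                      runtime_hours: Dict[str, float]) -> List[Tuple[str, float]]:
    """Greedy ordering: at each step pick the test whose *new* bins per
    simulation hour are highest. Returns (test, marginal_bins_per_hour)."""
    covered: Set[str] = set()
    remaining = dict(coverage)
    order: List[Tuple[str, float]] = []
    while remaining:
        best, best_score = None, -1.0
        for test, bins in remaining.items():
            gain = len(bins - covered) / runtime_hours[test]
            if gain > best_score:
                best, best_score = test, gain
        if best_score <= 0:          # no remaining test adds new coverage
            break
        covered |= remaining.pop(best)
        order.append((best, best_score))
    return order

# Hypothetical regression data.
coverage = {
    "directed_fifo": {"fifo.full", "fifo.overflow"},
    "random_long":   {"fifo.empty", "pkt.len_small", "pkt.len_medium"},
    "random_short":  {"fifo.empty", "pkt.len_small"},
}
runtime = {"directed_fifo": 0.5, "random_long": 6.0, "random_short": 0.25}

for test, roi in rank_tests_by_roi(coverage, runtime):
    print(f"{test:15s} {roi:.2f} new bins / sim-hour")
```

Tests that never make it to the top of such an ordering are natural candidates for pruning from a daily regression.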

Two iterative loops take up the bulk of the SoC verification cycle: closing coverage, and debugging and fixing the failures found after running regressions (Figure 1). Both consist of time-consuming, repetitive work: analyzing coverage, making adjustments after discovering holes, and doing it all again…and again…and again. Then, when teams discover failures, they need to analyze them, make changes in the RTL or the testbench, and re-run the regressions to confirm that the bugs were actually fixed. This part, too, is an iterative loop.

Also, it’s not uncommon for the last stretch of coverage closure to be the most laborious. A thorough manual analysis of the huge amount of data this whole process generates is simply not feasible, so teams are generally left wanting more insight into the root causes of chip design bugs.

Figure 1: Iterative loops in a typical verification cycle.


Learning the Way to Faster Verification Coverage Closure

One bright side of an iterative loop is the potential to learn from it, and this is where AI, and machine learning (ML) in particular, can play a key role. If an ML engine can learn from patterns in previous runs, it can, for instance, recognize that a particular line of testbench code is the likely source of an error. It can then apply that insight to future regressions, enabling faster coverage closure and, especially as the system gets trained, potentially higher levels of coverage.
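As a toy illustration of the kind of pattern recognition involved (not how any commercial engine is implemented), one could strip run-specific noise such as timestamps and hex values from failure messages, then group failures by the resulting signature so that each suspected root cause is debugged once instead of hundreds of times. A Python sketch with hypothetical log lines:

```python
import re
from collections import defaultdict

def signature(msg: str) -> str:
    """Collapse run-specific details so that failures with the same likely
    root cause map to the same signature."""
    msg = re.sub(r"@\s*\d+\s*ns", "@<time>", msg)   # simulation timestamps
    msg = re.sub(r"0x[0-9a-fA-F]+", "<hex>", msg)   # addresses / data values
    msg = re.sub(r"\b\d+\b", "<n>", msg)            # remaining numbers
    return msg.strip()

# Hypothetical failure messages collected from a nightly regression.
failures = [
    ("random_traffic_17", "UVM_ERROR @ 1200 ns: scoreboard mismatch, got 0x3f expected 0x40"),
    ("random_traffic_52", "UVM_ERROR @ 8450 ns: scoreboard mismatch, got 0x1a expected 0x1b"),
    ("directed_reset_2",  "UVM_FATAL @ 300 ns: response timeout on port 3"),
]

buckets = defaultdict(list)
for test, msg in failures:
    buckets[signature(msg)].append(test)

# One bucket per suspected root cause; debug a representative of each.
for sig, tests in buckets.items():
    print(f"{len(tests):3d} failure(s): {sig}")
    print(f"    e.g. {tests[0]}")
```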

AI is making big inroads in the semiconductor industry. The award-winning, industry-first Synopsys DSO.ai™ AI application for chip design recently notched its first 100 production tape-outs. By automatically searching a chip design’s large solution spaces for optimization targets, DSO.ai helps enhance engineering productivity along with the chip’s power, performance, and area (PPA).

On the verification side, to alleviate the debug-and-fix cycle mentioned earlier, solutions like the Synopsys Verdi® Automated Debug System with Regression Debug Automation (RDA) technology provide AI-driven debug. With this capability, users can take advantage of predictive analysis that automates the manual, error-prone process of locating root causes of failures in the design under test and the testbench. More innovations to automate the debug cycle are on the horizon, ultimately working towards a fully automated debug-and-fix loop with no human intervention.
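Verdi RDA’s analysis is proprietary, but one small piece of this loop can be illustrated generically: if a test passed in the previous regression and fails now, a scripted bisection over the intervening design and testbench changes can point at the first breaking change without an engineer re-running each one by hand. A hedged Python sketch, where run_test is a hypothetical stand-in for launching a simulation at a given change:

```python
from typing import Callable, List

def first_bad_change(changes: List[str],
                     run_test: Callable[[str], bool]) -> str:
    """Binary-search an ordered list of changes between a known-good and a
    known-bad regression. run_test(change) returns True if the test passes
    with the design/testbench as of that change."""
    lo, hi = 0, len(changes) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if run_test(changes[mid]):
            lo = mid + 1     # still passing: the culprit is a later change
        else:
            hi = mid         # already failing: the culprit is here or earlier
    return changes[lo]

# Hypothetical usage: pretend the regression broke at change "c4".
changes = ["c1", "c2", "c3", "c4", "c5", "c6"]
culprit = first_bad_change(changes, run_test=lambda c: changes.index(c) < 3)
print("First failing change:", culprit)   # -> c4
```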

These examples are only the beginning of what is sure to come, as there are many more EDA processes in which greater intelligence can help engineers work more productively and generate better outcomes. For example, what if AI could provide a better sense of what additional coverage is needed? What if greater intelligence could minimize the time and energy wasted in running regressions? Or what if it could help with faster root-cause analysis? A task that would typically take days could be reduced to hours, potentially freeing up resources for additional projects and more value-added tasks.

Delivering the Right Chip to the Market Faster

The complex problems of our world demand more complex compute systems to tackle them. Automation and intelligence can complement the work that engineers bring to the table, raising productivity and enabling design and verification experts to focus on creating differentiated silicon chips to bring these systems to life. When AI-driven EDA flows can take on repetitive tasks, engineers gain the bandwidth to work on bug fixes and push their designs further. From design space exploration to coverage and debug loops and much more, the areas where AI can make an indelible impact are broad and vast.

Exploring AI's Potential in the Verification and Debug Cycle at the SNUG Silicon Valley Conference

At this year’s SNUG Silicon Valley Conference, AMD’s presentation on coverage-regression optimization in random simulations was selected as one of the Top 10 Best Presentations for the event. There was also a lunch panel discussion featuring industry experts from AMD, Meta, and NVIDIA, who explored “How AI Is Driving the Next Innovation Wave for EDA.” The recording of this session can be viewed on demand here.
