
Using a Hierarchical Approach to Tame SoC Debug Anarchy

Imagine you’re a private detective and someone hands you a picture. In that picture you see a slim young man in a bathing suit holding a surfboard. The board is plain, marked only with some sort of symbol like a sloppy A within a circle. He has dark hair cut into a mohawk, peaked with gel, tinted red at the ends. He’s got a reasonable tan working. Your job is to find this guy. Somewhere in the US. (Let’s assume for the sake of keeping it simple that the guy will be staying put – you won’t be chasing a moving target.)

Oh, did we mention that you are completely unfamiliar with the US? In other words, you have no cultural preconceptions about where to find anyone. So, using only a brute-force approach, you would have to start, say, in Maine, and perhaps plot a boustrophedon path to San Diego. Which would take a very long time.

You could simplify your task dramatically if you took a couple of steps to narrow down your search. You could get yourself a guide to regional climates as well as a guide to local culture. So, for example, the tan indicates ample sunshine: the climate guide would help narrow the region to more southerly areas. The cultural guide could then add a couple more refinements: the surfboard would indicate coastal, eliminating the desert southwest. And the tinted mohawk (and the surfboard symbol) would suggest West Coast, deprioritizing the East and Gulf Coasts.

From this information, you could focus your search on Southern California (moving as high as Seattle just in case the picture was taken in the summer after a rare sunny streak).

You have just applied a hierarchical search to narrow down the region of concern, vastly increasing your chances of finding this guy before his time in this mortal realm expires. The climate guide took you part of the way, and then the cultural guide zoomed you in two steps further. At that point, you were back to the basic footwork required to find him.

Now imagine that you’re debugging an SoC design. You’ve spent months validating your RTL, and now you’re testing the entire system by running software on the hardware. The very fact that you’re debugging means that something’s not working right.

And the problem you have is much harder than finding one in 300 million people. Somewhere within a billion clock cycles, something went awry. You have to find that clock cycle so that you can figure out whether it’s a hardware problem or a software problem. If you don’t find the problem now, you may find it after you have silicon – which may mean a product launch delay to fix software. Or new masks and more delay if it’s a hardware problem. Or, worse yet, your customer may find the problem in the system you ship them.

So you’ve got to resolve any issues now, before silicon. Which is why you use an emulator to run actual application software against the design to figure out where things go wrong before they end up in silicon.

The good news about testing your software ahead of time is that, with the aid of an emulator, you can run billions of clock cycles’ worth of activity in a reasonable time frame. The bad news is that, if there is a problem, you have to wade through those billions of cycles to find it. And there’s no reasonable way to ox-plow your way through that at the waveform level even if you wanted to. So you need a hierarchical approach. To this end, you have three levels of abstraction to play with: software, transaction, and logic. Each of them can narrow your search range by orders of magnitude.

Step 1: Software debug

The first step is accessible to anyone who’s dabbled with software debugging – which pretty much means anyone with a computer science degree. It’s the simple matter of using a software debugger to exercise the code and find the range of instructions where the problem seems to be occurring.

While this may sound easy enough, modern SoCs are typically complex heterogeneous multicore beasts, with modes that come and go and processors each doing their own thing and occasionally interacting with each other. Such an arrangement can be much more difficult to debug than the simple sequential processor you may have cut your teeth on.

If it turns out, for instance, that there is a problem between two cores, each of which is running its own OS instance, then you may have to run multiple instances of your debugger, tracking what happens on each core to see where the problem is. Keeping an entire such system in lock-step can be a challenge – if each of your debugger instances knows nothing of the other, then you can’t single-step them together, for instance. Whether or not that matters is a high-level consideration that is rightly considered at the software level of abstraction, and would be tough to think through by diving down to the circuit level straightaway.

If the failure lies in a fundamental architectural problem that would be reflected in a virtual model, then it can be seen simply by running the software in the host on the virtual model and noting where it fails. But much more likely is that there is something more subtle going on in the actual implementation, in which case the idealized model would work fine. So the offending module must be exercised in the emulator, with other software running on the host to boost performance. Exactly which module contains the bug may be suggested by the original failure, or some trial and error may be needed to isolate the offending portion of the system.

The very fact of this complexity makes it that much more important to narrow the range of interest at the software level first. It gets your field of view down from billions of clock cycles to around a million – three or so orders of magnitude improvement.

Step 2: Assertions, checkers, and monitors

From here, you drop down a level from the software realm into the hardware realm. But you still work at the top level of hardware abstraction: TLM. Here you use assertions, checkers, and monitors to further isolate the region of failure.

Assertion-based verification has been slow in gaining mainstream usage due to the fact that a complete set of assertions for verifying the entire design can be difficult to set up. But, whether or not they’re used for verification, they can be extremely useful for debug, where you’re defining a limited set in order to figure out what’s misbehaving.

These three terms, “assertions,” “checkers,” and “monitors,” often accompany each other without careful distinction as to what they mean individually. If you scour the internet, you can, for example, find references where “checkers” and “monitors” are said to be synonymous with “assertions.”

EVE does not use these terms in a way that has them all meaning the same thing, although they’re related. Checkers and monitors are both used to verify, or, in the case of debug, to examine signals. The key distinction is that, while a monitor simply observes a set of signals for later perusal or analysis, making no judgments as to correctness, a checker contains its own means of determining whether the behavior is correct.

One way of implementing a checker is by using assertions. An assertion is a piece of code, written in a language such as SystemVerilog, that expresses design intent. So, in the ideal case, a thorough developer crafts two chunks of code for each bit of behavior: the description of the circuitry, coupled with assertions describing the basic operating assumptions that should not be violated. For debug purposes, assertions can be used both to confirm correct behavior and to test failure hypotheses.
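To make the monitor/checker distinction concrete, here is a minimal sketch in Python rather than SystemVerilog. Everything in it is an assumption made for illustration: the `Sample` format, the `Monitor` and `Checker` classes, and the `grant_implies_request` rule are hypothetical and are not taken from any EVE tool or real design. The monitor only records what it sees; the checker carries an assertion-like rule and notes the cycles where that rule is violated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# One cycle's worth of signal values (hypothetical format for this sketch).
Sample = Dict[str, int]


@dataclass
class Monitor:
    """Records samples for later perusal; makes no judgment about correctness."""
    log: List[Sample] = field(default_factory=list)

    def observe(self, sample: Sample) -> None:
        self.log.append(dict(sample))


@dataclass
class Checker:
    """Applies an assertion-like rule to each sample and records violations."""
    name: str
    rule: Callable[[Sample], bool]
    failures: List[int] = field(default_factory=list)

    def observe(self, sample: Sample) -> None:
        if not self.rule(sample):
            self.failures.append(sample["cycle"])


def grant_implies_request(s: Sample) -> bool:
    # Hypothetical design intent: a grant must never appear without a request.
    return not s["grant"] or bool(s["req"])


# A tiny hand-made trace; cycle 2 breaks the rule on purpose.
trace = [
    {"cycle": 0, "req": 1, "grant": 0},
    {"cycle": 1, "req": 1, "grant": 1},
    {"cycle": 2, "req": 0, "grant": 1},
    {"cycle": 3, "req": 0, "grant": 0},
]

monitor = Monitor()
checker = Checker("grant_implies_request", grant_implies_request)
for sample in trace:
    monitor.observe(sample)
    checker.observe(sample)

print(len(monitor.log))   # how many samples the monitor recorded
print(checker.failures)   # cycles at which the checker's rule was violated
```

The monitor's log is raw material for later analysis, while the checker's failure list already points at the cycles worth a closer look, which is what makes checkers useful for testing a failure hypothesis during debug.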

It’s not important to get hung up on the terminology, but it is important to know that these tools are there and can help pin down the location of failing circuitry much more efficiently because they can be executed via transactions.

By working at the transaction level, you can see which high-level operations aren’t working properly without worrying about the intricacies of the signals. The SCE-MI interface between the emulator and the host can pass transactions back and forth quickly and efficiently. This narrows your view down to about 10,000 cycles – another couple orders of magnitude improvement.
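The narrowing described so far can be checked with a little arithmetic. The sketch below uses the article's round figures (a billion cycles, then about a million, then about 10,000); the stage labels and the `orders_of_magnitude` helper are made up for illustration, and real designs will land on different numbers.

```python
import math

# Rough cycle counts for each stage of the search, using the article's round figures.
stages = [
    ("full emulation run", 1_000_000_000),
    ("after software debug", 1_000_000),
    ("after transaction-level debug", 10_000),
]


def orders_of_magnitude(before: int, after: int) -> float:
    """Number of powers of ten by which the search window shrank."""
    return math.log10(before / after)


# Reduction achieved by each step, paired with the stage it leads to.
reductions = [
    (stages[i + 1][0], orders_of_magnitude(stages[i][1], stages[i + 1][1]))
    for i in range(len(stages) - 1)
]
total = orders_of_magnitude(stages[0][1], stages[-1][1])

for label, gain in reductions:
    print(f"{label}: about {gain:.0f} orders of magnitude")
print(f"overall: about {total:.0f} orders of magnitude")
```

With these figures, the two steps together shrink the window by roughly five orders of magnitude before any waveform is opened.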

Step 3: Logic debug

At this point, you can start looking at actual signals. You have a lot of clues in the form of failed assertions and incorrect signals from your transaction-level work; now you can start scanning waveforms to find the signal miscreant and fix the design.

Here again, the use of an emulator is beneficial in that it provides observability of all signals in the design. That can be a mixed blessing when diving into a debug session, simply because there are so many signals. Taking a hierarchical approach to debugging allows you to radically narrow down not only the scope of clock cycles to study, but also the signals that you should view.
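As a rough illustration of narrowing both dimensions at once, the sketch below trims a captured waveform to a cycle window and a short list of suspect signals. The `Waveform` format, the `narrow` helper, and the signal names are all hypothetical; an actual emulator or waveform viewer would provide its own filtering mechanism.

```python
from typing import Dict, Iterable, List

# Signal name -> one value per clock cycle (hypothetical format for this sketch).
Waveform = Dict[str, List[int]]


def narrow(waveform: Waveform, signals: Iterable[str],
           first_cycle: int, last_cycle: int) -> Waveform:
    """Keep only the named signals, and only cycles first_cycle..last_cycle inclusive."""
    return {
        name: waveform[name][first_cycle:last_cycle + 1]
        for name in signals
        if name in waveform
    }


# A tiny stand-in for "every signal in the design".
full = {
    "clk_en":        [1, 1, 1, 1, 1, 1],
    "req":           [0, 1, 1, 0, 0, 0],
    "grant":         [0, 0, 1, 1, 0, 0],
    "unrelated_dbg": [0, 0, 0, 0, 0, 0],
}

# Suppose the earlier steps pointed at cycles 1 through 3 and the req/grant handshake.
view = narrow(full, ["req", "grant"], first_cycle=1, last_cycle=3)
print(view)
```

The clues from the software and transaction levels supply the arguments: failed assertions suggest which signals to keep, and the narrowed cycle range sets the window.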

At this point, your standard logic debugging skills take over. There may still be plenty of backbreaking work to do to find where things are going awry; even 10,000 cycles can take some work to sort through. But at least you know that you’re looking in the right region; you’re not wasting your time looking for an anarchist surfer in Fargo.

