Breaking the System

Good Pieces Don’t Always Make a Good Whole

It’s seductive logic. If the pieces are good, then the whole, which is but an assemblage of known-good pieces, must be good.

I used that same logic as a kid. Orange juice is good; Cheerios are good. Ergo, using orange juice instead of milk should provide a delicious breakfast.

Wrong. That was a lesson I remember to this day. I could never quite put my finger on exactly why those two things didn’t work together – there was no obvious reason; they just didn’t.

Or here’s another example: take, oh, eight nice, well-behaved dogs. Put them together with no human around (or at least not one that they respect) and let them roam free. Assuming they don’t fight amongst themselves – in fact, especially if they don’t fight, if they all work well together – then, as a pack, they’re likely to create all kinds of havoc that none of them would consider individually. While that has everything to do with dog pack mentality, it reinforces the fact that combinations of good ingredients don’t always result in good outcomes.

I’m sure we’ve all learned that lesson one way or another. And yet it’s easy to fall back on that logic when it’s convenient. Complex SoCs are assembled from blocks that are individually tested. Even the interconnect is tested, so, the story goes, the assembled system should work.

Frankly, we may know that this is a bit of a cop-out, but, given the complexity of these systems and no obvious way to deal with it, it’s almost as if we need this little sleight of hand in order to ship a system without going completely nuts. But the fact remains that the “sum-of-good-things equals good-thing” identity is unproven. In fact, it’s worse than that: it’s all too often proven wrong, as anyone who has received that “we’ve got a problem with the new silicon” phone call can attest.

One relatively new element of chip design adds a whole new dimension of complexity: software. Software has to be tested before tape-out – and that’s no easy feat, given how many lines of code are required simply to boot Linux. This is where virtual platforms, emulation, and hardware simulation acceleration help – you can run real software to check out how it works in the system.

But, according to Breker’s Adnan Hamid, the typical software that gets tested has some limitations. For the most part, it’s application software – which is good; wringing out the known major use cases is obvious and important. But it tends not to stress the architecture or the hardware. Individual stress tests might challenge the processor or some other localized element, but Breker’s contention is that very few tests work from the chip inputs through the entire system, of which the processor is just a piece, and on to the actual outputs.

This kind of holistic stress testing is what Breker’s TrekSoC tool is about. Given a set of scenarios, the idea is to have the tool automatically create a suite of tests that challenge the corners of operation. The scenarios are where the work is done up front – and this can be done (and is probably best done) long before you approach tape-out. It can be put to use during architectural testing as well, when you’re still working at the transaction level to prove out the system conceptually before implementation starts.

You can think of it as a hierarchy of constrained-random elements. At the bottom are low-level “services” like memory management, interrupts and polling, scheduling, and register access. Randomizable scenarios can be built from these to serve the next level, where drivers will make use of them.

These drivers will also have randomizable scenarios, and they can place constraints on the low-level scenarios so that you don’t end up wasting tests (or getting failing results) on nonsensical situations. The driver scenarios feed application scenarios, and the application scenarios feed performance scenarios.
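
To make the layering concrete, here is a minimal sketch of a scenario hierarchy in which the driver level constrains what the service level may pick. It is not Breker’s format or API; the driver names, service names, and function names are all invented for illustration.

```python
import random

# Hypothetical drivers and the low-level services each one may legally use.
DRIVER_CONSTRAINTS = {
    "uart": {"register_access", "interrupt"},
    "dma": {"memory", "interrupt", "register_access"},
}

def service_scenario(rng, allowed_services):
    """Lowest level: pick one permitted service and a random argument."""
    service = rng.choice(sorted(allowed_services))
    return {"service": service, "arg": rng.randrange(0, 256)}

def driver_scenario(rng):
    """Driver level: choose a driver, then constrain the services beneath it."""
    driver = rng.choice(sorted(DRIVER_CONSTRAINTS))
    allowed = DRIVER_CONSTRAINTS[driver]
    steps = [service_scenario(rng, allowed) for _ in range(3)]
    return {"driver": driver, "steps": steps}

def application_scenario(rng, n_drivers=2):
    """Application level: a random sequence of driver scenarios."""
    return [driver_scenario(rng) for _ in range(n_drivers)]

def generate_test(seed):
    """The same seed always yields the same test, so failures are repeatable."""
    return application_scenario(random.Random(seed))
```

Because the constraint lives at the driver level, a UART scenario in this toy model can never ask for a memory-management step, which is the kind of nonsensical combination the constraints are there to rule out.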

Given these models, tests are created by positing an outcome (i.e., a set of outputs) resulting from an input and then using a Boolean solver to create tests. The tests will have nothing to do with the actual applications that will be run – they’re simply there to exercise the system as a whole and find any weaknesses.
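
The “work backward from the outcome” idea can be sketched in a few lines. The example below assumes a made-up three-input Boolean model and uses brute-force enumeration as a stand-in for a real Boolean solver; none of the names come from TrekSoC.

```python
from itertools import product

def system_model(dma_enabled, irq_masked, buffer_full):
    """Hypothetical model: does an overflow interrupt reach the CPU?"""
    overflow_irq = dma_enabled and buffer_full and not irq_masked
    return {"overflow_irq": overflow_irq}

def solve_for_outcome(model, n_inputs, wanted):
    """Return every input assignment whose outputs match the posited outcome.

    Exhaustive search only works for toy models; a real solver prunes
    the space instead of visiting every combination.
    """
    solutions = []
    for inputs in product([False, True], repeat=n_inputs):
        if model(*inputs) == wanted:
            solutions.append(inputs)
    return solutions

# Posit the outcome first, then ask which inputs produce it.
tests = solve_for_outcome(system_model, 3, {"overflow_irq": True})
```

In this toy model only one assignment produces the interrupt (DMA enabled, interrupt unmasked, buffer full), so that assignment becomes the test stimulus.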

An example that Mr. Hamid uses is one where a piece of data is written to memory just before an interrupt fires. Because the code causing the data write has been executed prior to the interrupt, it would normally be assumed that the data was, in fact, written and that the interrupt service routine can count on the correct data being in place. But in a system with a network-on-chip interconnect structure, the latency may be such that the data write instruction was requested but did not complete prior to the interrupt. And now you have a data consistency problem.
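
A toy simulation shows how that can happen. It assumes a memory whose writes land a fixed number of cycles after they are issued and, as a simplification, lets reads see the memory immediately; the class and function names are invented for this sketch.

```python
class PostedWriteMemory:
    """Memory behind an interconnect that delivers writes after a fixed latency."""

    def __init__(self, latency):
        self.latency = latency
        self.cells = {}
        self.in_flight = []  # entries are (arrival_time, address, value)
        self.now = 0

    def write(self, address, value):
        # The core considers the write done, but it has only been requested.
        self.in_flight.append((self.now + self.latency, address, value))

    def tick(self, cycles=1):
        """Advance time and commit every write that has arrived."""
        self.now += cycles
        landed = [w for w in self.in_flight if w[0] <= self.now]
        self.in_flight = [w for w in self.in_flight if w[0] > self.now]
        for _, address, value in landed:
            self.cells[address] = value

    def read(self, address):
        return self.cells.get(address, 0)

def isr_sees(latency, irq_delay):
    """Write 0xAB, wait irq_delay cycles for the interrupt, then read in the ISR."""
    mem = PostedWriteMemory(latency)
    mem.write(0x10, 0xAB)
    mem.tick(irq_delay)
    return mem.read(0x10)
```

With a latency of 5 cycles and the interrupt firing 1 cycle after the write, `isr_sees(5, 1)` returns 0, the stale value. Swap the numbers and the service routine sees 0xAB as expected.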

Breker also targets multicore issues like deadlocks, livelocks, and cache coherency. Such areas are hard to test with application code, which may be very well behaved. TrekSoC is badly behaved: it can specifically create tests that will zero in on these kinds of problems to see whether or not the system survives intact.
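
As an illustration of hunting for a bad interleaving rather than hoping one turns up, the sketch below enumerates short schedules for two cores that take locks and reports the ones that deadlock. The lock model is an assumption made for this example: a core holds its locks until its program finishes, and nothing here reflects how TrekSoC works internally.

```python
from itertools import product

def deadlocks(schedule, programs):
    """Run cores in the given order; return True if some core can never finish."""
    owner = {}                  # lock name -> id of the core holding it
    pc = [0] * len(programs)    # index of the next lock each core wants

    def step(core):
        """Try to advance one core; return True if it made progress."""
        if pc[core] == len(programs[core]):
            return False        # already finished
        lock = programs[core][pc[core]]
        if owner.get(lock, core) != core:
            return False        # held by another core: blocked
        owner[lock] = core
        pc[core] += 1
        if pc[core] == len(programs[core]):
            # Finished: release every lock this core holds.
            for name in [n for n, c in owner.items() if c == core]:
                del owner[name]
        return True

    for core in schedule:
        step(core)
    # Let everything run on until no core can move.
    while any(step(core) for core in range(len(programs))):
        pass
    return any(pc[c] < len(programs[c]) for c in range(len(programs)))

def find_deadlocks(programs):
    """Enumerate every opening schedule and keep the ones that deadlock."""
    n = len(programs)
    return [s for s in product(range(n), repeat=n) if deadlocks(s, programs)]
```

For two cores that take locks A and B in opposite orders, `find_deadlocks([["A", "B"], ["B", "A"]])` returns the two schedules in which each core grabs its first lock before the other finishes. A well-behaved application might run for a long time without ever hitting either one.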

The test code itself is run on bare metal, and any OS calls are trapped and rerouted, with TrekSoC acting as an “evil” OS in servicing those calls. This keeps the code as close as possible to the subsystems being tested, and it also saves valuable testing time since no OS has to boot.

The form the tests take depends on the target. Early on, when doing architectural evaluations, they can be turned into transactions for IP unit testing. Later in the flow, the tool generates RTL with offloads and a testbench, as well as the code to be run. The offloads, implemented as native code libraries, handle memory observation, “printf” commands, input stimulus, and output checks.

From an abstract testing standpoint, the complete model provides stimulus, results checking, and coverage statistics, making it completely self-contained. The actual flow of the tests – in particular, the randomized nature – is determined at test generation time, not at run time. Any resulting coverage gaps can be addressed by directing the tool to create tests for a specific uncovered node on the flow graph used to depict the scenarios.
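
Closing a coverage gap on a flow graph can be pictured roughly as follows. The graph, its node names, and the helper functions are hypothetical. The sketch fixes the random choices when the path is generated (via a seeded generator), and a depth-first search stands in for whatever the tool does to reach a specific uncovered node.

```python
import random

# Hypothetical scenario flow graph: node -> nodes that may follow it.
GRAPH = {
    "start": ["dma_setup", "uart_setup"],
    "dma_setup": ["transfer"],
    "uart_setup": ["transfer", "loopback"],
    "transfer": ["check"],
    "loopback": ["check"],
    "check": [],
}

def random_path(rng, node="start"):
    """Walk the graph at generation time; the result is a fixed test."""
    path = [node]
    while GRAPH[node]:
        node = rng.choice(GRAPH[node])
        path.append(node)
    return path

def uncovered(paths):
    """Nodes that no generated test has visited yet."""
    seen = {n for p in paths for n in p}
    return sorted(set(GRAPH) - seen)

def directed_path(target, start="start"):
    """Depth-first search for a complete path that passes through target."""
    def walk(node, path, hit):
        path = path + [node]
        hit = hit or node == target
        if not GRAPH[node]:
            return path if hit else None
        for nxt in GRAPH[node]:
            found = walk(nxt, path, hit)
            if found:
                return found
        return None
    return walk(start, [], False)
```

If the random tests so far have only gone through the DMA branch, `uncovered` reports `loopback` and `uart_setup`, and `directed_path("loopback")` produces a test that covers both.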

If this all works, then it gives you a way of ensuring that you won’t be throwing out breakfast and starting over. Or that the neighbors won’t all be complaining that their trash is turned over and that their cats are all stuck up in trees. And best yet, you won’t be getting that dreaded phone call after first silicon.

 

More info:

Breker
