feature article
Subscribe Now

Breaking the System

Good Pieces Don’t Always Make a Good Whole

It’s seductive logic. If the pieces are good, then the whole, which is but an assemblage of known-good pieces, must be good.

I used that same logic as a kid. Orange juice is good; Cheerios are good. Ergo, using orange juice instead of milk should provide a delicious breakfast.

Wrong. That was a lesson I remember to this day. I could never quite put my finger on exactly why those two things didn’t work together – there was no obvious reason; they just didn’t.

Or here’s another example: take, oh, eight nice, well-behaved dogs. Put them together with no human around (or at least not one that they respect) and let them roam free. Assuming they don’t fight amongst themselves – in fact, especially if they don’t fight, if they all work well together – then, as a pack, they’re likely to create all kinds of havoc that none of them would consider individually. While that has everything to do with dog pack mentality, it reinforces the fact that combinations of good ingredients don’t always result in good outcomes.

I’m sure we’ve all learned that lesson one way or another. And yet it’s easy to fall back on that logic when necessary. Complex SoCs are created from combinations of blocks that are individually tested. Even the interconnect is tested, so, the story should go, putting it all together will work.

Frankly, we may know that this is a bit of a cop out, but, given the complexity of these systems and no obvious way to deal with it, it’s almost like we need this little sleight of hand in order to ship a system without going completely nuts. But the fact remains that the “sum-of-good-things equals good-thing” identity is unproven. In fact, it’s worse than that: it’s all too often proven wrong, as anyone who has received that “we’ve got a problem with the new silicon” phone call can attest.

There’s one primary element that’s new to chip design that contributes a new complexity dimension: software. Software has to be tested before tape-out – and that’s no easy feat, given how many lines of code are required simply to boot Linux. This is where virtual platforms, emulation, and hardware simulation acceleration help – you can run real software to check out how it works in the system.

But, according to Breker’s Adnan Hamid, the typical software that gets tested has some limitations. For the most part, it’s application software – which is good; wringing out the known major use cases is obvious and important. But it tends not to stress the architecture or the hardware. Individual stress tests might challenge the processor or some other localized element, but Breker’s contention is that very few tests work from the chip inputs through the entire system, of which the processor is just a piece, and on to the actual outputs.

This kind of holistic stress testing is what Breker’s TrekSoC tool is about. Given a set of scenarios, the idea is to have the tool automatically create a suite of tests that challenge the corners of operation. The scenarios are where the work is done up front – and this can be done (and is probably best done) long before you approach tape-out. It can be put to use during architectural testing as well, when you’re still working at the transaction level to prove out the system conceptually before implementation starts.

You can think of it as a hierarchy of constrained-random elements. At the bottom are low-level “services” like memory management, interrupts and polling, scheduling, and register access. Randomizable scenarios can be built from these to serve the next level, where drivers will make use of them.

These drivers will also have randomizable scenarios, and they can place constraints on the low-level scenarios so that you don’t end up wasting tests (or getting failing results) on nonsensical situations. The driver scenarios feed application scenarios, and the application scenarios feed performance scenarios.

Given these models, tests are created by positing an outcome (i.e., a set of outputs) resulting from an input and then using a Boolean solver to create tests. The tests will have nothing to do with the actual applications that will be run – they’re simply there to exercise the system as a whole and find any weaknesses.

An example that Mr. Hamid uses is one where a piece of data is written to memory just before an interrupt fires. Because the code causing the data write has been executed prior to the interrupt, it would normally be assumed that the data was, in fact, written and that the interrupt service routine can count on the correct data being in place. But in a system with a network-on-chip interconnect structure, the latency may be such that the data write instruction was requested but did not complete prior to the interrupt. And now you have a data consistency problem.

They also target multicore issues like deadlocks, livelocks, and cache coherency. Such areas are hard to test with application code, which may be very well-behaved. TrekSoC is badly-behaved: it can specifically create tests that will zero in on these kinds of problems to see whether or not the system survives intact.

The test code itself is run on bare metal, and any OS calls are trapped and rerouted, with TrekSoC acting as an “evil” OS in servicing those calls. This keeps the code as close as possible to the subsystems being tested, but it can also save valuable testing time since no OS has to boot.

The form the tests take depends on the target. Early on, when doing architectural evaluations, they can be turned into transactions for IP unit testing. Later in the flow, they generate RTL with offloads and a testbench as well as the code to be run. The offloads, implemented as native code libraries, handle memory observation, “printf” commands, input stimulus, and output checks.

From an abstract testing standpoint, the complete model provides stimulus, results checking, and coverage statistics, making it completely self-contained. The actual flow of the tests – in particular, the randomized nature – is determined at test generation time, not at run time. Any resulting coverage gaps can be addressed by directing the tool to create tests for a specific uncovered node on the flow graph used to depict the scenarios.

If this all works, then it gives you a way of ensuring that you won’t be throwing out breakfast and starting over. Or that the neighbors won’t all be complaining that their trash is turned over and that their cats are all stuck up in trees. And best yet, you won’t be getting that dreaded phone call after first silicon.

 

More info:

Breker

One thought on “Breaking the System”

Leave a Reply

featured blogs
Jun 6, 2023
Learn about our PVT Monitor IP, a key component of our SLM chip monitoring solutions, which successfully taped out on TSMC's N5 and N3E processes. The post Synopsys Tapes Out SLM PVT Monitor IP on TSMC N5 and N3E Processes appeared first on New Horizons for Chip Design....
Jun 6, 2023
At this year's DesignCon, Meta held a session on '˜PowerTree-Based PDN Analysis, Correlation, and Signoff for MR/AR Systems.' Presented by Kundan Chand and Grace Yu from Meta, they talked about power integrity (PI) analysis using Sigrity Aurora and Power Integrity tools such...
Jun 2, 2023
I just heard something that really gave me pause for thought -- the fact that everyone experiences two forms of death (given a choice, I'd rather not experience even one)....

featured video

Synopsys Solution for RTL to Signoff Power Analysis

Sponsored by Synopsys

Synopsys’ industry-leading power analysis solution built on PrimePower technology that enables early RTL exploration, low power implementation and power signoff for design of energy-efficient SoCs.

Learn more about Synopsys’ Energy-Efficient SoCs Solutions

featured paper

EC Solver Tech Brief

Sponsored by Cadence Design Systems

The Cadence® Celsius™ EC Solver supports electronics system designers in managing the most challenging thermal/electronic cooling problems quickly and accurately. By utilizing a powerful computational engine and meshing technology, designers can model and analyze the fluid flow and heat transfer of even the most complex electronic system and ensure the electronic cooling system is reliable.

Click to read more

featured chalk talk

Matter & NXP
Interoperability in our growing Internet of things ecosystem has been a challenge for years. But the new Matter standard is looking to change all of that. It could not only make homes smarter but our design lives easier as well. In this episode of Chalk Talk, Amelia Dalton and Sujata Neidig from NXP examine how Matter will revolutionize IoT by increasing interoperability, simplifying development and providing a comprehensive approach to security and privacy. They also discuss what the roadmap for Matter looks like and how NXP’s Matter reference platforms can help you get started with your next IoT design.
Sep 20, 2022
30,981 views