feature article
Subscribe Now

Breaking the System

Good Pieces Don’t Always Make a Good Whole

It’s seductive logic. If the pieces are good, then the whole, which is but an assemblage of known-good pieces, must be good.

I used that same logic as a kid. Orange juice is good; Cheerios are good. Ergo, using orange juice instead of milk should provide a delicious breakfast.

Wrong. That was a lesson I remember to this day. I could never quite put my finger on exactly why those two things didn’t work together – there was no obvious reason; they just didn’t.

Or here’s another example: take, oh, eight nice, well-behaved dogs. Put them together with no human around (or at least not one that they respect) and let them roam free. Assuming they don’t fight amongst themselves – in fact, especially if they don’t fight, if they all work well together – then, as a pack, they’re likely to create all kinds of havoc that none of them would consider individually. While that has everything to do with dog pack mentality, it reinforces the fact that combinations of good ingredients don’t always result in good outcomes.

I’m sure we’ve all learned that lesson one way or another. And yet it’s easy to fall back on that logic when necessary. Complex SoCs are created from combinations of blocks that are individually tested. Even the interconnect is tested, so, the story should go, putting it all together will work.

Frankly, we may know that this is a bit of a cop out, but, given the complexity of these systems and no obvious way to deal with it, it’s almost like we need this little sleight of hand in order to ship a system without going completely nuts. But the fact remains that the “sum-of-good-things equals good-thing” identity is unproven. In fact, it’s worse than that: it’s all too often proven wrong, as anyone who has received that “we’ve got a problem with the new silicon” phone call can attest.

There’s one primary element that’s new to chip design that contributes a new complexity dimension: software. Software has to be tested before tape-out – and that’s no easy feat, given how many lines of code are required simply to boot Linux. This is where virtual platforms, emulation, and hardware simulation acceleration help – you can run real software to check out how it works in the system.

But, according to Breker’s Adnan Hamid, the typical software that gets tested has some limitations. For the most part, it’s application software – which is good; wringing out the known major use cases is obvious and important. But it tends not to stress the architecture or the hardware. Individual stress tests might challenge the processor or some other localized element, but Breker’s contention is that very few tests work from the chip inputs through the entire system, of which the processor is just a piece, and on to the actual outputs.

This kind of holistic stress testing is what Breker’s TrekSoC tool is about. Given a set of scenarios, the idea is to have the tool automatically create a suite of tests that challenge the corners of operation. The scenarios are where the work is done up front – and this can be done (and is probably best done) long before you approach tape-out. It can be put to use during architectural testing as well, when you’re still working at the transaction level to prove out the system conceptually before implementation starts.

You can think of it as a hierarchy of constrained-random elements. At the bottom are low-level “services” like memory management, interrupts and polling, scheduling, and register access. Randomizable scenarios can be built from these to serve the next level, where drivers will make use of them.

These drivers will also have randomizable scenarios, and they can place constraints on the low-level scenarios so that you don’t end up wasting tests (or getting failing results) on nonsensical situations. The driver scenarios feed application scenarios, and the application scenarios feed performance scenarios.

Given these models, tests are created by positing an outcome (i.e., a set of outputs) resulting from an input and then using a Boolean solver to create tests. The tests will have nothing to do with the actual applications that will be run – they’re simply there to exercise the system as a whole and find any weaknesses.

An example that Mr. Hamid uses is one where a piece of data is written to memory just before an interrupt fires. Because the code causing the data write has been executed prior to the interrupt, it would normally be assumed that the data was, in fact, written and that the interrupt service routine can count on the correct data being in place. But in a system with a network-on-chip interconnect structure, the latency may be such that the data write instruction was requested but did not complete prior to the interrupt. And now you have a data consistency problem.

They also target multicore issues like deadlocks, livelocks, and cache coherency. Such areas are hard to test with application code, which may be very well-behaved. TrekSoC is badly-behaved: it can specifically create tests that will zero in on these kinds of problems to see whether or not the system survives intact.

The test code itself is run on bare metal, and any OS calls are trapped and rerouted, with TrekSoC acting as an “evil” OS in servicing those calls. This keeps the code as close as possible to the subsystems being tested, but it can also save valuable testing time since no OS has to boot.

The form the tests take depends on the target. Early on, when doing architectural evaluations, they can be turned into transactions for IP unit testing. Later in the flow, they generate RTL with offloads and a testbench as well as the code to be run. The offloads, implemented as native code libraries, handle memory observation, “printf” commands, input stimulus, and output checks.

From an abstract testing standpoint, the complete model provides stimulus, results checking, and coverage statistics, making it completely self-contained. The actual flow of the tests – in particular, the randomized nature – is determined at test generation time, not at run time. Any resulting coverage gaps can be addressed by directing the tool to create tests for a specific uncovered node on the flow graph used to depict the scenarios.

If this all works, then it gives you a way of ensuring that you won’t be throwing out breakfast and starting over. Or that the neighbors won’t all be complaining that their trash is turned over and that their cats are all stuck up in trees. And best yet, you won’t be getting that dreaded phone call after first silicon.

 

More info:

Breker

One thought on “Breaking the System”

Leave a Reply

featured blogs
Nov 30, 2023
No one wants to waste unnecessary time in the model creation phase when using a modeling software. Rather than expect users to spend time trawling for published data and tediously model equipment items one by one from scratch, modeling software tends to include pre-configured...
Nov 27, 2023
See how we're harnessing generative AI throughout our suite of EDA tools with Synopsys.AI Copilot, the world's first GenAI capability for chip design.The post Meet Synopsys.ai Copilot, Industry's First GenAI Capability for Chip Design appeared first on Chip Design....
Nov 6, 2023
Suffice it to say that everyone and everything in these images was shot in-camera underwater, and that the results truly are haunting....

featured video

TDK CLT32 power inductors for ADAS and AD power management

Sponsored by TDK

Review the top 3 FAQs (Frequently Asked Questions) regarding TDK’s CLT32 power inductors. Learn why these tiny power inductors address the most demanding reliability challenges of ADAS and AD power management.

Click here for more information

featured webinar

Rapid Learning: Purpose-Built MCU Software Tools for Data-Driven Embedded IoT Systems

Sponsored by ITTIA

Are you developing an MCU application that captures data of all kinds (metrics, events, logs, traces, etc.)? Are you ready to reduce the difficulties and complications involved in developing an event- and data-centric embedded system? This webinar will quickly introduce you to excellent MCU-specific software options for developing your next-generation data-driven IoT systems. You will also learn how to recognize and overcome data management obstacles. Register today as seats are limited!

Register Now!

featured chalk talk

Switch to Simple with Klippon Relay
In this episode of Chalk Talk, Amelia Dalton and Lars Hohmeier from Weidmüller explore the what, where, and how of Weidmüller's extensive portfolio of Klippon relays. They investigate the pros and cons of mechanical relays, the benefits that the Klippon universal range of relays brings to the table, and how Weidmüller's digital selection guide can help you choose the best relay solution for your next design.
Sep 26, 2023
7,689 views