feature article
Subscribe Now

Breaking the System

Good Pieces Don’t Always Make a Good Whole

It’s seductive logic. If the pieces are good, then the whole, which is but an assemblage of known-good pieces, must be good.

I used that same logic as a kid. Orange juice is good; Cheerios are good. Ergo, using orange juice instead of milk should provide a delicious breakfast.

Wrong. That was a lesson I remember to this day. I could never quite put my finger on exactly why those two things didn’t work together – there was no obvious reason; they just didn’t.

Or here’s another example: take, oh, eight nice, well-behaved dogs. Put them together with no human around (or at least not one that they respect) and let them roam free. Assuming they don’t fight amongst themselves – in fact, especially if they don’t fight, if they all work well together – then, as a pack, they’re likely to create all kinds of havoc that none of them would consider individually. While that has everything to do with dog pack mentality, it reinforces the fact that combinations of good ingredients don’t always result in good outcomes.

I’m sure we’ve all learned that lesson one way or another. And yet it’s easy to fall back on that logic when necessary. Complex SoCs are created from combinations of blocks that are individually tested. Even the interconnect is tested, so, the story should go, putting it all together will work.

Frankly, we may know that this is a bit of a cop out, but, given the complexity of these systems and no obvious way to deal with it, it’s almost like we need this little sleight of hand in order to ship a system without going completely nuts. But the fact remains that the “sum-of-good-things equals good-thing” identity is unproven. In fact, it’s worse than that: it’s all too often proven wrong, as anyone who has received that “we’ve got a problem with the new silicon” phone call can attest.

There’s one primary element that’s new to chip design that contributes a new complexity dimension: software. Software has to be tested before tape-out – and that’s no easy feat, given how many lines of code are required simply to boot Linux. This is where virtual platforms, emulation, and hardware simulation acceleration help – you can run real software to check out how it works in the system.

But, according to Breker’s Adnan Hamid, the typical software that gets tested has some limitations. For the most part, it’s application software – which is good; wringing out the known major use cases is obvious and important. But it tends not to stress the architecture or the hardware. Individual stress tests might challenge the processor or some other localized element, but Breker’s contention is that very few tests work from the chip inputs through the entire system, of which the processor is just a piece, and on to the actual outputs.

This kind of holistic stress testing is what Breker’s TrekSoC tool is about. Given a set of scenarios, the idea is to have the tool automatically create a suite of tests that challenge the corners of operation. The scenarios are where the work is done up front – and this can be done (and is probably best done) long before you approach tape-out. It can be put to use during architectural testing as well, when you’re still working at the transaction level to prove out the system conceptually before implementation starts.

You can think of it as a hierarchy of constrained-random elements. At the bottom are low-level “services” like memory management, interrupts and polling, scheduling, and register access. Randomizable scenarios can be built from these to serve the next level, where drivers will make use of them.

These drivers will also have randomizable scenarios, and they can place constraints on the low-level scenarios so that you don’t end up wasting tests (or getting failing results) on nonsensical situations. The driver scenarios feed application scenarios, and the application scenarios feed performance scenarios.

Given these models, tests are created by positing an outcome (i.e., a set of outputs) resulting from an input and then using a Boolean solver to create tests. The tests will have nothing to do with the actual applications that will be run – they’re simply there to exercise the system as a whole and find any weaknesses.

An example that Mr. Hamid uses is one where a piece of data is written to memory just before an interrupt fires. Because the code causing the data write has been executed prior to the interrupt, it would normally be assumed that the data was, in fact, written and that the interrupt service routine can count on the correct data being in place. But in a system with a network-on-chip interconnect structure, the latency may be such that the data write instruction was requested but did not complete prior to the interrupt. And now you have a data consistency problem.

They also target multicore issues like deadlocks, livelocks, and cache coherency. Such areas are hard to test with application code, which may be very well-behaved. TrekSoC is badly-behaved: it can specifically create tests that will zero in on these kinds of problems to see whether or not the system survives intact.

The test code itself is run on bare metal, and any OS calls are trapped and rerouted, with TrekSoC acting as an “evil” OS in servicing those calls. This keeps the code as close as possible to the subsystems being tested, but it can also save valuable testing time since no OS has to boot.

The form the tests take depends on the target. Early on, when doing architectural evaluations, they can be turned into transactions for IP unit testing. Later in the flow, they generate RTL with offloads and a testbench as well as the code to be run. The offloads, implemented as native code libraries, handle memory observation, “printf” commands, input stimulus, and output checks.

From an abstract testing standpoint, the complete model provides stimulus, results checking, and coverage statistics, making it completely self-contained. The actual flow of the tests – in particular, the randomized nature – is determined at test generation time, not at run time. Any resulting coverage gaps can be addressed by directing the tool to create tests for a specific uncovered node on the flow graph used to depict the scenarios.

If this all works, then it gives you a way of ensuring that you won’t be throwing out breakfast and starting over. Or that the neighbors won’t all be complaining that their trash is turned over and that their cats are all stuck up in trees. And best yet, you won’t be getting that dreaded phone call after first silicon.

 

More info:

Breker

One thought on “Breaking the System”

Leave a Reply

featured blogs
Oct 15, 2021
We will not let today's gray and wet weather in Fort Worth (home of Cadence's Pointwise team) put a damper on the week's CFD news which contains something from the highbrow to the... [[ Click on the title to access the full blog on the Cadence Community site. ...
Oct 13, 2021
How many times do you search the internet each day to track down for a nugget of knowhow or tidbit of trivia? Can you imagine a future without access to knowledge?...
Oct 13, 2021
High-Bandwidth Memory (HBM) interfaces prevent bottlenecks in online games, AI applications, and more; we explore design challenges and IP solutions for HBM3. The post HBM3 Will Feed the Growing Need for Speed appeared first on From Silicon To Software....
Oct 4, 2021
The latest version of Intel® Quartus® Prime software version 21.3 has been released. It introduces many new intuitive features and improvements that make it easier to design with Intel® FPGAs, including the new Intel® Agilex'„¢ FPGAs. These new features and improvements...

featured video

Simplify building automation designs with MSP430

Sponsored by Texas Instruments

Smart building automation requires simple, flexible designs. With integrated, high-performance signal chain, MSP430 MCUs can enable high-accuracy motion detection, sensing and motor control to take performance and efficiency to the next level.

Click here for more information

featured paper

System-Level Benefits of the Versal Platform

Sponsored by Xilinx

This white paper provides both a qualitative and quantitative analysis of Versal ACAP system-level capabilities for a host of markets ranging from cloud to wired networking and 5G wireless infrastructure. Learn how the Versal architecture delivers best-in-class performance/watt leadership over competing 10nm FPGA architectures in end-applications such as AI compute accelerator, 5G Massive MIMO, network accelerator, smart SSDs, and multi-terabit SmartPHY—supported with data that can be validated with public tools.

Click to read more

featured chalk talk

AC Protection & Motor Control in HVAC Systems

Sponsored by Mouser Electronics and Littelfuse

The design of HVAC systems poses unique challenges for things like motor control and circuit protection. System performance and reliability are critical, and those come in part from choosing the right components for the job. In this episode of Chalk Talk, Amelia Dalton chats with Ryan Sheahen of Littelfuse about choosing the right components for your next HVAC design.

Click here for more information about Littelfuse AC Protection & Motor Control in HVAC Solutions