feature article
Subscribe Now

Moving Back In Time

Or How to Make Your Giant SoC Look Like a 286

It sounds like old times, something that’s been done before.

You take a bit of extra logic, tap into your JTAG infrastructure (pun intended), add some IP, and look into what’s happening with your FPGA. And you’d say, “Oh, that sounds like Altera’s SignalTap or Xilinx’s ChipScope.”

OK, so then say you add some logic to your ASIC, capture and compress a bunch of data, and decompress it on the way out. And you’d say, hey, that sounds sort of like DFT (Design for Test) technology. Sort of. Maybe. (With DFT, usually the stimulus, not the result, comes in compressed and is decompressed on-chip, but it has a similar feel.) Or you say, hey, that sounds like the debug infrastructure that ARM and MIPS provide.

OK, so say you can do both of those things across multiple FPGAs or ASICs. And you do it at the RTL level, pre-synthesis. And, unlike DFT, you can capture not just a single-cycle pass/fail result, but also a history for backtracking how a particular state was reached. And, unlike with the ARM and MIPS debug stuff, you’re debugging not just software, but hardware at any level.

And now it doesn’t quite sound like something that’s been done before.

So let’s start over again.

Let’s assume the test problem has been solved; we’re not going to talk about test or DFT. Debug is another matter. In fact, as I found when looking deeper into SoC debug last year, most tool vendors don’t even think about hardware debug. They’re interested in having you spend big bucks on pre-hardware verification and debug so that you never have to get to the point of doing hardware debug due to a failed chip.

So leave it to Brad Quinton, CTO of Veridae, to crash that party with what he sees as a sobering truth: “Very few chips actually get to first silicon success.” And after all those tools and all that verification time. To quote Malibu’s Most Wanted, “All that buffness for nothin!”*

So here we are, then, with FPGAs or silicon that ain’t what they’re supposed to be. It’s not like debug is completely impossible. As we’ve seen, there are ways of doing an FPGA laparoscopy one-at-a-time, and you can do low-level hardware-assisted software debug. But what about detailed hardware debug on an SoC? Or on the group of multiple FPGAs that are being used to prototype the SoC? According to Veridae (based on ITRS numbers), by 2014, your SoC will have less than 50 pins per million gates. That makes it very difficult to look inside and peer around in any meaningful way if something goes wrong.

This is the problem that they’re trying to solve with their Clarus product. You could sort of say that it’s like a traditional FPGA debug system on steroids. It’s deployed across multiple chips on something of a network, with hardware IP and a set of software tools that manage the data and coordinate it into integrated views into the combination of chips as a whole.

The way this works is by instantiating a number of what they call “capture stations.” A capture station will be assigned to a particular portion of logic – it might be a chip or a clock domain within a chip. And, in fact, when you have multiple clock domains, the typical setup is to have a capture station per domain (although you can combine clock domains within a station using FIFOs).

Signals get to and from the capture stations via a router (often one FPGA in a cluster is given supervisory status); 10 pins are used as dedicated I/O for this. In between the router and the JTAG TAP controller is an access control block, which not only provides optional security, but also allows a host CPU to access the setup.

The latter actually means that you can memory-map the debug circuit to the processor, have the debugger alert the CPU if something is amiss, and then the CPU can potentially work around the problem.

Of course, the other issue with debuggers is figuring out how to get access to thousands of signals through a few pins. No, wait, that sounds dated. How to get access to tens or hundreds of thousands of signals through a few pins. You only get so many chances to grab signals – and if you don’t know in advance what bug you’ll be looking for, how do you know how to instrument the system for viewing what’s going on?

You handle this through three choices. First, the Implementor tool (more on it later) ranks signals for you to help you decide where to put access. Next you select how many signals you want to be able to see, period. Then you select how many you can see at the same time. The latter can be any combination of the former. That means that, if you have 1000 signals visible, 30 at a time, then in a single capture you can see any combination whatsoever of 30 signals out of 1000. Don’t make me do the math on how many combinations that is. Please. I can’t fit that big a Pascal’s triangle on one piece of paper. It’s a lot.

Now if you actually want to see more than 30 signals (in this example), you can actually do more than one capture, each one having a different combination of signals visible. Under the assumption that you’re exactly re-creating the test case for each capture, you can then combine the captures into a single view.

The tools overlaying this also appear to be aimed at reducing some of the error-prone drudgery of dealing with debug. The Implementor tool helps with setup, ranking nodes, automatically discovering clocks, and generating test benches and formal scripts.

When it’s all in place and you’re in the act of debugging, the Analyzer tool lets you set up the captures, and then the Investigator tool takes the captured data and tries to do more intelligent stuff with it – mapping it to RTL, using known conditions like register settings, extrapolating indirect data, and tracing data backwards and forwards through the RTL.

So what does all of this get you?

Their claim through all of this is that, on a modern chip, by adding 1% more area to support the debug, the observability of the resulting chip (expressed as observation points per million gates) will be better than that of an old Intel 486. By adding 3%, it will be better than a 286.

Just like old times.

 

*Oh come on, admit it, you know you laughed during that movie. Even if only once.

 

More info:  Veridae Clarus

Leave a Reply

featured blogs
Oct 20, 2020
In 2020, mobile traffic has skyrocketed everywhere as our planet battles a pandemic. Samtec.com saw nearly double the mobile traffic in the first two quarters than it normally sees. While these levels have dropped off from their peaks in the spring, they have not returned to ...
Oct 20, 2020
Voltus TM IC Power Integrity Solution is a power integrity and analysis signoff solution that is integrated with the full suite of design implementation and signoff tools of Cadence to deliver the... [[ Click on the title to access the full blog on the Cadence Community site...
Oct 19, 2020
Have you ever wondered if there may another world hidden behind the facade of the one we know and love? If so, would you like to go there for a visit?...
Oct 16, 2020
[From the last episode: We put together many of the ideas we'€™ve been describing to show the basics of how in-memory compute works.] I'€™m going to take a sec for some commentary before we continue with the last few steps of in-memory compute. The whole point of this web...

featured video

Demo: Inuitive NU4000 SoC with ARC EV Processor Running SLAM and CNN

Sponsored by Synopsys

See Inuitive’s NU4000 3D imaging and vision processor in action. The SoC supports high-quality 3D depth processor engine, SLAM accelerators, computer vision, and deep learning by integrating Synopsys ARC EV processor. In this demo, the NU4000 demonstrates simultaneous 3D sensing, SLAM and CNN functionality by mapping out its environment and localizing the sensor while identifying the objects within it. For more information, visit inuitive-tech.com.

Click here for more information about DesignWare ARC EV Processors for Embedded Vision

Featured Paper

The Cryptography Handbook

Sponsored by Maxim Integrated

The Cryptography Handbook is designed to be a quick study guide for a product development engineer, taking an engineering rather than theoretical approach. In this series, we start with a general overview and then define the characteristics of a secure cryptographic system. We then describe various cryptographic concepts and provide an implementation-centric explanation of physically unclonable function (PUF) technology. We hope that this approach will give the busy engineer a quick understanding of the basic concepts of cryptography and provide a relatively fast way to integrate security in his/her design.

Click here to download the whitepaper

Featured Chalk Talk

Hello FPGA

Sponsored by Mouser Electronics and Microchip

Getting started on an FPGA-based embedded vision project can be tricky. Locating all the components you need, getting them to talk to each other, and just getting your system to the video equivalent of “Hello World” is a pretty daunting task. In this episode of Chalk Talk, Amelia Dalton chats with Avery Williams of Microchip Technology about the Hello FPGA kit - a low-cost, low-touch embedded vision kit for engineers new to FPGAs.

More information about Microchip Technology Hello FPGA Kit