posted by Bryon Moyer
Cadence is proposing a new way to approach debug. It’s almost an obvious way, except that this isn’t how most debug has traditionally been done. The real reason this hasn’t been done before is simple: data. We’ll come back to that in a sec.
Their point is that, for most debug today, you have to anticipate where problems are likely to crop up and then manually instrument your code with “printf” statements (or the equivalent) so that you get some visibility into what’s going on with your program.
That works OK for your first simulation run – up to the point when something goes wrong without an accompanying printf to provide clues. So you go back and add more printfs and – and this is the key – you resimulate.
By Cadence’s estimation, 50% of verification effort is debugging, and 25% is running tests. Together, they’re ¾ of the pie. Each resimulation adds more test time, and because the debug effort resembles successive approximation as you try to zero in on the cause, a single bug can cost several of those extra runs. Their big idea is to make debug more directed and – this is the big part – make it 100% doable after only one verification run.
The result is Indago (no, it doesn’t sound like “indigo”; it’s “in-DAH-go,” apparently Latin for hunting or tracking). There are a few key pieces to this approach.
The main one is the fact that all artifacts – data, logs, code execution, etc. – are captured. In other words, instead of having to decide ahead which data to expose via printf, you simply get everything. That means that debug efforts have all the data they need – no subsequent runs to capture new data are needed.
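To make the contrast with printf-style debug concrete, here is a minimal Python sketch of the "record everything once, ask questions afterward" idea. It is only an illustration: the `TraceRecorder` class, the toy `run_simulation` function, and the signal names are all made up for this example and say nothing about how Indago actually stores or queries its data.

```python
from collections import defaultdict


class TraceRecorder:
    """Hypothetical recorder: keeps every value change of every signal."""

    def __init__(self):
        # signal name -> list of (time, value), in the order recorded
        self.history = defaultdict(list)

    def record(self, time, signal, value):
        self.history[signal].append((time, value))

    def value_at(self, signal, time):
        """Return the most recent value of `signal` at or before `time`."""
        latest = None
        for t, v in self.history[signal]:
            if t > time:
                break
            latest = v
        return latest


def run_simulation(recorder):
    # Stand-in for a real simulation: a counter that wraps at 4,
    # plus a flag derived from it. Everything gets recorded.
    count = 0
    for time in range(6):
        count = (count + 1) % 4
        recorder.record(time, "count", count)
        recorder.record(time, "wrapped", count == 0)


recorder = TraceRecorder()
run_simulation(recorder)  # the one and only run

# Questions nobody anticipated before the run, answered without resimulating.
count_at_3 = recorder.value_at("count", 3)
wrap_times = [t for t, v in recorder.history["wrapped"] if v]
print(count_at_3, wrap_times)
```

The tradeoff is visible even at this scale: the recorder stores every change whether or not anyone ever looks at it, in exchange for never having to rerun the simulation to see a value.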
From there, they have what they call “root cause analysis” that helps point you in the direction of a bug. When a signal is identified by the testbench as being incorrect, the tool can identify a short list of possible causes, and you can drill in from there (even crossing into third-party IP as long as it’s not encrypted).
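As a rough picture of what "a short list of possible causes" could mean, the sketch below walks backward from a failing signal through the signals that drive it, and keeps only the earliest points where recorded values went wrong. This is an assumption-laden toy, not Cadence's algorithm: the `candidate_causes` function, the `drivers` map, and the signal names are all hypothetical.

```python
def candidate_causes(drivers, mismatches, failing_signal):
    """Walk backward from a failing signal through its drivers.

    drivers: dict mapping each signal to the signals that feed it.
    mismatches: set of signals whose recorded values were wrong.
    Returns the reachable bad signals that have no bad driver of
    their own, i.e. the earliest places where things went wrong.
    """
    causes = []
    seen = set()
    stack = [failing_signal]
    while stack:
        sig = stack.pop()
        if sig in seen:
            continue
        seen.add(sig)
        bad_inputs = [d for d in drivers.get(sig, []) if d in mismatches]
        if bad_inputs:
            # The problem came from upstream; keep digging.
            stack.extend(bad_inputs)
        else:
            # Nothing upstream is wrong, so this is a candidate root cause.
            causes.append(sig)
    return sorted(causes)


# Hypothetical design fragment: data_out is what the testbench flagged.
drivers = {
    "data_out": ["fifo_rd", "mux_sel"],
    "fifo_rd": ["fifo_wr", "rd_en"],
    "mux_sel": ["mode"],
}
mismatches = {"data_out", "fifo_rd", "fifo_wr"}

causes = candidate_causes(drivers, mismatches, "data_out")
print(causes)
```

Note that this only works because every signal's history is already on hand; with printf-style debug, each step upstream could mean another instrumented run.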
On top of this fundamental technology, they layer three apps. One is their Debug Analyzer, which allows multi-language (SystemVerilog, e, and SystemC) code debug. The second is Embedded Software Debug, which helps debug co-verified software and hardware (and is optimized for their Palladium emulator and Incisive simulator). The third, Protocol Debug, provides abstraction when debugging protocols so that you can observe what’s happening at a higher level.
These three apps can be run together at the same time. To some extent, they provide alternative views of the same information, and they stay synchronized. You can move back and forth between them, say, highlighting something in one and then viewing in another.
Indago isn’t tied to Cadence’s verification tools; it can also be used with other engines mixed and matched from different EDA providers.
Finally, a quick word on a buzzphrase that featured prominently in the announcement: Big Data. When you hear that, you might think Hadoop or Lambda Architecture or datamarts or NoSQL searches or any number of mysterious acronyms and algorithms and incantations. Anything up to the point of Deep Learning, which is yet another buzzphrase.
I tried to drill in to see what “Big Data” meant in this context. And, in fact, it’s mostly none of that prior stuff. It’s “big data” in the most general sense, the highest-level big-data concept. And that is, “Grab everything you can, up to and including your mother-in-law, and stash it away cuz you might need it someday.” Indago embraces that aspect – it’s key to eliminating subsequent verification iterations while debugging.
To my earlier point, it’s only in modern times that memory has become cheap and big enough (and that we can dump data to it fast enough) for us to afford to be this “wasteful” – after all, an enormous percentage of that stored data will never, ever be used. Unlike in the past, that’s no longer an unacceptable cost. Accelerating debug is worth more than the extra storage.
posted by Bryon Moyer
In a sleepy little town of 4 or 5 houses, you can be pretty informal about how mail arrives at its destinations. People can come pick it up at the post office, or the postmaster can drop it off on the way home, or whatever works. But once you get too many houses, you have to get organized: create routes and schedules and hire delivery folks to handle deliveries in a more structured manner.
That’s what’s happened with SoCs: the ad-hoc interconnect schemes of yore are giving way to networks-on-chip (NoCs) so that the complex communication interplay between blocks can be carefully designed, managed, and tuned.
Which is good, except that a NoC is a complex animal, and, traditionally, it goes into the chip layout mix as part of the whole – it’s just another (complex) bit of IP. Layout affects performance, so tuning and closing the timing of a NoC in the middle of the rest of the layout would presumably be a difficult proposition. It also adds a significant burden to the EDA tools trying to manage the whole thing.
So Arteris has a proposal: segregate the NoC from the rest of the circuit and optimize it independently. This relies on a layout that provides channels between IP instances where the NoC lines and circuits will be placed.
They describe a three-step process starting after initial layout. First, the NoC IP is isolated so that timing and routing can be optimized. In the second part, pipeline stages are automatically added (as they point out, you’ll never get from point A to point B across a 28-nm chip in one clock cycle). Finally, timing is closed using physical synthesis – which they claim can provide single-pass success.
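The pipeline-stage point comes down to simple arithmetic: if a signal takes longer than one clock period to cross the chip, the path has to be chopped into register-to-register hops. The Python sketch below shows that calculation under loudly stated assumptions: the function name is made up, the delay-per-millimeter and path lengths are placeholder numbers rather than real 28-nm process data, and it ignores register setup time and clock skew. It is not a description of how Arteris's tool decides where stages go.

```python
def pipeline_registers_needed(path_length_mm, delay_ps_per_mm, clock_period_ps):
    """Estimate how many pipeline registers a long route needs.

    All arguments are integers. The model is deliberately crude:
    delay is assumed to grow linearly with distance, and register
    overhead (setup time, clock-to-output delay) is ignored.
    """
    total_delay_ps = path_length_mm * delay_ps_per_mm
    # Ceiling division: number of clock cycles the trip takes.
    cycles = -(-total_delay_ps // clock_period_ps)
    # A trip of N cycles needs N - 1 registers along the way.
    return max(cycles - 1, 0)


# Placeholder numbers, for illustration only.
long_route = pipeline_registers_needed(12, 150, 1000)   # 1800 ps at 1 GHz
short_route = pipeline_registers_needed(4, 150, 1000)   # 600 ps at 1 GHz
print(long_route, short_route)
```

With these made-up figures, the 12 mm route needs one register and the 4 mm route needs none. A real flow would get its delays from the physical layout, which is presumably why this step comes after the NoC has been isolated and placed.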
This lets you optimize the NoC unburdened by the rest of the SoC, and it lets the EDA tools handle the rest of the SoC unburdened by the NoC. Arteris says that this divide-and-conquer approach gets you to tape-out faster than trying to do the whole thing at once.
You can read more in their announcement.
posted by Bryon Moyer
QuickLogic is back, pushing power numbers down again. They’re now touting what they say is the lowest-power sensor hub, at 75 µW, with their ArcticLink 3 S2 LP.
You may recall that QuickLogic’s ArcticLink 3 is a “custom PLD,” if you like. It’s got an internal programmable fabric, plus hardened logic and a couple of processors. The solution, much of which comes pre-canned, is a combination of logic and state machine and multipliers and microcode, with a modicum of programmability. It’s a carefully crafted approach, as we discussed a while back.
QuickLogic has come back a couple of times with power reductions on their original device. I asked what changed in the S2 LP vs. the prior S2: process and design tweaks. There’s no functional difference. I asked if there was ever a reason to use the S2 instead of the S2 LP; their answer was, “Not really.” So it seems to be a story of “lower power for free.” How often do you get that?
Competitors will question how much processing this device will allow. It’s certain that there are other solutions, likely microcontroller-based, that could, with larger memories, handle more sophisticated algorithms at the cost of higher power. PNI can probably squeeze more algorithm-per-microwatt than a generic microcontroller since their solutions are largely fixed. (Programmability costs…) But they’re still higher than 75 µW.
But much of that is conjecture and gut-feel on my part. Where the breaking point is for each of these architectures… well, I don’t know if anyone has a real answer to that. Almost makes you wish for some way of figuring out what can go into which device for how much power.
You can read more in QuickLogic’s announcement.
(Image courtesy QuickLogic)