feature article

Emulate This!

Stirrings in the Hardware-accelerated Verification World

When each chip you design is going to cost you millions in mask charges and other associated fees, and when any mistake in such a chip can cost you millions more, it makes sense that you’re willing to fork out some cash to help reduce the chances of a flub. And when getting to market sooner means dollars in your pocket, it’s likely that getting a chance to test your software earlier will also be worth some coin.

Of course, this is the whole reason anyone pays for good chip design tools (as opposed to simple software design, where a mistake – in theory – costs nothing but a follow-up patch). And it’s why a ton of that payment is for verification. And a non-trivial part of such verification can be allocated to hardware acceleration. Such acceleration not only gets you through more testing more quickly, but it also lets you emulate the system in which software will run quickly enough to where you can actually test out some of your software in advance of the hardware being available.

You wouldn’t necessarily think that the name for such an accelerator would be controversial, but, in a world where positioning is key, names are very important. We’re going to be stubborn here and call such accelerators emulators, even though that might cause some unhappiness in some quarters. It’s nothing personal or malicious, to be sure. But I’m getting ahead of myself.

There have been a couple of announcements in the somewhat staid-seeming emulation world lately. So it’s a good time to take a look around and put some context behind them. Frankly, there don’t seem to be that many emulator guys anymore. Some got bought, some disappeared. With what remains, there appear to be two broad classes of box: FPGA-based and custom-chip-based.

Programmed to receive

The reconfigurability of FPGAs provides an obvious way to prototype a chip before ordering out for actual silicon. And lots of chip designers do ad hoc prototyping with FPGAs. But there are a couple of systems out there that are designed specifically to provide broad prototyping and accelerated verification capabilities across a wide range of designs. Neither of the ones making news recently is a new system, but each has its reasons for making some noise.

We talked about Eve some time ago with respect to emulation and transaction-level verification. They’ve recently announced that they’ve sped up their compilation from SystemVerilog to netlist by as much as 10x (or even, on the outside, 20x). An obvious question then arises: if they can speed up their compiles for Xilinx FPGAs so much, why can’t everyone else?

And the answer is that, for everyone else, compilation speed doesn’t reflect the bottom line: density and design performance do. During verification, you’re presumably doing multiple runs, making changes, and running again. The faster you can make each turn, the faster you’ll get things done. As Eve’s Ian Nixon says, “If the end goal is faster compile times, then you approach the problem differently.” There’s no specific technical trick that they used; they just went back to the beginning and reprioritized to reduce compile time.

By replacing standard third-party synthesis tools with their new zFast synthesis tool, Eve cut the compile time of a 20-million gate design from about five hours down to about 30 minutes (with both runs being done on a five-computer farm). That’s about a 10x improvement. Bear in mind that this is then followed by emulator compilation and FPGA place-and-route, so the whole process takes a couple more hours beyond the synthesis time.
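The end-to-end arithmetic is worth spelling out, since the synthesis speedup doesn’t translate directly into flow speedup. Here’s a minimal sketch using the article’s rough numbers; the flat two-hour figure for the downstream emulator compile and FPGA place-and-route is an illustrative assumption, not a vendor spec:

```python
# Amdahl-style arithmetic: only the synthesis stage gets faster.
# Numbers are the article's rough figures; the 2-hour downstream
# stage (emulator compile + FPGA place-and-route) is an assumption.
synth_before_h = 5.0   # third-party synthesis, 20M-gate design
synth_after_h = 0.5    # zFast synthesis, same design
downstream_h = 2.0     # emulator compile + P&R, unchanged

stage_speedup = synth_before_h / synth_after_h
flow_speedup = (synth_before_h + downstream_h) / (synth_after_h + downstream_h)

print(f"synthesis stage: {stage_speedup:.0f}x")  # 10x
print(f"whole flow:      {flow_speedup:.1f}x")   # 2.8x
```

So even with a 10x synthesis speedup, the turn-around for the whole flow improves by roughly 3x under these assumptions — still a substantial win when you’re iterating many times a day.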

Oh, and since we’re talking FPGA-based systems here, the design size becomes immediately suspect. So, just to be clear, they frame design sizes in terms of ASIC gates – about a 10x reduction over what might be claimed by the incumbent FPGA vendor. Using the ASIC-gate metric, their largest box can handle 100-million gates – and eight boxes can be ganged together.

Prioritizing compile time implies that something else gets second billing. So, for example, the design size may be up to 20% larger than a standard synthesis tool would provide. That might be completely unacceptable for an FPGA design destined for production, but if it’s just going into an emulator, assuming you’re not right up against the edge in capacity, then who cares?

The other announcement in this sphere is Synopsys’ acquisition of the CHIPit platform. And here things get a tad confusing, since Synopsys now has two of what they refer to as “rapid prototyping” systems. They have the HAPS platform, which they acquired through Synplicity, and now CHIPit, which they got from ProDesign more recently. Both are marketed under the Confirma brand.

And this is where we start to work the semantics a bit. HAPS is a pretty traditional rapid prototyping system, where you can implement your RTL ahead of silicon, but at a point when it’s relatively stable. You can achieve high performance and start working seriously on the software that will go into the system. In contrast, CHIPit, while still called a rapid prototyping system by Synopsys, is really more of an emulator – it allows an emulator use model, including use of the SCE-MI interface for transaction-based verification. If it walks like an emulator and quacks like an emulator… It would generally be used earlier in the design cycle, and can help more in the architectural planning and early software tryout phase. Whereas HAPS is reconfigured physically, with hardware and cables, CHIPit can be reconfigured programmatically, making it much easier to make substantive high-level changes quickly.

Of course, it’s clear what they’re trying to do with this subtle positioning as they insist CHIPit is a rapid prototyping system and not an emulator, and it’s no surprise. They want to be viewed apart from the emulation behemoths; their capacity is in the 30-million-gate range (with compile times in the 2- to 8-hour range). There are obvious pricing reasons for wanting to create that distinction. But just as one might have an “aha” moment when learning that a cygnet is simply a baby swan, so one might also nod in recognition to find that, for practical purposes, CHIPit is simply a smaller emulator. It lets you mentally click all the emulator boxes – except perhaps the seven-digit (or close thereto) price tag box. And it makes it easier to understand the difference between it and HAPS.

Please pass the chips

Using FPGAs for prototyping has benefits, but it has challenges as well. When you’re doing verification, visibility of the internals of a design is critical if you want to see what’s going on when something goes awry. Such visibility can be had in FPGAs, but you typically have to compile it into the design. Both Mentor and Cadence have gotten around this problem by designing their own custom chips for their emulation systems. Yes, you read that right: they have custom silicon. It’s not 45-nm silicon, so it’s not as expensive as the chips they’re emulating, but, even at 130 or 90 nm, there’s a cost.

So what do you get for that cost? Well, let’s start with Mentor’s Veloce systems. Their approach also has some acquisitional (is that a word?) provenance via Meta Systems and IKOS. Having passed through the FPGA approach, they eventually designed what is effectively their own 4-LUT-based FPGA. The difference is a controllability/observability infrastructure that allows all logic elements and signals to be tapped and sent to memory. This implies the need for more memory in addition to the infrastructure. Such an FPGA would have little appeal outside this environment since the infrastructure and extra memory take space that would not be valued in a standard commercial FPGA design.

These chips are assembled on boards that are assembled into systems that can emulate designs of up to 500 million gates. Compile times are on the order of 20 million gates per hour.

When any of these guys positions itself, the one consistent vendor they position against is Cadence’s Palladium system, which appears to be the fifteen-ton gorilla in this space. Cadence touts their software and the completeness of their solution as part of their strength. Their boxes can handle up to 256-million ASIC gates, but others are quick to point out that they use a processor-based architecture rather than an FPGA-based one (and I’m counting Mentor’s solution as a custom FPGA). So exactly what is this processor? Well, more specifically, it’s a Boolean processor. Dig a bit deeper, and it’s an array of 4-LUTs. OK… some architectural subtleties aside, this sounds pretty darn similar to an FPGA.

Here’s where the real difference seems to be: Cadence’s system is fully synchronous. This means that every clock in the emulated system must be derived from an internal fast-clock (192 MHz), with each emulated clock period being an integer number of fast-clock ticks. The other systems allow mutually asynchronous clock domains in the same manner that any FPGA allows; if you want two domains, one at 12 MHz, one at 13 MHz, you create two domains at those frequencies (or ratioed down if limited by the highest desired frequency). With Palladium, the fast-clock becomes the basic quantum of clocking. So one domain would run with one cycle being 12 fast-clock ticks; the other would run with one cycle being 13 fast-clock ticks. Yes, I know, 12 and 13 MHz aren’t particularly challenging speeds, and the emulator might even run that at system rate. It’s just a simple example; work with me, folks.

The practical implications of this are that, if you have wildly unrelated clock domains, your actual execution speed will go down according to the clock ratios. 2:1 is easy; one tick to two. 12:13 is slower: 12 ticks to 13. Multiple clocks with odd ratios exacerbate this. Is this a serious problem? Not clear. They seem to be selling systems, so it’s certainly not a total deal-breaker. The bottom line is, if you can verify a system at a satisfactory rate, and if the clocking details are all buried and handled by the compilation software (which they are), then it might be a don’t-care.
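To make the quantization concrete, here’s a small sketch of the ratio arithmetic implied above. This is not Cadence’s actual scheduling algorithm – it just assumes the compiler picks the smallest integer tick counts per cycle that preserve the requested frequency ratios:

```python
from functools import reduce
from math import gcd

def ticks_per_cycle(freqs):
    """Smallest integer fast-clock ticks per emulated cycle that
    preserve the requested frequency ratios (slower clock -> more ticks)."""
    lcm = reduce(lambda a, b: a * b // gcd(a, b), freqs)  # common multiple
    ticks = [lcm // f for f in freqs]                     # ticks ~ 1/frequency
    g = reduce(gcd, ticks)
    return [t // g for t in ticks]                        # reduce to lowest terms

# 2:1 is easy: one tick to two.
print(ticks_per_cycle([24, 12]))     # [1, 2]
# 12:13 is slower: 13 ticks to 12.
print(ticks_per_cycle([12, 13]))     # [13, 12]
# Several domains with odd ratios exacerbate it.
print(ticks_per_cycle([7, 11, 13]))  # [143, 91, 77]
```

Each domain’s effective rate is the fast-clock rate divided by its tick count, so in the last case the slowest domain burns 143 fast-clock ticks per emulated cycle – exactly the ratio-driven slowdown described above.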

All in all, if you’re out shopping for an emulator, you’ve got a range of price points and a range of capacities and features. Realistically, once your price and capacity needs are met, ease of use is going to rule the day. At least on your first purchase. After building infrastructure and buying add-ons for I/Os and other bits and pieces to assemble a real system-look-alike, you tend to stick with the platform you started with. Incumbency has definite advantages. Which is why much of the sale is focused on the advantages of emulation – that is, bringing new users into the fold and hopefully capturing them for the long term.

Links: Cadence Palladium

Eve ZeBu


Mentor Veloce

Synopsys Confirma
