Late last year, Cadence released their new emulator edition, the Palladium Z1. Seems like that makes it time to take a look at the emulation environment to see where the different providers lie. We recently talked about Mentor’s application approach, but that was a higher-level discussion; we haven’t looked at the actual boxes for a long time.
Looking at the big picture, it would appear that all of the traditional major players are equipped to handle very large SoC designs, with each system having relative strengths and weaknesses. That said, this is a very tight, super-competitive space, with itchy fingers on triggers. So I’m hoping I don’t get anything factually wrong; if I do, I’ll be jumped all over. I may, however, still earn wrath since I’m not going to dub any system the best at everything, which may run afoul of some marketing messaging.
My process here was to try to talk to all the main players. I had conversations with Cadence and Mentor, but Synopsys declined to participate. While it’s not the focus of this article, Synopsys and Mentor have been locked in a legal battle, so it’s likely that Synopsys didn’t see any value in joining the conversation as compared with possible harm from what they might (even inadvertently) say. That’s my conjecture, to be clear. Bottom line, any Synopsys info here won’t have come directly from Synopsys.
So whom would it come from, then? Well, competitors do talk about each other, so they may make claims. But I also had one more source who, at least in principle, is not supposed to be tied to any vendor – his role these days is as a consultant, and some would consider him the most knowledgeable one in this space. This is Lauro Rizzatti.
Lauro used to be VP of marketing for EVE/Synopsys, so he has history there, and, amongst other things, he currently consults for Mentor. This sometimes creates cries of bias, although it doesn’t necessarily seem reasonable that a) an expert consultant in the space should never have worked in the space before, and b) an expert consultant should not contract with anyone in the space. That’s not exactly how consulting works.
Lauro and Frank Schirrmeister, who speaks for the Cadence emulation effort, have had something of an online argument (which we’re not going to get into at all, although some of the points from that argument will feature here). I’m not going to insert myself between any of these guys; I point this out simply for full disclosure and to provide reassurance that I’m doing what I can to present all sides without picking the winner. That’s your job.
While we focus mostly on the three main big guys, there’s also a new player that we’ll discuss at the end. You probably weren’t expecting that. I know – I’m a tease.
So for the next bit, we’re going to skip the blow-by-blow spec comparison (there are data sheets for that, and some specs aren’t publicly disclosed) and instead take a qualitative look at the various areas where these machines demonstrate interesting contrasts or issues.
Welcome to the Data Center
One of the big moves in emulation was to get away from individual standalone boxes and move towards a rack-based data-center form factor. With the new Palladium edition, all of the providers are now there (although I’m less sure about Synopsys). As before, multiple machines can be ganged together and used to emulate very large designs by partitioning the design.
In order for the boxes to talk to each other, they have to plug into a backplane, and there are a couple of approaches here. Synopsys uses a passive backplane, which provides a less expensive option. Mentor, on the other hand, uses an active backplane. This improves routability thanks to the chips that perform switching, but it also raises the price of the backplane significantly (and presumably pulls more power).
Cadence, meanwhile, uses optical connections. And they claim the unique capability (based on their processor type – more on that below) of trading processing speed for bandwidth. In other words, if a particular calculation requires communication across the backplane, extra clock cycles can be inserted into the design at compile time to accommodate that communication latency.
There is a potential gotcha when moving to the data center. In-circuit emulation (ICE) focuses on delivering realistic traffic from real-life sources like an Ethernet cable or a USB device. When everything was in a local lab, you could go in there and connect whatever peripherals and speed bridges (which adapt the high speeds of real-world traffic to the slower speeds of the emulator) ICE required. This is an area that Cadence claims to be particularly strong in.
The thing is, however, that once everything moves into some data center half-way round the world, you can no longer just waltz in and connect up your peripherals and speed bridges. While it is still possible for someone local to manually connect peripherals, there’s also a level of virtualization available. Mentor has networked VirtuaLAB boxes dedicated to certain peripheral functions, and they can be connected remotely. Not all peripherals have a VirtuaLAB box, however.
Cadence claims that their speed bridges can all be connected virtually, and that they can reside as far as 30 meters from the emulator to which they’re connected. (Here again, I’m not sure where Synopsys sits on this one.)
One of the topics we covered in the earlier Mentor article was about what they call Deterministic ICE, which provides a replay feature for debugging live-data failures. Cadence also claims such a capability, although Mentor says that the Cadence version is harder to piece together, and they expect to start taking some ICE share away from Cadence. Time will tell on that one. Get out your popcorn, folks; the show’s about to begin.
What Gives with Granularity?
With Cadence’s newest system, they’ve provided a very high level of granularity. Each of these boxes – especially with the data center versions – is likely to be used by more than one designer, and they have the capacity to emulate multiple designs at once. The granularity gets to the question, how big a chunk of compute power do you commandeer when you take ownership of some of this power?
If the minimum grain size is, say, capacity for 1 million gates, then, if you’re emulating only 50,000 gates, you’ve got a lot going to waste. Cadence claims that, with their smaller grain size, there is less waste.
No one is arguing that this isn’t true, but there’s also a sense amongst some that most users will be doing large, not small, designs, and so it becomes something of a secondary consideration. All other things being equal, yes, higher granularity (i.e., smaller grains) is a good thing, but if it requires a tradeoff in other features, many of those other features will be a higher priority.
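To put rough numbers on the waste argument, here is a minimal sketch of the arithmetic. It assumes capacity is handed out in whole grains; the 1-million-gate grain and 50,000-gate design are the illustrative figures used above, while the 100,000-gate grain is purely hypothetical and not a published spec for any machine.

```python
def allocation(design_gates, grain_gates):
    """Return (grains, allocated_gates, wasted_fraction) for one design.

    Assumes capacity is allocated in whole grains (an assumption for
    illustration, not a documented vendor policy).
    """
    grains = -(-design_gates // grain_gates)      # integer ceiling division
    allocated = grains * grain_gates
    wasted = (allocated - design_gates) / allocated
    return grains, allocated, wasted

# 50,000-gate design on a 1-million-gate grain (figures from the text)
print(allocation(50_000, 1_000_000))   # (1, 1000000, 0.95)

# Same design on a hypothetical 100,000-gate grain, for comparison only
print(allocation(50_000, 100_000))     # (1, 100000, 0.5)
```

With the coarse grain, 95% of the commandeered capacity sits idle; the finer grain cuts that to half. For designs far larger than the grain, the wasted fraction shrinks toward zero either way, which is the point the skeptics are making.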
The Chips are Down
Another differentiating factor is the chip used to do the logic emulation. These three machines do this very differently. Cadence uses a logic processor; Synopsys uses off-the-shelf Xilinx FPGAs, and Mentor has their own custom FPGA-like chip that uses look-up tables (like a standard FPGA without the standard bit). Cadence’s newest box involved not just the design of a new system, but also the design of a new processing chip.
There are a couple of implications of these choices. By using Xilinx, Synopsys tends to get access to leading-edge processes, since the big FPGA guys are typically early adopters of the latest process node.
But one downside to standard FPGAs is that they have limited signal visibility (and ways of addressing that have been part of the ongoing litigation). Both Cadence and Mentor have explicitly designed their chips to provide good signal visibility. Synopsys, by contrast, can’t control what Xilinx does in this regard and so has to make do.
There’s one other impact of the chip choice, and it impacts one component of performance…
How Fast Is Fast?
The more you can emulate in less time, the better off you are. That’s pretty obvious. But there are a couple of different considerations here, and they’re being packaged in ways that are creating some pushback.
The most obvious performance indicator is how fast your design can execute. No absolute numbers are available since there are no benchmarks, and this represents the very model of the notion of “it depends.” According to Lauro, Synopsys wins run time, with Mentor and Cadence being roughly equivalent to each other. Think on the order of 1 MHz vs. 10 MHz.
But there’s another component to speed, and it gets to what Cadence and, to some extent, Mentor, are referring to as throughput. This takes into account the time it takes to compile a design. In the simplest case, the claim is that a full run includes a compile and a run.
This makes a big difference, since compile times vary dramatically. And, while Synopsys wins the run time competition, it loses the compile-time battle because it invokes Xilinx’s place-and-route tools as part of the flow. Cadence wins the compile time battle; they claim a 2X advantage.
That said, Lauro notes that it’s not typical to do only one emulation run after each compilation. Something more like ten runs is more typical, which may run over several days. Throughput is still affected, but the compile time impact on throughput with ten runs is diluted as compared to one run. Compile and run times are both relevant and important, but combining them is more difficult.
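The dilution effect is easy to see with a little arithmetic. The sketch below assumes a simple model in which total time is one compile plus some number of equal-length runs; the 6-hour compile and 2-hour run are made-up numbers chosen only to illustrate the shape of the argument, not measurements from any vendor.

```python
def compile_share(compile_hours, run_hours, runs_per_compile):
    """Fraction of total wall-clock time spent compiling.

    Simple model (an assumption): total = one compile + N identical runs.
    """
    total = compile_hours + run_hours * runs_per_compile
    return compile_hours / total

# Hypothetical figures: 6-hour compile, 2-hour emulation run
one_run = compile_share(6, 2, 1)
ten_runs = compile_share(6, 2, 10)

print(round(one_run, 2))    # 0.75
print(round(ten_runs, 2))   # 0.23
```

Under these assumed numbers, compile time dominates a single compile-and-run cycle but drops to under a quarter of the total once ten runs share the same compile. The real ratio depends entirely on the design and the machine.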
Cadence also includes debug time in their notion of throughput, and this one feels less debatable. A delicate debug situation can play havoc with a schedule. This is a tool issue, but, at its root, it’s also the result of the visibility thing we mentioned. And Synopsys has a double challenge here: with less visibility, the risk increases that a recompile is necessary to bring out a key signal or two – and Synopsys has the longest compile times.
Here Comes the Power Play
Power consumption is one of the hardest things to suss out in this space. It’s widely suspected that Palladium requires more power than the other machines, but it’s hard to be sure (short of trying the machines out) without published power numbers.
Cadence does have a big edge in one aspect of holistic power accounting: power consumed while compiling a design. Cadence can do their compilation on a single server. Mentor and Synopsys tend to need farms of 10 to 50 servers, and, at 500 W each, that represents a big power difference.
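For a sense of scale, the compile-farm figures above work out as follows. This sketch assumes the single Cadence server also draws 500 W (the text gives that figure only for the farm servers), and it compares instantaneous power only; energy would also depend on how long each compile takes, which isn’t given here.

```python
WATTS_PER_SERVER = 500   # per-server figure quoted in the text

def compile_power_kw(servers, watts_per_server=WATTS_PER_SERVER):
    """Instantaneous power draw of a compile setup, in kilowatts."""
    return servers * watts_per_server / 1000

print(compile_power_kw(1))    # 0.5   single server (500 W assumed)
print(compile_power_kw(10))   # 5.0   low end of the farm range
print(compile_power_kw(50))   # 25.0  high end of the farm range
```

So the farm draws somewhere between 10 and 50 times the power of the single server while a compile is in progress.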
Smothering with Service
There’s another area where Synopsys is working hard to neutralize an ease-of-use challenge. When you want to emulate a design, you can’t just toss the design files into the compiler and go. Porting designs from simulation to emulation is an unavoidable part of the effort required to include emulation in a verification flow. There are, for example, portions of the testbench that are synthesizable, and they would need to be segregated from the non-synthesizable portions for inclusion in the emulator “image.”
This prep work can be tedious, and it sounds like Synopsys has the toughest go of this. That said, they’re apparently absorbing that challenge by deploying teams of service folks to do the work for at least some customers.
And a Surprise Player
Finally, there’s a new kid on this block: Aldec. They’ve been around for a really long time, but their historical focus has been on verification tools for FPGAs. They’ve recently announced that they are now putting themselves out for full-on digital ASIC design-verification as well.
And they have a different twist on emulation. It’s based on their HES-7 prototyping boards, which are built from FPGAs. They overlay this with their HES-DVM software, which turns the prototype boards into an emulator with a SCE-MI interface. They have a backplane that, at present, can connect up to four boards, with each board handling up to 158 million gates. But they say that there’s no real limit to the number of boards that could be accommodated; they could design a bigger backplane if demand so indicated.
Unlimited scaling is a bold claim, so it remains to be seen what this system can handle under typical system loads. That said, it does represent a lower-cost option for companies concerned about the level of investment required for big-guy emulators (typically, millions of dollars).
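As a back-of-the-envelope check on what the current backplane covers, the sketch below uses only the two figures quoted above (four boards, 158 million gates per board). It treats capacity as a raw gate count and ignores partitioning overhead and achievable utilization, so real-world limits would likely be lower; the function names are hypothetical.

```python
GATES_PER_BOARD = 158_000_000   # per-board capacity quoted in the text
BOARDS_PER_BACKPLANE = 4        # current backplane limit quoted in the text

def boards_needed(design_gates):
    """Minimum boards by raw gate count (ignores partitioning overhead)."""
    return -(-design_gates // GATES_PER_BOARD)   # integer ceiling division

def fits_current_backplane(design_gates):
    return boards_needed(design_gates) <= BOARDS_PER_BACKPLANE

print(GATES_PER_BOARD * BOARDS_PER_BACKPLANE)   # 632000000
print(boards_needed(500_000_000))               # 4
print(fits_current_backplane(700_000_000))      # False
```

On raw capacity alone, anything past roughly 632 million gates would need the bigger backplane that Aldec says it could build.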
And that’s a look at the current state of emulation. Lots of options fiercely competing with each other. Overall parity amongst the big three, but plusses and minuses for each one. And a new kid on the block, just to spice things up.
What emulation characteristics are most important to you?