From Simulation to Emulation

Last year I tried to wade through the world of emulation to untangle it a bit. It all seemed so simple at the time. Once I had it untangled, that is. Problem is, I only thought I had untangled it. Cadence recently announced a “unification” (there’s that word again) between simulation, simulation acceleration, and emulation. And it became pretty clear pretty quickly that, in the intervening year, a new tangled web has replaced the one I thought I had cleared out before.

Bottom line, I got confused. Again.

OK, perhaps that’s not such a rare occurrence, but work with me here…

As I started talking around, it felt like I wasn’t the only one confused. Although, in truth, you never think you’re confused until you realize that other people adhere to other beliefs with the same level of conviction that you adhere to yours. So everyone has a more-or-less clear opinion; it’s just that the opinions don’t all align. And so I get confused. And so, in turn, I try my best to share my confusion with the others in an attempt to illustrate that there’s confusion.

And this time I didn’t get much pushback. Yup, folks: it’s confusing.

The real question here is, if these three items are being unified, what’s the distinction between them?

• Mentor’s Jim Kenney suggests that using real I/O instead of modeled I/O distinguishes emulation from simulation acceleration. He also suggests that using FPGAs makes a system a prototype; Cadence and Mentor use custom ASICs for emulation.

• OK… now we’ve tossed prototype systems into the mix too. Why untangle three things when four will do?

• And EVE ostensibly makes emulators, but they use FPGAs. Does that make them prototypes? Jim also mentions prototype systems being small. But EVE also builds large units. So which is it?

• EVE’s Lauro Rizzatti says the simulation acceleration/emulation distinction is pretty fuzzy. And that Cadence’s new marketing drawings look a lot like EVE’s old marketing drawings. And that Mentor’s custom ASIC is really a custom FPGA, while Cadence’s custom ASIC is based on logic processors (which we saw last year to be LUT-based and hence also somewhat FPGA-like).

• Cadence’s Michael Young says that if the verification environment involves a SCE-MI interface between the host and the hardware unit, then it’s simulation acceleration, not emulation. But, with no SCE-MI interface, you can’t talk to a hosted testbench (with any kind of speed), and only a subset of the testbench can be synthesized into hardware.

• Synopsys sells prototyping systems, not emulators, but they support an emulator use model.

Gah!

Of course, not everything everyone says conflicts, and there are threads of agreement here and there. So let’s try to walk through it and sort out what’s what.

Prototype this!*

And let’s start with prototyping, since we can put that one to rest somewhat more easily. Both prototyping systems and emulation systems – not to mention virtual prototype systems – allow software to execute on a model of what a chip will ultimately look like. The operative distinction appears to be, why do you want to do that?

If the answer is, “So that I can verify that my hardware is correct,” then you’re looking at an emulator. If the answer is, “So that my software developers can get started writing software earlier,” then you’re talking prototype.

A hardware prototype assumes that the RTL is relatively stable and is in the process of being implemented in silicon. Software development can start in advance of the chip being built and tested. A more abstract virtual prototype can be used before the RTL is stable; after that, a hardware prototype provides much greater performance. In fact, it runs much faster than an emulator – like an order of magnitude or more faster.

Prototypes tend to be smaller, using fewer FPGAs than, say, an EVE Zebu or Mentor Veloce (or Cadence Palladium if you think of their chips as FPGA-like) system. But they can run faster because someone takes the time and effort to create a very efficient implementation of the hardware. It makes sense to spend that much effort only once you know the RTL is pretty stable. The FPGA compilation process, not to mention performance closure, takes time, and so the whole exercise of implementing the prototype takes much longer than could be tolerated for a check-out of the hardware.

If you’re still testing the hardware RTL, then you want to be able to turn design iterations rapidly. This typically means less efficient implementation, using more FPGAs and achieving less speed. That’s where emulation sits. It’s all in the tradeoff.

And then there were three

So that leaves simulation, simulation acceleration, and emulation. To try to put them together into some reasonable structure, let’s pull things apart to make sure we know what’s going on.

The verification environment really consists of three elements:

• the design/device under test (DUT)

• the stimulus

• the checkers (or whatever decides that things did or didn’t work)

The stimulus and checkers are typically packaged together as the “testbench.” They’re typically written in Verilog or SystemVerilog and are generally not synthesizable as a whole (although a subset may be).

The DUT is typically written in an HDL such as Verilog or VHDL and is synthesizable. If it’s not, someone’s going to be in trouble at some point.

There are two places where each of these things can be:

• the host, using software models

• hardware

Let’s take the easy case as an example. Simulation takes place entirely in the host. So the DUT, the stimulus, and the checkers are all implemented as software models. No big mystery there. Problem is, they tend to run slowly. Especially when you start trying to see how your hardware works when executing software; it runs achingly slowly in a simulator. You could get old just waiting for the system to initialize; hopefully you’ve trained your kids by then to take over when the actual software runs.

And so you use hardware to accelerate things. Because the DUT is synthesizable and the testbench isn’t, the obvious first step is to move the DUT into hardware. In fact, you might even be tempted to move a part of the DUT into hardware. But we have an important consideration to take into account: the interface between the hardware and the simulator. Because, in this scenario, the simulator session still rules the land; it’s just including the hardware as one of its minions.

Back when this sort of thing was new, the stimulus in the host would stimulate the DUT in hardware one toggle at a time, and the response to the checkers would come back one toggle at a time. The DUT may have executed quickly, but getting signals to and from the DUT became the bottleneck.

Enter an approach pioneered by IKOS prior to their acquisition by Mentor: transactions instead of individual signal value changes. This approach lies at the heart of the SCE-MI interface connecting the host and the hardware DUT. Because each exchange between host and hardware now carries a whole transaction rather than a single toggle, the testbench can control the DUT dramatically faster.
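To put a rough shape on the difference, here is a toy Python sketch. Nothing in it is the actual SCE-MI API; the Link class, the function names, and the 8-bit word width are all made up for illustration. It does one thing: it counts how many times data crosses the host/hardware boundary when a burst is sent one toggle at a time versus as a single transaction.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical host-to-hardware link that only counts crossings."""
    crossings: int = 0

    def send(self, payload):
        # Every call stands for one trip across the host/hardware boundary.
        self.crossings += 1
        return payload


def drive_per_toggle(link, words, width=8):
    """Signal-level style: each bit of each word crosses the link by itself."""
    received = []
    for word in words:
        bits = [(word >> i) & 1 for i in range(width)]
        got = [link.send(bit) for bit in bits]
        # Reassemble the word on the far side, least significant bit first.
        received.append(sum(bit << i for i, bit in enumerate(got)))
    return received


def drive_per_transaction(link, words):
    """Transaction-level style: the whole burst crosses the link once."""
    return list(link.send(tuple(words)))


burst = [0x12, 0xA5, 0xFF, 0x00]

toggle_link = Link()
toggle_result = drive_per_toggle(toggle_link, burst)

txn_link = Link()
txn_result = drive_per_transaction(txn_link, burst)

print(toggle_link.crossings, txn_link.crossings)
```

With the four-byte burst above, the per-toggle version crosses the link 32 times and the transaction version crosses it once, and both deliver the same data. Real links have all sorts of other costs, so treat the ratio as a cartoon, not a benchmark.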

But what if you split the DUT up, accelerating only a portion of it? Well, assuming that portion has to talk to the other unaccelerated portion, now you’re back to having to connect those signals across the bundle of wires between the host and the hardware. And there’s nothing to pack the signal changes into transactions. So this takes us back to achingly slow.

Meaning that the DUT will pretty much be found in its entirety either on the host or in hardware. The latter being the accelerated version.

So where does that leave emulation? Well, we still have this pesky tether to the host. Both Cadence and Mentor seem to agree that, in their view, an emulator stands on its own. It doesn’t talk to a host. There are two ways to achieve that: either put as much of the testbench as possible into hardware, or don’t use a Verilog testbench – use a “real” testbench.

Let’s break the testbench back apart into stimulus and checker portions. To stimulate your DUT (illegal in some states), you could have some signals coming from a synthesized partial-testbench, or, even better, use actual I/O cards that look like what will actually happen in the final system. If it’s network traffic you need, hook the DUT up to a real network through a real network card rather than using a simulated traffic generator. Need to talk to something that sits on the PCI bus? Use a real PCI card to talk to the real thing that’s sitting on the real PCI bus.

What about the checker? Well… you may not have one. After all, you won’t have one in the final system. How will you know if the final system works? Because… well, it works. If it doesn’t work, then that means that… it didn’t work. You know, blue screen of death, that sort of thing. (Hopefully while Bill is presenting… career-limiting maybe, but oh so worth it… the only tech story the grandkids will want to hear over and over…)

So, essentially, by this measure, emulation means setting your baby free, cutting the cord, sink or swim; you get the picture. If it sinks, then you’ve got problems. Oh, and yeah, then you need to find out what those problems are. And all of the big emulation systems have ways of capturing much more of the internal state than would be possible on a real system; for the large systems, that wealth of data can be examined in situ. If you wish, you could ship the state of the system back across to the host (oh, yeah, there’s that cord again… maybe it didn’t quite get completely cut…) for analysis or for comparison by simulation so that you can isolate what went awry, but, being the staunch macho isolationists that they are, the emulators say they can do that just fine without any help. (And no, they don’t need directions; they know exactly where they are.)

Note that, even though you’re running software on the emulator here, just like you do on a prototype, the intent is not to develop more software on a more-or-less known-good hardware model, but to verify that software and hardware work well together. If there’s a problem, it could be a software or hardware issue. Not until both work can it be called good.
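If it helps to see the map all in one place, here is a small Python sketch of the classification as I've laid it out above. It's my own summary, not anyone's official taxonomy, and the names (HOST, HARDWARE, REAL_IO, classify) are invented for the purpose. The rule it encodes: everything on the host is simulation; the DUT in hardware with any testbench piece still on the host is simulation acceleration; nothing on the host is emulation.

```python
HOST = "host"          # software model running in the simulator
HARDWARE = "hardware"  # synthesized into the box
REAL_IO = "real_io"    # actual I/O cards instead of a modeled stimulus


def classify(dut, stimulus, checker=None):
    """Name the verification mode from where each piece lives.

    A checker of None means there is no checker at all, as in the
    "it works or it doesn't" flavor of emulation.
    """
    placements = [p for p in (dut, stimulus, checker) if p is not None]
    on_host = [p == HOST for p in placements]

    if all(on_host):
        return "simulation"
    if dut == HOST:
        # DUT in software with testbench pieces in hardware is not a
        # combination discussed in this article.
        raise ValueError("unsupported placement: DUT on host, testbench elsewhere")
    if any(on_host):
        return "simulation acceleration"
    return "emulation"


print(classify(HOST, HOST, HOST))              # simulation
print(classify(HARDWARE, HOST, HOST))          # simulation acceleration
print(classify(HARDWARE, HARDWARE, HARDWARE))  # emulation
print(classify(HARDWARE, REAL_IO))             # emulation
```

Note what the sketch leaves out: prototypes. By the argument earlier, those are distinguished by intent (software development versus hardware verification), not by where the pieces sit, so no placement table can tell them apart.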

So what does unification mean?

Having surveyed and mapped out the landscape, let’s move back to the opening gambit: the once-discrete realms of simulation, simulation acceleration, and emulation have now been united into a single flow. What does that mean?

It should mean two things:

• control of verification comes from a single console running a single session regardless of where DUT or testbench are running

• the, what should we call it, “locus of instantiation”? (yeah, good and pretentious… listen up for it at a keynote near you) of DUT and testbench should be manageable within that session.

This means that, from a single window, you could run your verification, moving things into and out of hardware at will. Cadence does claim to be able to move the DUT back and forth from hardware to software with a single command-line instruction. OK, they’re not actually moving the DUT itself back and forth: a soft image and a hardware image of the DUT are both in place, and the state moves back and forth between them. They call this hot swapping.
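The two-images-one-state idea can be sketched in a few lines of Python. To be clear, this is a toy under my own assumptions and says nothing about how Cadence actually implements it: the "DUT" is an 8-bit counter, and CounterImage, Session, and hot_swap are hypothetical names. The point is only that both images exist the whole time, and a swap copies state across and changes which image is stepped.

```python
class CounterImage:
    """Toy DUT image: an 8-bit counter. `where` is only a label."""

    def __init__(self, where):
        self.where = where
        self.state = {"count": 0}

    def step(self, cycles=1):
        self.state["count"] = (self.state["count"] + cycles) % 256


class Session:
    """Single controlling session that owns both images of the DUT."""

    def __init__(self):
        self.images = {
            "host": CounterImage("host"),
            "hardware": CounterImage("hardware"),
        }
        self.active = "host"

    def run(self, cycles):
        # Only the active image advances; the other one sits idle.
        self.images[self.active].step(cycles)

    def hot_swap(self, target):
        # The images stay put; only the state moves between them.
        source = self.images[self.active]
        self.images[target].state = dict(source.state)
        self.active = target

    @property
    def count(self):
        return self.images[self.active].state["count"]


session = Session()
session.run(10)               # start out in software
session.hot_swap("hardware")  # carry the state over to the hardware image
session.run(300)              # the long stretch runs in hardware
session.hot_swap("host")      # come back to software to poke around
session.run(1)
print(session.active, session.count)
```

In a real system the hard part is presumably everything this sketch waves away: making sure the two images are equivalent and that all of the relevant state can be captured and restored.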

That connects simulation with simulation acceleration. What about connecting to emulation? That means being able to hot swap the testbench as well. Apparently the synthesizable part of the testbench can be swapped, as can I/O modules (a model swapped for the actual hardware) in some cases.

Here’s the other big question: let’s say you can do this all seamlessly. Is there a use model that benefits from being able to swap all this stuff around like this? Or is it enough to be able to start with simulation until that gets too encumbered, then move it to an accelerator until you think it’s good, then move it to emulation? One transition each time. Does moving back and forth have value?

Presumably, customers will be the ones to answer that.

More information:

Cadence Palladium XP

EVE Zebu

Mentor Veloce

Synopsys Confirma

*Full disclosure: this is a shameless riff on the name of a Discovery Channel program that unfortunately didn’t last too long… not to mention the title of last year’s article… sorry… I promise no more links to that dang article…