Trans-Acting Lessons

It is not a notable occurrence for me to find myself confused at any given moment on any given topic. However, finding that I’m not the only one confused – well, that pretty much makes it a red-letter day. Within the world of SoC verification, there are numerous points of potential confusion, and I’m finding a satisfying solidarity with other folks trying to navigate the space.

Part of the problem arises from terminology and semantics. For succinctness’ sake, terms are given very specific meanings. The cognoscenti use such terms widely (sometimes in a demonstration of superior knowledge). We outsiders can get a sense of what the meaning is if the term was well-chosen but will probably be missing some specific implications. On top of that, some terms just become popular and commercially useful; everyone tries to attach themselves to such a term, whether accurate or not. After VMware went public, for example, suddenly anyone making software was doing virtualization, rendering the term virtually meaningless.

The realm we’ll delve into here is that of simulation and modeling. The technical challenge is that designers simulate at different stages of system implementation, and so may have either a very abstract or very specific representation – or something in between – of what is being simulated. In addition, simulation at an extremely detailed level can be excruciatingly slow, so, where possible, abstraction speeds things up.

Add to this the fact that a simulation environment will contain various components, and these components might have different levels of abstraction, and you can end up with a complicated scenario. Zooming out from the details, the big picture is usually split into two portions: there’s the logic you’re trying to simulate and test, often referred to as the “DUT”, or Device Under Test (imported from the test world where an actual device was being tested), and then there’s the testbench – which, continuing the analogy from the test world, would be equivalent to the tester. In other words, the tester/testbench provides a test scheme and access to and from the DUT; the DUT is plugged into the tester/testbench and is exercised. That much is pretty straightforward.

The analogy fails in that the test world has a convenient pool of test engineers who know how to program and manipulate the exceedingly complex testers, and for the most part, the testers are purchased (with options) from a company that builds them, so that you don’t have to worry about how the tester was put together. No such luck in the simulation world. You get to be the designer and tester and tester-builder. You have to assemble the testbench and the test scheme and implement it. Yeah, there are specialized verification engineers, but the lines blur.

So let’s bore (in the drilling sense) a bit more into the testbench side of things. In the looking around I’ve done, I haven’t actually seen things described the way I’m going to, but it’s how it makes sense to me; hopefully I’m not alone. Based on the different views that appear to exist, some of this could be debated; that debate is probably useful.

In order to put together a useful testbench, you need:

one or more sources of stimulus – a fancy word for input data;
models for other items that may interact with the DUT;
something that can record results;
possibly some sort of controller that directs the sequence of events;
and a way to hook it all together and let these elements talk to each other and the DUT.

All of these, by virtue of being virtual elements in a simulation, are software models of one sort or another. In fact, they probably come from different sources and were probably written in different languages, so they don’t necessarily snap together smoothly. To get them talking to each other, their interaction can be abstracted as a sequence of transactions, and the thing that holds them all together – the last item in the list – is usually referred to as a “transactor.” Models that interact with the transactor via transactions are “transaction-level models,” as distinguished from, for instance, RTL models or gate-level models.
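To make the idea of transaction-level interaction a bit more concrete, here is a minimal sketch in Python. Everything in it (the Transaction record, the Transactor class, the RegisterFileModel standing in for a DUT) is a name I made up for illustration; it is not taken from any vendor's tool or standard. The point is only that the models exchange whole operations rather than wiggling pins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Transaction:
    """One abstract bus operation: no pins, no clock edges."""
    kind: str          # "write" or "read"
    addr: int
    data: int = 0


class Transactor:
    """Hypothetical passive transactor: routes each transaction to the model that owns the address."""

    def __init__(self) -> None:
        self.targets: Dict[range, Callable[[Transaction], int]] = {}
        self.log: List[Transaction] = []   # doubles as the "something that can record results"

    def attach(self, addresses: range, handler: Callable[[Transaction], int]) -> None:
        self.targets[addresses] = handler

    def execute(self, txn: Transaction) -> int:
        self.log.append(txn)
        for addresses, handler in self.targets.items():
            if txn.addr in addresses:
                return handler(txn)
        raise ValueError(f"no model attached at address {txn.addr:#x}")


class RegisterFileModel:
    """Transaction-level stand-in for a DUT: a small bank of registers."""

    def __init__(self, base: int, size: int) -> None:
        self.base = base
        self.regs = [0] * size

    def handle(self, txn: Transaction) -> int:
        offset = txn.addr - self.base
        if txn.kind == "write":
            self.regs[offset] = txn.data
            return 0
        if txn.kind == "read":
            return self.regs[offset]
        raise ValueError(f"unknown transaction kind: {txn.kind}")


# Hook everything together and push some stimulus through.
bus = Transactor()
dut = RegisterFileModel(base=0x100, size=4)
bus.attach(range(0x100, 0x104), dut.handle)

stimulus = [Transaction("write", 0x101, 0xAB), Transaction("read", 0x101)]
results = [bus.execute(txn) for txn in stimulus]
```

The stimulus source here is just a list, and the recorder is just the transactor's log; in a real testbench each would be a model in its own right.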

This term, “transactor,” does appear to have some variations in specific implications. In its simplest incarnation, it is essentially a bus. For example, a transactor may implement a PCI bus, and all the other models – including the DUT – are connected to it and can interact via the PCI protocol. In such a situation, the transactor is passive in that it executes transactions at the request of elements hooked up to it. Other descriptions of transactors ascribe to them a more active role as a controller, executing a set of transactions specified by a file or script written by the designer.
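The more active flavor, where the transactor works through a script supplied by the designer, might look something like the following sketch. The three-word script format ("write addr value", "read addr") is purely my own invention for illustration, and a plain dictionary stands in for whatever is on the other side of the bus.

```python
def run_script(script: str, memory: dict) -> list:
    """Active-transactor sketch: execute a designer-written script line by line."""
    results = []
    for line in script.strip().splitlines():
        line = line.split("#", 1)[0].strip()   # drop trailing comments
        if not line:
            continue
        op, *args = line.split()
        if op == "write":
            addr, value = int(args[0], 0), int(args[1], 0)
            memory[addr] = value
        elif op == "read":
            # Locations that were never written read back as 0 (an assumption of this sketch).
            results.append(memory.get(int(args[0], 0), 0))
        else:
            raise ValueError(f"unknown command: {op}")
    return results


script = """
write 0x10 5      # preload a register
write 0x14 0xff
read  0x10
read  0x14
read  0x18        # never written
"""

memory = {}
results = run_script(script, memory)
```

In the passive case the models decide what happens and when; here the script does, and the transactor is the one driving.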

Abstraction and accuracy

The concept of a transaction brings us to a discussion of abstraction and accuracy. The whole idea of a transaction is that it can gloss over unnecessary details to speed things up. For example, let’s assume that a DUT is intended to interact with a PCI bus to receive network data from some source. This means you will need a model connected to the transactor that acts like a network data source. It may be important for validation purposes to make sure that the DUT interacts correctly with the PCI interface, but it doesn’t matter at all whether the data source does. So on the DUT side of the transactor, it may be useful to implement the specific PCI interface, but on the data-source side, a much simpler way of delivering data can simplify and speed things up. This concept of transaction-level interaction not only allows faster simulation, but also allows simulation at earlier stages of the design, when the detailed protocols aren’t yet known.

In this role, the transactor can act like a so-called “gasket” between incompatible models. Heck, if your legacy data source model was written to talk to Ethernet rather than PCI, and you don’t want to dust off the old moldy code in the model, you can interface via the transactor. This allows models of varying protocols and levels of abstraction to be interconnected, with the transactor bridging the gap between the interfaces. Of course, this isn’t an automatic process – just because you don’t want to break into the model code doesn’t free you from work – you still might have to implement it in the transactor.
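As a rough sketch of the gasket idea, suppose the legacy source only knows how to hand over frames of bytes, while the other side of the transactor wants word-sized bus writes. The function names, the 4-byte word size, the little-endian packing, and the zero padding are all assumptions I picked for illustration; a real gasket would follow whatever the two protocols actually require.

```python
from typing import Iterable, Iterator, List, Tuple


def legacy_frame_source() -> Iterator[bytes]:
    """Stand-in for a legacy model that only knows how to emit byte frames."""
    yield b"\x01\x02\x03\x04\x05\x06"
    yield b"\xaa\xbb"


def frames_to_bus_writes(frames: Iterable[bytes], base_addr: int,
                         word_bytes: int = 4) -> List[Tuple[int, int]]:
    """Gasket sketch: repackage each frame as a series of (address, word) bus writes."""
    writes = []
    addr = base_addr
    for frame in frames:
        # Pad the frame with zeros up to a whole number of words.
        padded = frame + b"\x00" * (-len(frame) % word_bytes)
        for i in range(0, len(padded), word_bytes):
            word = int.from_bytes(padded[i:i + word_bytes], "little")
            writes.append((addr, word))
            addr += word_bytes
    return writes


writes = frames_to_bus_writes(legacy_frame_source(), base_addr=0x1000)
```

The legacy model is untouched; all of the translation work lives in the gasket, which is exactly where the effort moves when you decline to open up the old code.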

Inherent in the discussion of abstraction is the notion of accuracy, and there are some terms here that appear to cause confusion. Models are usually classified as either cycle-accurate or untimed. There is actually one intermediate category as well: bus-cycle-accurate.

A cycle-accurate model is supposed to provide behavior that is exactly like that of the final circuit, on a cycle-for-cycle basis. This can be very useful for a DUT that is far down the implementation path, say at the RTL level. It may not be important at all for other elements of the testbench. A data source, for example, can be untimed and simply provide data. But if the DUT reflects a subsystem within the SoC and is being tested alongside another portion of the SoC with which it must interact, then having that other model be cycle-accurate may provide greater assurance that the pieces of the SoC will interact cleanly once implemented in silicon.

At an earlier stage of the design, when the details of the DUT haven’t been worked out, the model may be bus-cycle-accurate in that it interacts correctly with the transactor on a cycle-for-cycle basis, but internally, it takes some shortcuts. For example, if it’s a memory model, once a write request has been accepted on a cycle-accurate basis from the bus, then the write operation in the memory may be modeled as a single cycle, whereas in the final full-cycle-accurate model, it may take more than one cycle to complete the write operation.
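The memory example can be sketched as a toy model with a clock tick. The bus handshake (a one-cycle acknowledge) is identical in both variants; only the internal write latency differs. The class, its fields, and the latency numbers are hypothetical, chosen just to show the distinction.

```python
class MemoryModel:
    """Toy memory: one-cycle bus acknowledge, configurable internal write latency."""

    def __init__(self, write_latency: int) -> None:
        self.write_latency = write_latency
        self.cycle = 0
        self.data = {}
        self.pending = []      # (cycle at which the write lands, addr, value)
        self.bus_trace = []    # what the bus sees: (cycle, "ack", addr)

    def tick(self, request=None) -> None:
        """Advance one clock. `request` is an (addr, value) write, or None for an idle cycle."""
        self.cycle += 1
        # Retire internal writes whose latency has elapsed.
        still_pending = []
        for due, addr, value in self.pending:
            if due <= self.cycle:
                self.data[addr] = value
            else:
                still_pending.append((due, addr, value))
        self.pending = still_pending
        # Accept a new request on the bus in this cycle.
        if request is not None:
            addr, value = request
            self.bus_trace.append((self.cycle, "ack", addr))
            self.pending.append((self.cycle + self.write_latency, addr, value))


def run(model: MemoryModel, requests) -> MemoryModel:
    for request in requests:
        model.tick(request)
    return model


requests = [(0x10, 7), None]                  # one write, then one idle cycle
shortcut = run(MemoryModel(write_latency=1), requests)   # bus-cycle-accurate shortcut
detailed = run(MemoryModel(write_latency=3), requests)   # stand-in for the full-cycle-accurate version
```

After these two cycles the bus traces of the two models match exactly, but only the shortcut model has the data in place; the detailed one needs two more ticks. From the bus's point of view you can't tell them apart, which is the whole idea.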

It appears from some discussions that the concept of cycle accuracy can be handled a bit loosely, with models being “near-cycle-accurate” (sometimes conveniently omitting the word “near”), or with other variations. In other words, there are people who don’t take claims of cycle accuracy at face value.

The other term that causes confusion here is that of a bus-functional model, or BFM. Some say that this means that the model is fully cycle-accurate; others think of it in terms of a model that’s bus-cycle-accurate. Some see the BFM concept as simply one of implementation: any element hooked up to the transactor could be implemented as a BFM; others see it as allowing for the non-DUT ancillary functions such as that of a data source. We’ll leave that to be decided out by the bleachers at 3:15 this afternoon; be there or be square.

Taking it to hardware

The scenarios described above all apply to simulation environments where everything is being carried out in software. Abstraction in this case is very important to keep run-times down. But another means of accelerating cycle-accurate simulation is to use hardware emulation. Emulators have been around for a while, and they often make use of FPGAs to allow the programming of a variety of functions in advance of hard silicon being available. But we’ve now got an added level of complexity: part of our test environment consists of software and simulation, and part consists of hardware and emulation.

In order to standardize the interface between the hardware portion and the software portion, the SCE-MI (Standard Co-Emulation Modeling Interface) standard was defined. This allows the software side to execute a function call to the emulator, and the emulator to receive that call and respond accordingly; the reverse case is of course also possible. The details of how this function call is turned into a hardware invocation are kept opaque so that you don’t have to worry about them. It’s essentially a message-passing scheme whereby messages flow between the software and hardware sides.
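To be clear, what follows is not the SCE-MI API; it is only a sketch of the general shape of a function call being carried as messages across a software/hardware boundary. The channel, the message fields, and the EmulatedAdder are all hypothetical, and the "hardware" is stepped by hand so the example stays deterministic.

```python
from collections import deque


class MessageChannel:
    """Hypothetical two-way message channel (NOT the SCE-MI API, just the general shape)."""

    def __init__(self) -> None:
        self.to_hw = deque()
        self.to_sw = deque()


class EmulatedAdder:
    """Stand-in for logic living in the emulator; it only ever sees messages."""

    def __init__(self, channel: MessageChannel) -> None:
        self.channel = channel

    def service(self) -> None:
        """Drain incoming messages and post a reply for each one."""
        while self.channel.to_hw:
            msg = self.channel.to_hw.popleft()
            if msg["op"] == "add":
                self.channel.to_sw.append({"result": msg["a"] + msg["b"]})


def hw_add(channel: MessageChannel, hw: EmulatedAdder, a: int, b: int) -> int:
    """Software-side proxy: to the caller it looks like an ordinary function call."""
    channel.to_hw.append({"op": "add", "a": a, "b": b})
    hw.service()   # a real emulator runs on its own; here we step it explicitly
    return channel.to_sw.popleft()["result"]


channel = MessageChannel()
adder = EmulatedAdder(channel)
total = hw_add(channel, adder, 2, 3)
```

The caller of hw_add never sees the messages, which is the opacity described above: the plumbing is hidden behind what looks like a plain call.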

All of this sets up two specific problems that an engineer will need to solve when putting together a test environment. One is that of taking some RTL logic and creating a model that can attach to a specific transactor. The other is that of creating a transactor.

Carbon Design Systems provides tools specifically for automatically turning RTL into fully cycle-accurate models. The tools are intended to ease the process of model generation and validation. Because the full RTL is essentially encapsulated in the model, the model behaves in a fashion that is cycle-for-cycle and node-for-node (so to speak) the same as the RTL. The node-for-node concept doesn’t matter from the standpoint of the bus interface, but it does matter if you’re trying to debug the model itself. By providing visibility into the model, it becomes easier to isolate and correct problems. They have libraries for the most popular transactors, eliminating much of the work of interfacing to the rest of the testbench if one of those transactors is used. The software model is wrapped to adopt the interface of the selected transactor.

Carbon has just announced a new version of their Carbon Model Studio product, which largely focuses on improving model validation to speed up the process. Whereas, prior to this release, model generation typically took from a few hours to three or four weeks, Carbon’s expectation is that any model should now be doable in less than two weeks, and, once one is familiar with the methodology, from hours to less than two days.

Given that the model is automatically derived from RTL, a natural question would be, why would there be any difference between the model and the RTL? Why is validation even necessary? Wouldn’t that reflect a bug in the model generation process? Not necessarily. There are two possible sources of errors. One is the fact that it is possible to have race conditions in Verilog that may simulate differently at the RTL level than they do in the model. The more likely source is the use of non-synthesizable items like PLLs and memories. In either case, the model would need to be tweaked to fix it. The model can be wrapped to interface to the RTL simulation environment so the pure RTL simulation results can be compared to the model executing in that same environment.
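The comparison step boils down to lining up two traces cycle by cycle and finding where they first part ways. Here is a minimal sketch of that idea; the trace format (one dictionary of signal values per cycle) and the signal names are my own assumptions and have nothing to do with how Carbon's tools actually represent results.

```python
from typing import Dict, List, Optional, Tuple

Trace = List[Dict[str, int]]   # one {signal: value} snapshot per cycle


def first_mismatch(rtl_trace: Trace, model_trace: Trace) -> Optional[Tuple[int, str, object, object]]:
    """Return (cycle, signal, rtl_value, model_value) for the first divergence, or None if the traces agree."""
    for cycle, (rtl, model) in enumerate(zip(rtl_trace, model_trace)):
        for signal in sorted(rtl.keys() | model.keys()):
            if rtl.get(signal) != model.get(signal):
                return (cycle, signal, rtl.get(signal), model.get(signal))
    if len(rtl_trace) != len(model_trace):
        # One run ended early; report the cycle where the shorter trace stops.
        shorter = min(len(rtl_trace), len(model_trace))
        return (shorter, "<trace length>", len(rtl_trace), len(model_trace))
    return None


rtl_trace = [
    {"ready": 1, "data": 0},
    {"ready": 1, "data": 5},
    {"ready": 0, "data": 5},
]
model_trace = [
    {"ready": 1, "data": 0},
    {"ready": 1, "data": 5},
    {"ready": 1, "data": 5},   # diverges here, e.g. from a race resolved differently
]

mismatch = first_mismatch(rtl_trace, model_trace)
```

Knowing the first cycle and signal that disagree is what lets you trace a difference back to a race condition or a non-synthesizable block rather than staring at two waveforms.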

On the emulator side of things, EVE provides emulation hardware. Actually, to be a bit more precise, they refer to it as hardware/software co-verification, the idea being that because part of the test environment is in software, the software and hardware jointly perform the verification function. Their system uses FPGAs but is partitioned such that the DUT logic gets its own set of FPGAs. The transactor, which can be built into hardware, is put into a separate FPGA. In fact, more than one transactor can be put into that FPGA. This means that changes to the DUT logic and changes to the transactor don’t interfere with each other. The challenge now becomes to create a transactor, and then to get it to interface with the software side of things.

The standard way of creating the transactor is through Verilog, which can be time-consuming. EVE has announced the ZEMI-3 tool for creating transactors using SystemVerilog, which allows much greater abstraction. This brings two things to the party. First, it offers a faster way of specifying the transactor behavior. Second, EVE has created a proprietary means of interfacing between the software and hardware sides that uses elements of, but is simpler than, the SCE-MI standard (hence the name ZEMI – it appears all their product names start with Z).

Once the transactor has been defined, it is compiled. This compilation creates an RTL implementation of the transactor functionality and the interface via ZEMI for implementation in the FPGA. It also determines which SystemVerilog functions can be invoked from the software side and which software functions can be invoked from the SystemVerilog side, creating library code to be linked to the software to complete the software-hardware connection. This allows software to execute a function call and actually have that function executed by the hardware emulator, and vice versa.

Whew, that’s a lot of words just to get to the point of explaining what’s happening with a couple new releases, but I just had to get myself disentangled from some of my own misconceptions and confusions and reassemble a worldview that made sense to me. I suspect others may see things slightly differently, either due to subtleties lost in this simplification or due to commercial considerations that may favor a different spin. So even if things aren’t completely untangled, hopefully they’re less tangled. There are other areas of verification that merit some untangling; that’s for a different article.