A Brave New World of Emulation and Software Prototyping

As with so many of the technologies we take for granted today, I managed to find myself embroiled in the very early days of hardware emulation. This refers to the process of imitating the behavior of one piece of hardware (typically a silicon chip you are in the process of designing) with another piece of hardware (typically a special-purpose emulation system).

For the purposes of these discussions, I’m going to throw the term application-specific integrated circuit (ASIC) around with gusto and abandon. However, everything I say in this column is equally applicable to application-specific standard parts (ASSPs) and system-on-chip (SoC) devices. (If you are a bit “fluffy” as to the nuances of ASIC, ASSP, and SoC nomenclature, may I be so bold as to make mention of my book Bebop to the Boolean Boogie, which explains everything in excruciating (I mean exhilarating) detail.)

There are two main “reasons for being” for emulation. The first is that when we are developing a great big hairy ASIC, it’s more than embarrassing if we build it and it fails to perform as planned (trust me, it’s frowny faces all round on such an inglorious day). If we have a real chip in our hands, we can apply stimulus to its inputs and monitor its outputs and say “Yay” or “Nay.” What we need is a way to do this before we build the chip, rather than building it only to discover that the answer is “Nay.”

The first solution we started out with circa the 1970s was software simulation. This is where we use a register transfer level (RTL) representation of the device to build a virtual model of the chip in a computer’s memory, and then we apply virtual stimulus to the virtual inputs and observe the responses on the virtual outputs. The problem back in the day was that the computers available to perform the simulations were pathetically poor performance-wise compared to today’s offerings. The fact that computers grew steadily more powerful didn’t help because—at the same time—the designs we were simulating grew steadily larger.
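To make the idea of virtual stimulus and virtual responses a little more concrete, here is a minimal Python sketch. It is emphatically not how a commercial RTL simulator works. The 2-bit counter, the Counter2Bit class, and the simulate function are all hypothetical names invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Counter2Bit:
    """Toy stand-in for an RTL block: a 2-bit counter with enable and synchronous reset."""
    count: int = 0

    def clock(self, enable: int, reset: int) -> int:
        """Model one rising clock edge and return the new counter value."""
        if reset:
            self.count = 0
        elif enable:
            self.count = (self.count + 1) % 4  # 2 bits, so wrap after 3
        return self.count


def simulate(stimulus):
    """Apply (enable, reset) pairs to a fresh virtual device and collect its outputs."""
    dut = Counter2Bit()
    return [dut.clock(enable, reset) for enable, reset in stimulus]


# Virtual stimulus: one (enable, reset) pair per clock cycle.
stimulus = [(1, 0), (1, 0), (0, 0), (1, 0), (1, 0), (1, 0), (0, 1)]

# The responses we expect from a correct design.
expected = [1, 2, 2, 3, 0, 1, 0]

responses = simulate(stimulus)
print("Yay" if responses == expected else "Nay")
```

The whole device lives in the computer’s memory as the state of one object, which is why every internal value is visible at every step. It is also why a real design with billions of gates takes so long to simulate this way.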

We still use software simulation today because it provides unmatched visibility into the design. However, we predominantly use it to intensively evaluate relatively small portions of the entire device in relative isolation.

One alternative to software simulation, which started to appear circa the mid-1980s, was to create hardware emulators, all based on arrays of chips. What chips? Well, some used off-the-shelf CPUs, some used off-the-shelf FPGAs, and some used custom creations (CPUs, ASICs, FPGAs, and even… but let’s not go there).

These emulators accept the same RTL representation of the design as the software simulator, but this representation is mapped onto the emulator’s processing elements. The initial role of these hardware emulators was to provide simulation acceleration.

The second main reason for emulation is to start developing and verifying software as early as possible in the development cycle. Today’s markets are moving quickly, and you can’t wait until you’ve built your ASIC before you start creating the software to run on it. The answer to this conundrum is to use an emulator to prototype your software (from low-level firmware to embedded software to high-level application software) on the virtual representation of the hardware, far in advance of the real-world hardware in the form of your ASIC becoming physically available.

Software simulators are cheap (relatively speaking). Emulators aren’t, or they are, all depending on your point of view. If your emulator allows you to detect and fix problems in your design and get your ASIC out of the door the first time and on time, then it’s cheap when compared to the alternative.

There are a lot of factors in play. For example, we are now looking at chips containing billions of equivalent logic gates. We don’t just simulate/emulate to verify functionality and performance; we also need to evaluate our designs in the context of power consumption. New design starts for purpose-built SoCs and artificial intelligence (AI) accelerators are ramping up at an extraordinary rate. And, on top of this, software now largely defines the product, which means you need to get the software as soon as possible, and you need to get it right.

So, who offers the best emulators? That’s not for me to say (not if I want to keep all my friends). What I can say is that I was just chatting with Jean-Marie Brunet, who is more than confident that he and his colleagues are “kings of the emulation castle,” as it were. Jean-Marie is VP and GM of HAV at Siemens (I had to look at that twice. VP = Vice President, GM = General Manager, and HAV = Hardware Assisted Verification).

The chaps and chapesses at Siemens have identified a 3-tier solution space to address the verification and validation of complex ASICs and systems:

  • Emulation: Fast and deterministic compilation for design bring-up and iteration. Full visibility for fast debug and system-level power-and-performance (PnP) analysis.
  • Enterprise Prototyping: Early firmware and embedded software validation. Fast, congruent transition from emulation. 10x higher throughput per $ than emulation.
  • Software Prototyping: Early system software validation. At-speed interface and IP verification. Extreme flexibility, enabling the highest performance at the lowest cost.

To address this tiered solution space, the guys and gals at Siemens have just announced not one, not two, but three new families of congruent emulation solutions that bring tears of joy to my eyes.

Meet the three Veloce CS solutions (Source: Siemens)

The first thing to note is that I have no idea what the “CS” portion of these monikers stands for. I can’t believe I didn’t ask. Maybe it’s “Chip Speed” (maybe not), but we digress…

The next point to note is that the Veloce Strato CS platform is based on Siemens’ new purpose-built CrystalX chip, while both the Veloce Primo CS and Veloce proFPGA CS platforms are based on AMD’s latest and greatest VP1902 Adaptive SoC FPGA.

As compared to the previous Veloce Strato+ (2021), the Veloce Strato CS (2024) provides 4x the gate capacity, 5x the performance, and 5x the debug throughput. Similarly, as compared to the previous Veloce Primo (2021), the Veloce Primo CS (2024) provides 4x the gate capacity, 5x the performance, and 50x the debug throughput (yes, the 50x in this case is not a finger-slip on my part).

Another big point is that, as opposed to the custom cabinets of the 2021 models, both the Strato CS and Primo CS are presented as server blades, thereby supporting super scaling. In the case of the Strato CS, a single blade can be used to emulate ~170M gates.

Scaling with Veloce Strato CS (Source: Siemens)

The next step up is four of these blades, plus an interconnect blade, forming a 5-blade module. Next, we have a tower containing four modules capable of emulating ~3B gates. Four of these towers can emulate ~12B gates, while 16 towers can emulate ~40+B gates.

By comparison, in the case of the Primo CS, a single blade can be used to emulate ~500M gates, while a 5-blade module (4 Primo CS blades plus 1 interconnect blade) can be used to emulate ~2B gates.

Scaling with Veloce Primo CS (Source: Siemens)

A 4-module Primo CS tower is capable of emulating ~8B gates, while six of these towers can emulate ~40+B gates (the same number of gates as 16 Strato CS towers).
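The scaling figures quoted for the two platforms can be sanity-checked with some back-of-the-envelope arithmetic. The Python sketch below assumes that capacity grows linearly with the number of emulation blades (four per module and four modules per tower, as described above) and uses the approximate per-blade figures quoted in this column. The constant and function names are hypothetical, and the results are rough estimates rather than Siemens specifications.

```python
# Approximate per-blade capacities quoted in the article (gates).
GATES_PER_BLADE = {
    "Strato CS": 170_000_000,
    "Primo CS": 500_000_000,
}

EMULATION_BLADES_PER_MODULE = 4  # the fifth blade in a module is interconnect
MODULES_PER_TOWER = 4


def capacity_gates(platform: str, towers: int) -> int:
    """Estimate total gate capacity, assuming purely linear scaling (an assumption)."""
    blades = towers * MODULES_PER_TOWER * EMULATION_BLADES_PER_MODULE
    return blades * GATES_PER_BLADE[platform]


for platform, towers in [("Strato CS", 1), ("Strato CS", 4), ("Strato CS", 16),
                         ("Primo CS", 1), ("Primo CS", 6)]:
    gates = capacity_gates(platform, towers)
    print(f"{platform}: {towers:2d} tower(s) -> about {gates / 1e9:.2f}B gates")
```

Under these assumptions, a Strato CS tower comes out at roughly 2.7B gates and 16 towers at about 43.5B, while a Primo CS tower comes out at 8B and six towers at 48B. These land close to the rounded figures quoted above, which appear to be rounded at each step.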

Earlier, I was throwing the term “congruent” around. What does this mean? Well, the dictionary definition suggests corresponding, consistent, matching, compatible, and harmonious. Oh, you don’t mean “What does the word ‘congruent’ mean?” You mean “What does congruent mean in this context?” You need to learn to articulate your questions better.

How about we try this in a sentence, like: “Veloce Strato CS and Veloce Primo CS provide a fully congruent HW/SW offering.” Hmmm. Perhaps a better way to explain this is by means of another illustration as shown below.

Veloce Strato CS and Veloce Primo CS provide a fully congruent HW/SW offering (Source: Siemens)

As we see, the two solutions employ common RTL compiler, synthesis, run time, and debug engines. The only differences in the flow are the place-and-route (PnR) tools used to map the design onto the diverse devices employed by the two platforms.

Both Strato CS and Primo CS allow users to save the current state of the emulation and restore that state later. This can be extremely efficacious if you are performing multi-hour or multi-day emulations. One thing on the roadmap is to provide the ability to save the state of an emulation on Primo CS and restore that state on Strato CS.

Why? Well, suppose you are running a multi-day emulation on a Primo CS and a bug is found on Day 3. The Strato CS has much higher visibility into the design, but do you really want to start the emulation from scratch on the Strato CS and then wait anywhere from 9 to 15 days to reach the problem point in the emulation? “No!” I cry, “One thousand times no!” But suppose you had instructed the Primo CS emulator to save its state once every two hours, for example. In this case, once the folks at Siemens make this feature available, you will be able to take the saved state from the Primo CS prior to the problem and restore that state on the Strato CS. Brilliant!
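The payoff from periodic checkpoints is easy to put numbers on. The Python sketch below is a hypothetical illustration, not a Veloce API: the function names are invented, time is measured in hours of Primo CS run time, and the 3x to 5x slowdown for Strato CS is only an inference from the “3 days becomes 9 to 15 days” figures above.

```python
from bisect import bisect_left


def checkpoint_times(interval_hours: int, run_hours: int) -> list[int]:
    """Hours at which state was saved, e.g. every 2 hours: 2, 4, 6, ..."""
    return list(range(interval_hours, run_hours + 1, interval_hours))


def last_checkpoint_before(checkpoints: list[int], bug_hour: int) -> int:
    """Latest saved state strictly before the bug; 0 means start from scratch."""
    idx = bisect_left(checkpoints, bug_hour)  # first checkpoint at or after the bug
    return checkpoints[idx - 1] if idx > 0 else 0


def replay_hours(bug_hour: int, restore_hour: int, slowdown: float) -> float:
    """Hours needed on the slower platform to get from the restore point to the bug."""
    return (bug_hour - restore_hour) * slowdown


# Hypothetical scenario: state saved every 2 hours, bug hit 61 hours in (Day 3).
saved = checkpoint_times(interval_hours=2, run_hours=72)
bug_hour = 61
restore_hour = last_checkpoint_before(saved, bug_hour)

for slowdown in (3, 5):  # assumed range, inferred from the article's figures
    from_scratch = replay_hours(bug_hour, 0, slowdown)
    from_checkpoint = replay_hours(bug_hour, restore_hour, slowdown)
    print(f"{slowdown}x slower: {from_scratch:.0f} h from scratch, "
          f"{from_checkpoint:.0f} h from the hour-{restore_hour} checkpoint")
```

In this made-up scenario, the wait to reach the problem point drops from a week or more to a few hours, which is the whole attraction of being able to move saved state between platforms.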

And let’s not forget the Veloce proFPGA CS (2024), which offers 2x the gate capacity, 2x the performance, and 50x the debug throughput of its Veloce proFPGA (2021) predecessor. Jean-Marie informs me that this system offers the lowest cost of entry on the market, all the way from a single-FPGA desktop board to a multi-blade rack system.

Transforming hardware and software for software prototyping (Source: Siemens)

This is where the Veloce operating system for prototyping (VPS) software comes into play to accelerate bring-up. VPS features efficient compilation without requiring any modifications to your RTL. It also boasts automated multi-FPGA partitioning, timing-driven performance optimization, and sophisticated at-speed debug.

And one more thing, lest I forget: all these new Veloce platforms (Strato CS, Primo CS, and proFPGA CS) support multiple users working on heterogeneous designs simultaneously. For example, if you have a 40B gate Strato CS system, then two groups can be working on different 20B gate designs (or a 30B gate and a 10B gate design, etc.), four groups can be working on different 10B gate designs, and… you see what I mean.

I’m thinking of the simple desktop hardware emulator the company I worked for designed back in the mid-1980s. If I could get my time machine working, travel back, and tell my friends what the emulation future held, I know exactly what they would say: “What happened to you? Where did all your hair go? How did you get to be so old? You only popped out to get a sandwich!”

Hmmm. Let’s leave my erstwhile friends having all the fun that was to be found in the 1980s and return to the present. What do you think of everything you’ve read here?
