editor's blog
Subscribe Now

Faster Simulation on GPUs

At last week’s SNUG, I had a chat with Uri Tal, CEO of startup Rocketick, about their simulation acceleration technology. What they do bears some resemblance to the parallelization semi-automation done by Vector Fabrics or the exploration done by CriticalBlue, except that here it’s working with Verilog instead of C and it’s fully automated and transparent to the user. He claims they can accelerate simulation by over 10X.

They use a GPU to achieve this kind of parallelization. This has promise both for in-house simulation farms and cloud-based simulation, where GPUs are available (although the cloud hasn’t been their focus).

What they do is create a directed flow graph (DFG) from the Verilog code and then go through and figure out which parts they can accelerate. Each such part becomes its own thread for the GPU. The acceleratable parts tend to be the synthesizable portions of the code (as hardware logic tends to be highly parallel). They do this on a statement-by-statement basis while keeping an eye on the dependencies – if there are too many dependencies, they may change the partition to reduce the size of the dependency cutset. What is left unaccelerated either couldn’t be accelerated or simply didn’t make sense to accelerate.

So, based on this, the tool converts a completely unaccelerated simulation into portions that are set aside for the GPU and the remaining bits that are re-generated for standard simulation. The accelerated portion is attached to the simulation using PLI.

The accelerated threads are turned into a byte code that is executed by a run-time engine. This makes the accelerated “code” portable onto any platform; only the runtime engine must be ported. They also manage memory carefully: the GPU uses very wide-word memory, so random byte accesses can be very inefficient; they manage the memory on a per-thread basis to get as much as possible out of each memory read (or write).

The accelerated threads dump all the usual files for later analysis by viewers and debuggers. They interface directly with SpringSoft’s Siloti to identify “essential” signals.

You can find more on their website.

Leave a Reply

featured blogs
Mar 26, 2019
It's CDNLive! Well, not today, Tuesday and Wednesday, April 2nd and 3rd at the Santa Clara Convention Center. So I have eight things you can do to get the most out of CDNLive and go home with a... [[ Click on the title to access the full blog on the Cadence Community si...
Mar 25, 2019
Do you ever use the same constraint templates in multiple projects? Now, with PADS Professional VX.2.5, you can easily import and export constraints from one project to the next. Constraint templates enable application of complex rules to multiple nets. They help ensure a smo...
Mar 22, 2019
In the video above, it might not appear that much is taking place, but just like with transformers there is “more than meets the eye.” Alright, that was corny, and I am mildly ashamed, but Nanosecond Event Detection for shock and vibration is nothing to be ashamed...
Jan 25, 2019
Let'€™s face it: We'€™re addicted to SRAM. It'€™s big, it'€™s power-hungry, but it'€™s fast. And no matter how much we complain about it, we still use it. Because we don'€™t have anything better in the mainstream yet. We'€™ve looked at attempts to improve conven...