Walking the (Heterogeneous Multiprocessing) Talk

For years now, marketing folks at companies who make things like GPUs and FPGAs have been painting a beautiful picture of a gleaming future – a future where dogs and cats get along, unicorns frolic on the lawn, and accelerated computing brings orders of magnitude improvements in computational throughput and, particularly, performance-per-watt. It’s a grand story, and the only thing that’s kept it from becoming reality is the minor challenge of finding hardware engineers to re-write all of the software in the world in RTL (or something like that).

Of course, that vision has not yet become reality, although it is (we are assured) just around the corner, owing to new, improved tools that make programming heterogeneous multiprocessing systems a breeze. Well, maybe not a “breeze” exactly, but pretty darn simple. Here – watch this demo – even our marketing VP can take these fifty lines of C code and whip them into an optimized, accelerated fury in just minutes, with just a couple of mouse clicks.

OK, then, FPGA companies. Let’s see you do this on your own EDA tools. You know – the ones that take hours and often days to complete a single run through simulation, synthesis, place-and-route, and whatever other incantations we need to perform in order to get our “code” to become working hardware. When will that software be included in this grand vision for the future of computing?

Crickets…

Yeah, don’t hold your breath for that to happen anytime soon.

This brings up the perfectly reasonable question: “Why not?” It seems like EDA software should be the poster child for acceleration, right? Critical, complex software applications that are a huge bottleneck for high-value engineering teams. Massive operations on enormous data models. Doesn’t this seem like exactly the kind of thing that heterogeneous computing with FPGAs or GPUs is supposed to solve?

It turns out, however, that the obstacles to accelerating EDA tools are substantial and diverse. First, there are economic challenges. Even though the marketing VP can use an automated tool to convert a few dozen lines of code into optimized hardware/software goo, that process doesn’t scale for the hundreds of millions of lines of code involved in the EDA ecosystem. And a million marketing VPs standing at a million demo stations clicking a million mouse buttons will never produce a hardware implementation of RTL synthesis.

Porting something as large as an EDA tool suite (even just the extremely performance-critical bits) would be a monumental and very expensive effort. Given the comparatively small revenue behind EDA, it would be difficult to make an economic case for the effort. Would tool companies immediately get substantially more revenue? Unlikely. Sure, it would be a boon for users, but it wouldn’t drive big incremental sales and profits to EDA’s bottom line. In fact, it would probably be a giant expense with very little return.

On top of that, users of EDA software (Yep, that’s all of us – people who are reading this), don’t really want to replace our current computing infrastructure with new, proprietary boxes jammed full of expensive FPGAs or GPUs along with the current Xeons. We’d like to be able to run our software on our plain-old generic hardware, thank you very much. Would it make economic sense for a company to buy proprietary hardware to run EDA tools? Maybe, but it would probably not make economic sense for anyone to build that hardware.

Of course, EDA companies (and we are also including FPGA companies in this category, by the way) could theoretically provide accelerated tool solutions in the cloud, running on their own special super-accelerated servers. OK, who wants to do their proprietary engineering work in the cloud? Hello? Anyone? Still with us? Yeah, that’s what we thought.

Beyond these economic and logistical issues, there are serious technical barriers to accelerating much EDA code. Today’s EDA tools typically operate on enormous in-memory data models. Performance is often limited less by how much processing can be parallelized and more by concurrent access to data models. Yes, EDA companies have at least worked in recent years to make it possible for multiple (conventional) servers to work some problems in parallel, but those implementations quickly run into diminishing returns as the number of servers is scaled. Similar obstacles stand in the way of porting code to run on hardware-accelerated architectures. It isn’t just the instruction-crunching speed that’s the challenge.

Another deep, dark secret that makes EDA software difficult to accelerate is the unfortunate reality that there is a large amount of code in today’s EDA tools that, (sit down before you read this) nobody understands. Yep. A typical EDA tool today is made up in part of software that was written a long time ago in a startup or university far, far away. The engineer or grad student who labored for years to get the innermost loops of those critical routines optimized for speed and functionality has long since retired, gone to work for Facebook or Google, or otherwise moved on to greener pastures. The current engineers at EDA companies treat these areas of code as “black boxes,” generally fearing to crack open the hood for fear of disrupting some subtle, incomprehensible bit of black magic that makes the whole thing tick. Often, these would be the very routines that require re-implementation for acceleration. Caveat coder.

EDA isn’t a stranger to acceleration, however. Today, for example, EDA companies sell emulators that accelerate the verification process orders of magnitude beyond what’s possible with RTL simulation. But they accomplish this by basically doing away with the “simulator” entirely and implementing the RTL directly on FPGA-like native hardware. And, in the old days, several EDA companies sold specialized acceleration machines, which were really just glorified souped-up regular workstations, specifically tuned for accelerating their (primarily simulation) tools. These ultimately failed, however, as it was too much work for the EDA companies do develop custom computers at a pace that kept up with the rate of progress in conventional, general-purpose computing hardware. EDA Accelerators were obsolete almost as soon as they were released.

That all being said, it’s still possible that we will see design tools take advantage of the latest and greatest hardware acceleration technology someday. And, when it does, that may be the sign that acceleration technology is truly becoming useful to the mainstream, rather than to a few high-budget, massively-scaled “killer apps” required by the “Super Seven” server companies. It will be interesting to watch.

Walking the (Heterogeneous Multiprocessing) Talk

Related

Leave a Reply Cancel reply

featured video

How NV5, NVIDIA, and Cadence Collaboration Optimizes Data Center Efficiency, Performance, and Reliability

featured chalk talk