C to FPGA

Who’ll Use the Next Generation of Design Tools?

The von Neumann architecture is a miracle of efficiency if you measure by the algorithmic complexity that can be executed with any given number of transistors.  If you’ve got enough transistors to create a 32-bit processor plus peripherals plus enough memory to store a decent-sized program, you can execute an enormously complicated algorithm. 

Where von Neumann isn’t so efficient is in the amount of computation for a given amount of power, or in the number of computations in a given amount of time.  Those battles are won handily by custom, parallel hardware like we might create in an FPGA, or in a custom, algorithm-specific block in an ASIC or custom SoC.  Optimized hardware that specifically implements our algorithm will always win in terms of speed and power – at a cost of vastly increased transistor count.

Throw these two abstract realities against the backdrop of Moore’s Law, and you can see what happens.  Every couple of years, the cost of a transistor drops by roughly half.  We can get double the transistor count on the same size piece of silicon, so the size and complexity of the algorithms that can be implemented in parallel hardware double as well.  A few years ago, if we had a complex operation represented by a bunch of software, we could afford to take only a small, critical function or two out of that operation and implement them in hardware.  With each passing process node, however, the number of transistors available for hardware implementation doubles, and so does the amount and complexity of what would have been software but can now be hardware. 

Of course, gaining the benefits of moving software into hardware costs something more than just a few orders of magnitude more transistors.  It costs design time and effort.  Overall, when we put custom hardware implementation on a balance scale, the plus side holds enormous gains in performance and power efficiency.  On the minus side, we have orders of magnitude more transistors (and therefore cost), significantly higher design effort, and less system flexibility. 

As we mentioned, Moore’s Law is constantly making the first item on the negative side better.  To address the last two items, we have high-level synthesis (HLS) plus FPGA fabric.  The myth and the goal of HLS is that we can take our software algorithm, run it through our magic high-level synthesis tool, and out pops an optimized, parallelized, super-efficient hardware implementation of that algorithm that we can plop down in an FPGA.  That magic C-to-hardware transformation is what HLS has been promising for more than two decades.  
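
To make that promised transformation concrete, the starting point is usually something as plain as the untimed C below. This kernel is purely illustrative and not drawn from any particular tool’s examples; it simply stands in for the kind of software description a designer might hope to hand straight to an HLS tool.

/* A plain, untimed C kernel: no notion of clocks, pipelines,
   or resources, just the algorithm. */
int dot_product(const int a[64], const int b[64])
{
    int sum = 0;
    for (int i = 0; i < 64; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}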

If you ask a panel of experts (which I have done on several occasions), you will find opinions ranging from “We can do it today!” to “It will never happen.”  Why the range of answers?  On the plus side of the scale (our scale is getting a workout today, isn’t it?), there are several tools in production use today that can take untimed algorithms written in carefully constructed C or C++ and turn them almost magically into high-quality synthesizable RTL.  We have written about this many times before, of course, and we’ve even written about BDTI’s benchmarking and certification program, in which they set about proving it. 

Those on the “it will never happen” side of the scale, however, are quick to point out that this is not the mythical beast of software transformed magically into hardware by some omnipotent compiler.  These tools require significant hardware expertise on the part of the user.  One must understand pipelining, loop unrolling, latency, throughput, fixed-point math, quantization, resource sharing, and other hardware-centric concepts in order to write the code, control the tools, and understand the results, as the sketch below suggests.  
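
Here is the earlier dot-product kernel annotated the way an HLS user typically has to think about it. The directive syntax follows the Xilinx Vivado/Vitis HLS convention purely for illustration; other tools spell these concepts differently, and the specific types and factors are hypothetical choices, not recommendations.

/* Hardware-aware version of the same kernel: narrower fixed-point
   operands, a pipelining target, a partial unroll, and array
   partitioning so the memories can feed four multipliers per cycle.
   Directive syntax is Vivado/Vitis-style, shown for illustration only. */
int dot_product_hw(const short a[64], const short b[64])
{
#pragma HLS ARRAY_PARTITION variable=a cyclic factor=4
#pragma HLS ARRAY_PARTITION variable=b cyclic factor=4
    int sum = 0;
    for (int i = 0; i < 64; i++) {
#pragma HLS PIPELINE II=1
#pragma HLS UNROLL factor=4
        sum += a[i] * b[i];   /* 16x16 multiplies map nicely onto DSP blocks */
    }
    return sum;
}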

The “we can do it today” crowd seems to get closer to being right with each passing year.  Every year, we see new tools on the market, significantly more design experience with the old tools, and improved results reported by those using HLS in production.  The tools also seem, subjectively, to be less sensitive to coding style in the original C/C++; they now support dialects ranging from custom languages with C-like syntax to ANSI C/C++ to SystemC. 

The “It will never happen” folks also make a compelling point, however.  If we are expecting C-to-FPGA to ever behave like a software compiler, we’re overlooking an important fact about the difference between hardware and software.  For a software compiler, there is always something that could be agreed upon as a “best” solution.  Compiler developers can tune away, trying to minimize the size and maximize the speed of the generated code.  The right answer is reasonably easy to quantify.  Optimization choices made during software compilation have at best a modest effect on the results: for the sake of argument, perhaps somewhere between zero and 20 percent either way. 

In hardware architecture, however, there is a gigantic range of answers.  The fastest solution might take 1000x the amount of hardware to implement as the densest one.  The lowest power version might run at a tiny fraction of the maximum speed.  The size of the design space one can explore in HLS is enormous.  Implementing a simple datapath algorithm in an FPGA, for example, one might choose to use a single hardware multiplier/DSP block for maximum area efficiency – or one might have the datapath use every single available DSP block on the chip – which can now range into the thousands.  The cost/performance tradeoff available to the user, then, could be in the range of three orders of magnitude.  The “best” answer depends on the user’s knowledge of the true design goals, and how those goals map down to the particular piece of hardware being implemented with HLS.  Unless the user has a way to express those design goals and constraints and percolate those down into the detailed levels of the design hierarchy, an HLS tool has almost zero chance of guessing the right answer.  It is NOT like a software compiler.
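
Here is a hypothetical sketch of that design space using the same dot-product kernel. The algorithm is identical in both variants, but the directives ask for radically different hardware; the directive syntax is again Vivado/Vitis-style, and the resource and cycle figures are rough, illustrative estimates.

/* Area-optimized variant: no directives, so the scheduler is free
   to time-share a single multiplier across all 64 iterations
   (roughly 64 cycles, on the order of one DSP block). */
int dot_small(const short a[64], const short b[64])
{
    int sum = 0;
    for (int i = 0; i < 64; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Throughput-optimized variant: a full unroll plus complete array
   partitioning asks for 64 parallel multipliers and an adder tree
   (roughly one result per call, on the order of 64 DSP blocks). */
int dot_fast(const short a[64], const short b[64])
{
#pragma HLS ARRAY_PARTITION variable=a complete
#pragma HLS ARRAY_PARTITION variable=b complete
    int sum = 0;
    for (int i = 0; i < 64; i++) {
#pragma HLS UNROLL
        sum += a[i] * b[i];
    }
    return sum;
}

Only the tool’s user knows whether the extra three orders of magnitude in resources is a bargain or a disaster; the source code alone cannot say.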

For years, the challenge users threw down to HLS providers was “results must be as good as hand-coded RTL.”  This is a worthy goal, and reminiscent of what the hand-assembly crowd expected of the software compilers trying to woo them into high-level languages.  However, many HLS tools have now achieved and surpassed that goal.  In numerous production reports, HLS tools have delivered results equal or superior to hand-coded RTL – and with a tiny fraction of the design time and effort. 

Other, less obvious challenges for HLS have also advanced significantly.  Early HLS focused almost completely on datapath and control optimization to match or exceed hand-coded microarchitectures.  Interfacing those auto-generated datapaths to the rest of the design, getting data into and out of those datapaths, and creating an automated method of verifying designs done with HLS were all “exercises left to the user.”  Today’s tools are much more robust – with rich feature sets for hierarchical design, interface synthesis, verification automation, memory interface management, and much more. 
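
As one small illustration of what interface synthesis and verification reuse mean in practice, consider the hypothetical kernel below. The bus protocols on its ports are requested by directive rather than hand-written RTL glue, and the same plain-C test harness that checks the algorithm in software can be reused by the tool to drive co-simulation of the generated RTL. The directive syntax is Vivado/Vitis-style and the example is a sketch, not a recipe.

/* Function arguments become hardware interfaces by directive:
   streaming data ports plus a memory-mapped control register. */
void scale_stream(const int in[256], int out[256], int gain)
{
#pragma HLS INTERFACE axis port=in
#pragma HLS INTERFACE axis port=out
#pragma HLS INTERFACE s_axilite port=gain
#pragma HLS INTERFACE s_axilite port=return
    for (int i = 0; i < 256; i++) {
#pragma HLS PIPELINE II=1
        out[i] = in[i] * gain;
    }
}

/* The same C test harness verifies the algorithm in software and can
   later be reused to check the generated RTL in co-simulation. */
int main(void)
{
    int in[256], out[256];
    for (int i = 0; i < 256; i++) in[i] = i;
    scale_stream(in, out, 3);
    for (int i = 0; i < 256; i++)
        if (out[i] != i * 3) return 1;   /* nonzero return = test failure */
    return 0;
}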

The remaining challenge for C-to-FPGA HLS tools is handling the wide variety of user expertise.  While some HLS users are already happy with the ease of use, those users are most likely hardware-savvy HDL designers who use HLS as a power tool for creating better RTL more rapidly.  Because they are already intimately familiar with both the source code they feed to the HLS tool and the output they expect from it, they are well-qualified pilots who can use HLS to get from point A to point B much more efficiently and effectively. 

On the other end of the spectrum, however, are software engineers with little or no hardware expertise, no understanding of HDL, and often massive amounts of legacy code as a starting point.  Their goal would be to identify portions of that software suitable for HLS implementation in hardware, and to use HLS to get there efficiently.  As of today, those users are still probably going to be disappointed by HLS.

HLS is currently enjoying its highest level of investment in history.  More companies are putting more resources into creating and refining HLS tools than ever before.  More users are trying and adopting HLS technology, and many already have years of experience using it in a production engineering environment.  The marriage of HLS and FPGA is one of the most promising combinations we’ve ever had to loosen the monopoly that von Neumann has on computing and to open us up to a world of vastly increased performance and efficiency.  
