Who’ll Use the Next Generation of Design Tools?

The von Neumann architecture is a miracle of efficiency if you measure by the algorithmic complexity that can be executed with a given number of transistors. If you’ve got enough transistors to create a 32-bit processor plus peripherals plus enough memory to store a decent-sized program, you can execute an enormously complicated algorithm.

Where von Neumann isn’t so efficient is in the amount of computation for a given amount of power, or in the number of computations in a given amount of time.  Those battles are won handily by custom, parallel hardware like we might create in an FPGA, or in a custom, algorithm-specific block in an ASIC or custom SoC.  Optimized hardware that specifically implements our algorithm will always win in terms of speed and power – at a cost of vastly increased transistor count.

Throw these two abstract realities against the backdrop of Moore’s Law, and you can see what happens. Every couple of years, the cost of a transistor drops by approximately half. We can get double the transistor count on the same size piece of silicon, so the size and complexity of algorithm that can be implemented in parallel hardware doubles. A few years ago, if we had a complex operation represented by a bunch of software, we could afford to take only a small, critical function or two out of that operation and implement them in hardware. With each passing process node, however, the number of transistors available for hardware implementation doubles, and so does the amount and complexity of what would have been software but can now be hardware.

Of course, gaining the benefits of moving software into hardware costs more than just a few orders of magnitude more transistors; it also costs design time and effort. When we put custom hardware implementation on a balance scale, the plus side holds enormous gains in performance and power efficiency. The minus side holds orders of magnitude more transistors (and therefore cost), significantly higher design effort, and less system flexibility.

As we mentioned, Moore’s Law is constantly making the first item on the negative side better. To address the last two items, we have high-level synthesis (HLS) plus FPGA fabric. The myth – and the goal – of HLS is that we can take our software algorithm, run it through our magic high-level synthesis tool, and out pops an optimized, parallelized, super-efficient hardware implementation of that algorithm that we can plop down in an FPGA. That magic C-to-hardware transformation is what HLS has been promising for more than two decades.

If you ask a panel of experts (which I have done on several occasions), you will find opinions ranging from “We can do it today!” to “It will never happen.” Why the range of answers? On the plus side of the scale (our scale is getting a workout today, isn’t it?), there are several tools in production use today that can take untimed algorithms written in carefully constructed C or C++ and turn them almost magically into high-quality synthesizable RTL. We have written about this many times before, of course, and we’ve even written about BDTI’s benchmarking and certification program, where they set about proving it.

Those on the “it will never happen” side of the scale, however, are quick to point out that this is not the mythical beast of software transformed magically into hardware by some omnipotent compiler. These tools require significant hardware expertise on the part of the user. One must understand pipelining, loop unrolling, latency, throughput, fixed-point math, quantization, resource sharing, and other hardware-centric concepts in order to write the code, control the tools, and understand the results.
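To make one of those concepts concrete, here is a minimal sketch of the fixed-point reasoning an HLS user typically has to do by hand: quantizing a floating-point value into Q15 format (1 sign bit, 15 fractional bits) and multiplying in a way that maps onto a single hardware multiplier. The Q15 format itself is standard DSP practice, but the helper names here are illustrative, not drawn from any particular tool.

```c
#include <stdint.h>

/* Q15 fixed-point: 1 sign bit, 15 fractional bits; range [-1, 1). */
typedef int16_t q15_t;

/* Quantize a float in [-1, 1) to Q15, rounding to nearest. */
static q15_t float_to_q15(float x) {
    return (q15_t)(x * 32768.0f + (x >= 0.0f ? 0.5f : -0.5f));
}

/* Q15 multiply: 16x16 -> 32-bit product (Q30), then shift back to Q15.
 * This is the operation that maps onto one multiplier/DSP block. */
static q15_t q15_mul(q15_t a, q15_t b) {
    int32_t p = (int32_t)a * (int32_t)b;  /* Q30 intermediate */
    return (q15_t)(p >> 15);              /* truncate back to Q15 */
}
```

With inputs of 0.5 and 0.25 (16384 and 8192 in Q15), `q15_mul` returns 4096, which is 0.125 in Q15 – the kind of bit-width and scaling bookkeeping that a software engineer writing `a * b` on doubles never has to think about.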

The “we can do it today” crowd comes closer to being correct with each passing year. Every year, we see new tools on the market, significantly more design experience with the old tools, and improved results reported by those using HLS in production. The tools also seem, subjectively, to be less sensitive to coding style in the original C/C++ – they now support dialects ranging from custom languages with C-like syntax to ANSI C/C++ to SystemC.

The “It will never happen” folks also make a compelling point, however. If we expect C-to-FPGA ever to behave like a software compiler, we’re overlooking an important fact about the difference between hardware and software. For a software compiler, there is always something that could be agreed upon as a “best” solution. Compiler developers can tune away – trying to minimize the size and maximize the speed of the generated code. The right answer is reasonably easy to quantify. Optimization choices made during software compilation have at best a modest effect on the results – for the sake of argument, perhaps plus or minus 20 percent.

In hardware architecture, however, there is a gigantic range of answers.  The fastest solution might take 1000x the amount of hardware to implement as the densest one.  The lowest power version might run at a tiny fraction of the maximum speed.  The size of the design space one can explore in HLS is enormous.  Implementing a simple datapath algorithm in an FPGA, for example, one might choose to use a single hardware multiplier/DSP block for maximum area efficiency – or one might have the datapath use every single available DSP block on the chip – which can now range into the thousands.  The cost/performance tradeoff available to the user, then, could be in the range of three orders of magnitude.  The “best” answer depends on the user’s knowledge of the true design goals, and how those goals map down to the particular piece of hardware being implemented with HLS.  Unless the user has a way to express those design goals and constraints and percolate those down into the detailed levels of the design hierarchy, an HLS tool has almost zero chance of guessing the right answer.  It is NOT like a software compiler.
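As a sketch of that design space, consider the same dot product written two ways for an HLS tool. With no directives, most tools will share one multiplier across all the loop iterations; a full-unroll directive asks for one multiplier per iteration working in parallel. The pragma shown uses Vivado HLS-style syntax purely as an illustration – other tools use different mechanisms – and an ordinary C compiler simply ignores it, so both functions remain runnable and compute identical results.

```c
#include <stdint.h>

#define N 128

/* Dot product: N multiply-accumulates. With no directives, an HLS tool
 * typically shares one multiplier/DSP block across all N iterations:
 * minimum area, roughly N cycles of latency. */
int32_t dot_shared(const int16_t a[N], const int16_t b[N]) {
    int32_t acc = 0;
    for (int i = 0; i < N; i++) {
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}

/* Same math, but directing the tool to fully unroll the loop: N
 * multipliers/DSP blocks in parallel, on the order of N times the
 * area for a latency of a few cycles. */
int32_t dot_unrolled(const int16_t a[N], const int16_t b[N]) {
    int32_t acc = 0;
    for (int i = 0; i < N; i++) {
#pragma HLS UNROLL
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}
```

Same source, same results, yet the two implementations can differ in area and throughput by a factor of N – and only the designer, not the tool, knows which end of that range the system actually needs.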

For years, the challenge users threw down to HLS providers was “results must be as good as hand-coded RTL.”  This is a worthy goal, and reminiscent of what the hand-assembly crowd expected of the software compilers trying to woo them into high-level languages.  However, many HLS tools have now achieved and surpassed that goal.  In numerous production reports, HLS tools have delivered results equal or superior to hand-coded RTL – and with a tiny fraction of the design time and effort. 

Other, less obvious challenges for HLS have also advanced significantly.  Early HLS focused almost completely on datapath and control optimization to match or exceed hand-coded microarchitectures.  Interfacing those auto-generated datapaths to the rest of the design, getting data into and out of those datapaths, and creating an automated method of verifying designs done with HLS were all “exercises left to the user.”  Today’s tools are much more robust – with rich feature sets for hierarchical design, interface synthesis, verification automation, memory interface management, and much more. 

The remaining challenge for C-to-FPGA HLS tools is handling the wide variety of user expertise. While some HLS users are already happy with the ease of use, these users are most likely hardware-savvy HDL designers who use HLS as a power tool for creating better RTL more rapidly. Because they are already intimately familiar with both the source code the HLS tool consumes and the RTL it produces, they are well-qualified pilots who can use HLS to get from point A to point B much more efficiently and effectively.

On the other end of the spectrum, however, are software engineers with little or no hardware expertise, no understanding of HDL, and often massive amounts of legacy code as a starting point.  Their goal would be to identify portions of that software suitable for HLS implementation in hardware, and to use HLS to get there efficiently.  As of today, those users are still probably going to be disappointed by HLS.

HLS is currently enjoying its highest level of investment in history.  More companies are putting more resources into creating and refining HLS tools than ever before.  More users are trying and adopting HLS technology, and many already have years of experience using it in a production engineering environment.  The marriage of HLS and FPGA is one of the most promising combinations we’ve ever had to loosen the monopoly that von Neumann has on computing and to open us up to a world of vastly increased performance and efficiency.  
