Is AI the Killer FPGA Application?

Ross Freeman, co-founder of Xilinx, invented the FPGA in 1984. In the 34 years that have passed, FPGAs have been wildly successful and are certainly among the most important electronic devices ever conceived. But during that entire history, tracing the evolution of FPGAs from dozens of LUTs to millions, the FPGA has been the optimal solution for … exactly zero applications.

Don’t get me wrong. FPGAs do one thing exceptionally well: flexibility. FPGAs can often do what no other device can, bridging gaps between otherwise-incompatible protocols, reconfiguring themselves on the fly to adapt to changing requirements and circumstances, and acting as stand-ins for ASICs and ASSPs that have not yet been created. If your application needs the flexibility of FPGAs, probably nothing else will work.

But all that flexibility comes at a cost – price, power, and performance. Best estimates are that FPGAs are worse than optimized, dedicated silicon on all three counts by about a factor of ten. That means that if you design an FPGA into your application and the application succeeds, then once your requirements stop changing and your design and architecture get nailed down, replacing that FPGA with something cheaper, faster, and more power-efficient will be high on your list of priorities.

For many applications, that day never comes. By the time there is impetus to remove the FPGA, new requirements have come along that start the clock over again. A new design requires a new FPGA and goes through its own maturation process. So FPGAs have held on in many application areas for decades, even though they are never, ever the optimal design solution.

This is a problem for FPGA companies, as it limits their growth potential. Their products never get to enjoy the “cash cow” stage, where large volume orders come in with virtually no effort. Instead, FPGA vendors are constantly battling to win new sockets and to re-win old ones. They are forever giving heavy-duty support to a wide range of customers in ever-evolving situations simply in order to retain business they’ve already won. You may have been supplying a customer like Cisco with fantastic silicon and service for decades, but fall off your game on one new design iteration and you’ll find yourself kicked to the curb while your competitor steps in and captures your hard-earned customer.

As a result, FPGA companies have always been chasing the elusive “killer app” – the application where FPGAs are the optimal fit, and there’s no opportunity for some ASIC or ASSP to step in and grab the easy money just as the market explodes. The requirements are tricky. You need to find a problem where the flexibility of FPGAs is an essential and enduring part of the solution, and where dedicated/specialized hardware can’t be designed to do the job any better or cheaper. That’s a tall order, but it’s never stopped them from trying.

Now, there is considerable buzz in the industry about FPGAs for AI applications. Both Xilinx and Intel are touting the prowess of FPGAs as accelerators for convolutional neural network (CNN) deep-learning applications. These AI applications typically have both “training” and “inference” phases. “Training” is executed on a broad data set to teach the network its job and establish the best topology, coefficients, and so forth. Training typically happens once, and big-iron servers are used – often with GPUs doing the heavy lifting. Training requires massive floating-point computation, and GPUs are the best fit (so far) for delivering the required floating-point performance.

Once the network is trained and optimized, it can be deployed into the field, where “inferencing” is done. Inferencing has different requirements than training. Where training is generally done in a data center, inferencing is often done in embedded applications. Where training is not particularly sensitive to cost and power (since it’s done once, in a data center), inferencing can be extremely sensitive to both cost and power, since it will be done enormous numbers of times, will often be built into high-volume embedded systems with severe BOM cost restrictions, and will often be required to operate on batteries or other very limited power sources.

Most importantly, unlike training, inferencing can be done with narrow-bit-width fixed-point computation. Helloooooo FPGAs! Your killer app may have just arrived.

It turns out you can build a pretty decent neural-net inferencing engine using FPGA LUT fabric. The fixed-point math aligns perfectly with FPGAs’ sweet spot, and best of all (try to contain your glee, FPGA marketers), every neural network topology is different. The optimal networks for various applications have very different topologies, so the inferencing processor for automotive vision, for example, might be completely different from one for language processing.
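To make the fixed-point point concrete, here is a minimal Python sketch of one neuron’s multiply-accumulate carried out in 8-bit integers rather than floating point. It assumes simple per-tensor symmetric quantization with a single scale factor per vector; the weights and inputs are made-up illustrative values, and the helper names (`quantize`, `int_dot`) are hypothetical. Real FPGA toolflows may use different bit widths, rounding, and scaling schemes.

```python
# Minimal sketch (assumption): symmetric per-tensor 8-bit quantization of one neuron.
# Not any vendor's actual flow; values below are illustrative only.

def quantize(values, bits=8):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1                  # 127 for 8 bits
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / qmax                      # float value of one integer step
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def int_dot(qa, qb):
    """Integer multiply-accumulate: the narrow datapath an FPGA builds from LUTs/DSP blocks.
    Each 8-bit x 8-bit product fits in 16 bits; the running sum needs a wider accumulator."""
    acc = 0
    for a, b in zip(qa, qb):
        acc += a * b
    return acc

weights = [0.42, -1.30, 0.75, 0.08, -0.56]   # hypothetical trained weights
inputs = [1.00, -0.25, 0.60, 2.10, -0.90]    # hypothetical activations

qw, sw = quantize(weights)
qx, sx = quantize(inputs)

float_result = sum(w * x for w, x in zip(weights, inputs))
fixed_result = int_dot(qw, qx) * sw * sx     # single rescale back to real units at the end

print(f"float: {float_result:.4f}  int8: {fixed_result:.4f}")
```

The inner loop is nothing but small integer multiplies and adds, with one floating-point rescale at the end, and the int8 result lands within about half a percent of the floating-point one for these values. That is the kind of narrow, repetitive arithmetic that maps naturally onto programmable fabric.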

And, speaking of automotive vision applications, here we have an enormous industry going through a period of disruptive change, where FPGAs might play an essential and irreplaceable role? Oh my. FPGA salespeople are skipping along the garden path with songs in their hearts and dollar signs in their eyes. Could this finally be it? Is AI the promised land where programmable logic can curl up on a rug in front of the fire and just cash checks all day long?

Sadly, probably not.

First, it’s no secret that FPGAs are hard to “program.” There is a very small population of FPGA experts in the world, and the Venn diagram showing those people and AI application experts has a very small intersection indeed. That means that AI experts need some serious help creating FPGA implementations of their solutions. Both Xilinx and Intel have gone to great lengths to bridge that expertise gap, with various solutions in play. The most common answer for non-FPGA-experts wanting to use FPGAs as accelerators is OpenCL (a C-based framework for programming heterogeneous processors, most widely used to put GPUs to work as accelerators). It’s a solid strategy. If you want to sell FPGAs against GPUs, design a flow that makes GPU code easily portable to FPGAs. That way, you’ve conquered the programming challenge right up front – or at least made it much easier.

Unfortunately, the GPU-as-accelerator market is dominated by one company – Nvidia. Nvidia created a proprietary language called CUDA (broadly similar to OpenCL, and predating it) for software developers wanting to use Nvidia GPUs to accelerate their applications. When OpenCL came along (with that annoying “Open” in the name), Nvidia announced “support” for OpenCL but clearly kept the bulk of its effort behind CUDA, and its CUDA customers were perfectly happy to keep writing CUDA code, thank you very much. The success of CUDA and Nvidia in the GPU acceleration market has put a big damper on the adoption of OpenCL, and that has significantly slowed the use of OpenCL as a bridge from GPU-based acceleration to FPGA-based acceleration – which is exactly what Nvidia wants.

Further, many of the neural network experts have not yet taken the OpenCL or the CUDA plunge. They need more help to bridge the gap between their trained models and FPGA-based inferencing engines. A number of companies are attacking this very important problem, and we’ll discuss them more in the future. But for now, FPGA-based neural network inferencing is basically limited to organizations with the ability to deploy FPGA experts alongside their neural network/AI engineers. In fact, this problem is most likely the driving factor behind the Xilinx/Daimler alliance we wrote about last week – Daimler probably needed Xilinx’s help to implement their automotive-specific AI algorithms on Xilinx’s hardware.

Beyond the programming problem, there is another barrier in the FPGAs’ path to irreplaceability. In high-volume embedded applications (such as automobiles), the solutions will become both specific and bounded over time. That means that the wild flexibility of FPGAs will no longer be required. Once a pedestrian-recognizing network is proven effective, the hardware for that application can be hardened into dedicated silicon, with a potentially enormous boost in performance, reduction in power, and (most importantly) reduction in cost. The FPGA will only be the go-to chip while the system architecture is in flux.

We talked with a lidar manufacturer, for example, that uses large numbers of FPGAs in its system. While it produces one of the very best-performing systems on the market for automotive applications, its system cost still runs to five figures. The company estimates that it needs to reduce that to three figures (yep, a two-order-of-magnitude cost reduction for the same performance), while also shrinking footprint and power consumption, before it is viable for mass production in automobiles. The top of its to-do list? Design out the FPGAs.

We suspect that this same situation may exist across numerous subsystems going after the ADAS and AD markets, leading to temporary euphoria for FPGA companies as they win sockets – followed by future disappointment when they are designed out before the big-volume payoff. And, this is just in the automotive space. Anywhere FPGAs are called upon to do the same job for a long period of time, they are vulnerable to replacement by a more optimized application-specific solution.

One place this may not be true (or may at least see the effects delayed) is in the data center/cloud deployment of AI applications. There, a variety of different-topology neural networks may have to be deployed on the same hardware, and further-optimized ASIC or ASSP solutions will be much longer in arriving. But even then, we will need more purpose-built FPGAs aimed directly at the data center. The current “do anything” SoC FPGA, with its assortment of “would be nice” features, will certainly be too bloated with unused features to be optimal for data center applications. And, given the recent rise in eFPGA (embedded FPGA IP) technology, more companies may choose to design data-center-class neural network accelerators that don’t carry the overhead (and huge margins) of stand-alone FPGAs.

With both Xilinx and Intel focused on fighting it out for the data center acceleration market (which is expected to explode any time now), this aspect of their respective strategies will be critical. But with that focus on the data center, the possibly more lucrative embedded opportunities, such as automotive, should prove even more elusive. It will be interesting to watch.