We are always trying to make machines that think faster. Before we finish building computers that can solve the last generation of problems, our imagination expands and we have a whole new set of challenges that require a new level of computing power. Emerging applications like machine vision can seemingly consume all of the computing power we could possibly throw at them – and then some.
For the past couple of decades, a quiet but radical minority has seen FPGAs as a magic bullet in the quest for more computing power. However, the challenges of programming FPGAs for software-like tasks was daunting, and the inertial progress of von Neumann machines surfing the seemingly-eternal wave of Moore’s Law was sufficient to keep our appetites sated.
However, the monolithic von Neumann machine ran out of steam a few years ago. Instead of making larger, faster processors, we had to start building arrays of processors to work in parallel. Once we crossed that line into multiprocessor land, our happy simple time of easy-to-program hardware vanished forever. Programmers now have to understand at least some of the characteristics of the underlying computing machinery in order to write software to efficiently take advantage of many processors working in parallel.
So, software engineers – already doing some of the most complex engineering work in human history – had their job made even more unmanageable by the introduction of multiple parallel processing elements. The complexity of writing “conventional” software is gradually approaching parity with the complexity of developing custom parallel computing hardware from scratch.
So, entering the ring from the left – the newly hardware-savvy software engineer looking to take advantage of the fine-grained parallelism of FPGAs for a giant leap in computational efficiency.
Hardware engineers entered the arena from a different perspective. The rampage of Moore’s law has kept hardware engineers perpetually working to raise their level of design abstraction. When gate-by-gate design using schematics got too complicated to manage, we went to hardware description languages (HDLs) and logic synthesis. When our HDL flow fell short of the productivity we required, we brought in blocks of pre-designed IP. When plugging IP blocks together didn’t give us the flexibility we needed, we entertained the idea of high-level synthesis (HLS).
Now we have reached a point where both hardware and software engineers can take advantage of the performance and power efficiency of FPGAs. Because of high-level synthesis, hardware engineers now have a design tool at a sufficiently high level of abstraction that they can implement many (admittedly not the most complex) algorithms directly in hardware with an incredible degree of performance and power efficiency. The complexity limitations are very real, however, and therefore the work of hardware engineers is often to isolate only the most severe computing bottlenecks in the overall algorithm, and then to implement those as something like hardware accelerators, with conventional processors handling the less demanding parts of the problem.
Typically, this means some deeply nested looping structure within the algorithm is converted into a pipelined datapath/control/memory subsystem that takes advantage of the fine-grained parallelism of the FPGA to do the heavy lifting, while some kind of conventional processor navigates the complex control structures of the larger application.
The hardware/software divide viewed from the software engineering perspective is very similar. Software engineers also isolate portions of the algorithm where the conventional processor is heavily loaded and move those into FPGAs for faster processing.
Since each group is approaching the hardware/software tradeoff space of heterogeneous computing from a different direction, different methodologies have emerged catering to each. The software engineer is courted by flows like OpenCL, where the message is “Come on in, the water’s fine. We can take that OpenCL code you wrote for your GPU and magically “compile” it for FPGAs. You don’t need to go back to hardware school to use it. Just bring in your existing software and you’re good to go. (Of course, this is a very rosy version of the scenario. One might even characterize it as “marketing.”)
Hardware engineers are entranced by the siren song of HLS. At first glance, it looks like a power tool for language-based design. Of course the “language” is some dialect of C (ANSI C/C++, SystemC, etc.) but the rest of the process is in the vernacular of hardware design. You are constructing datapaths and controllers, managing memory access schemes, constructing pipelines, managing the chain of register-to-register combinational logic – pretty much the same things you’d do writing RTL code, but with greatly increased speed and aplomb. You can explore twenty microarchitectures in minutes, where each one would take days to code in RTL. HLS, in the marketing sense, is truly a superpower for hardware engineers.
Today, however, neither the software- nor the hardware-centric design flow can deliver on the rosy vision painted by the marketers. OpenCL code optimized for GPUs is most certainly not optimal for FPGAs, and you’ll need to do some monkeying around to get the results you want. HLS can’t handle just any old C code, and you hardware engineers will need to train yourselves to the somewhat-limited capabilities of the tools.
Both of these flows exist today, however. People have used both approaches to successfully amp up the processing speed and decrease the power consumption of everything from radar algorithms to financial analysis to gene sequencing. With hardware and software engineers staring almost nose-to-nose at each other across the conceptual divide, we are on the verge of a convergence of epic proportions – with FPGAs at the center.
Recently, we talked to Allan Cantle (founder of Nallatech) about the state of affairs in FPGA-based computing. Cantle’s team were pioneers in FPGA-based supercomputing more than a decade ago, and both their hardware and software have continued to be among the most effective at putting FPGAs to work in challenging computing problems. Cantle pointed out that, thanks to current language and tool technologies like OpenCL and HLS, we are now arriving at a state where teams can access the benefits of FPGA-based acceleration without having resident hardware experts.
One problem that needed to be addressed, however, was form factor. Software-centric teams are not accustomed to buying chips. In fact, most software-centric teams are uncomfortable even with the idea of boards. While hardware engineers live every day in the land of silicon and FR4, software engineers like their computers to come in nice sturdy boxes – preferably with a built-in power supply. Nallatech is currently attacking that problem as well – building FPGA-based accelerator cards as well as full rack-mounted systems (with partners like HP and IBM) that take advantage of Altera’s FPGAs and OpenCL flow.
These types of platforms have FPGAs knocking on the door of the data center – a land of riches previously off limits to our LUT-laden lads. With the extreme challenges faced by data centers today in terms of computational power efficiency, there is the potential for an explosive invasion of FPGA technology. It all hinges on the success of current efforts at taming the programming model problem for these heterogeneous computing systems, and getting those hardware and software disciplines to meet nicely in the middle.