The idea of processors and FPGAs working together is exactly as old as the idea of FPGAs. Perhaps older, in fact, because even the prehistoric pre-FPGA PLDs often showed up on CPU boards – palling up with the hot processors of the day (which boasted 8 full bits of bone-crushing capability – at speeds of over a megahertz!) Of course, those programmable devices were mostly doing “glue logic” work – connecting up things that weren’t easy to connect otherwise.
Since those early days, processors and programmable logic have enjoyed a long and romantic partnership – spending long lazy days gazing lovingly into each other’s IO ports, exchanging data (and some control signals as well), and enriching each other’s lives through mutual cooperation. The partnership was never equal, though. Processors got all the glamour and recognition. Debutante CPUs would burst onto the red carpet with wider words and faster clocks, and they’d barely give a nod to their loyal FPGA companions who worked silently in the shadows, doing all the dirty work.
Over the years, both processors and FPGAs grew, matured, and prospered, thanks to the near-endless bounty of Moore’s Law. FPGAs became famous in their own right – earning their stripes as the primary enablers of the exploding information superhighway. Even though processors and FPGAs were pursuing different primary careers, they still continued their partnership on the PCB. Nearly every device made had at least one processing element and at least one programmable logic chip. It was just the way things were done.
Then, processors fell on hard times. Pesky-old power problems caused processors to hit the wall on Moore’s Law earlier than just about any other type of chip. After years of making a living by cranking up the clocks, processors reached a speed plateau and were forced to team up in order to deliver more performance. Multi-core architectures posed problems for programmers, as software that could truly take advantage of an arbitrary number of CPUs proved extremely difficult to write.
FPGAs have now come to the rescue of their old processing pals, however. Beyond glue logic and bridging, FPGAs always had the potential to do some really efficient processing themselves. Still, processors remained the go-to chips for running the primary applications, because programming an FPGA to crunch out a complex algorithm was about a hundred times more difficult than dropping some code into a processor to accomplish the same thing.
Now, circumstances have conspired to change that dynamic. With the proliferation of multi-core architectures, processors have become more difficult to program. At the same time, with the advent of new design methodologies and dramatically increased density, FPGAs have become easier to program and much more efficient and capable. Thus the playing field has leveled, and FPGAs have stepped into the role of algorithm accelerators – handling extreme processing tasks much faster, and at a tiny fraction of the power that the processor would require.
This time, though, the FPGA isn’t content to hide in the shadows and let the processor take all the credit. Both Xilinx and Altera are aggressively marketing devices that combine high-performance, low-power processors with FPGA fabric – all on the same chip. Xilinx’s Zynq devices came out of the chute with much acclaim, and Altera scrambled to counter with their own SoC FPGA families. Both vendors’ devices feature ARM-based processing subsystems on the same chip with FPGA fabric and hardened peripherals, as well as a generous helping of memory. Both companies have scrambled to adapt their tool flows to accommodate the new devices and to handle the anticipated influx of engineers who are not the FPGA-savvy types the companies are accustomed to serving.
This “new” class of devices, while not yet taking the world by storm, is certainly setting the stage for the future – a future where we may no longer distinguish between software- and hardware-programmable elements in the same chip, any more than we currently segregate elements of a processor such as registers, ALUs, and cache. For a wide range of applications, these new chips bring superior performance and flexibility while consuming a small fraction of the power required by traditional approaches. For emerging high-demand applications such as embedded vision, these heterogeneous computing chips may be key enablers.
At first glance, this might seem like a simple case of integration sleight of hand. After all, people have been perfectly content to park their favorite FPGA next to their favorite CPU for decades now. Really, isn’t putting both of them on the same chip just a typical compromise where we buy more expensive chips just to save a little board space? And, while the 2-chip solution always allowed us to pick the perfect processor and the perfect FPGA for our needs, doesn’t this new class of device potentially force us into choosing less-than-ideal versions of each, just because they come only as a package deal?
The concern about sub-optimal pairings is a reasonable one, of course. After all, there are dozens of FPGAs on the market – in a wide range of sizes, and with a huge variety of features and capabilities. In contrast, there are very few FPGA configurations currently offered as part of these new hybrid FPGA/processor chips. You do have to hope that the factory made choices that are reasonable for your needs. The same thing is true of the processor portion. While processor catalogs boast thousands of CPUs and MCUs, there are a paltry few processors available as part of FPGA SoCs. If you built a matrix of all the processor choices and all the FPGA choices, these new one-chip solutions would occupy only a tiny fraction of that matrix.
But, that matrix may not be as sparse as it seems. It turns out that there are some very reasonable choices one can make on a processor – as long as you know that a particular FPGA will be in the picture. For example, you don’t need to have dozens of different versions of the chip with different collections and configurations of peripherals. The FPGA fabric gives you the option to equip your SoC with whatever collection of peripherals you require – no more and no less. With that one capability, literally hundreds of SoC options can be replaced by one.
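As a purely hypothetical illustration of that collapse (the peripheral list below is invented, not drawn from any vendor’s catalog), consider how many fixed-peripheral SoC variants a single configurable fabric can stand in for:

```python
# Hypothetical sketch: how many fixed-silicon SoC variants one FPGA SoC
# can replace. The peripheral list is invented for illustration only.
peripherals = ["UART", "SPI", "I2C", "CAN", "USB", "Ethernet", "PWM", "ADC"]

# Each fixed SoC variant ships some subset of these peripherals baked in.
# An FPGA SoC instead instantiates exactly the subset you need in fabric,
# so one device covers every combination.
fixed_variants = 2 ** len(peripherals)

print(fixed_variants)  # 256 catalog parts collapse into one device
```

Even this modest list of eight optional peripherals yields 256 possible fixed configurations – which is the sense in which hundreds of SoC options can be replaced by one.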
Similarly, because the FPGA can accelerate certain high-demand computing tasks, the processor’s performance requirement can be reduced dramatically. Normally, in a system design, you have to size the processor for the absolute most demanding task your system needs to perform. Even though your system may almost never demand that level of performance, your CPU has to be built to handle the peak, so you’ve got a lot of extra CPU capability sitting around most of the time drinking power, taking up silicon real estate, and just waiting for its big moment. Now, if the FPGA can handle that peak-demand computing task, the processor can be specified much more modestly. Poof! Another plethora of processor options vanishes in a puff of smoke.
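A back-of-envelope sketch makes the sizing argument concrete. All the numbers below are invented for illustration (a small steady-state load plus one occasional compute burst), not measurements of any real system:

```python
# Hypothetical CPU-sizing sketch; all workload numbers are invented
# for illustration, not vendor specifications.

def required_cpu_mips(baseline_mips, peak_task_mips, offloaded_to_fpga):
    """The CPU must cover the steady-state baseline, plus the peak task
    unless that task has been moved into FPGA fabric."""
    return baseline_mips + (0 if offloaded_to_fpga else peak_task_mips)

baseline = 200    # steady-state workload (MIPS)
peak_task = 1800  # occasional compute burst, e.g. a video-analytics kernel

cpu_alone = required_cpu_mips(baseline, peak_task, offloaded_to_fpga=False)
cpu_with_fpga = required_cpu_mips(baseline, peak_task, offloaded_to_fpga=True)

print(cpu_alone)      # 2000 -> processor sized for the worst case
print(cpu_with_fpga)  # 200  -> processor sized for the steady state
```

Under these assumed numbers, offloading the one burst task lets a tenth-sized processor do the job – which is why so many processor variants fall out of the matrix.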
The more subtle advantages of the FPGA SoC may be even more important, however. When FPGAs are used as accelerators, the connection bandwidth between FPGA, processor, and memory becomes a bottleneck. When they occupy the same chip, the number of connections increases dramatically, and the power required to drive those connections drops as well. Now, instead of having to drive signals through IO buffers, across a PCB, and onto another chip, data passed between these essential elements can flow freely through short, efficient, highly-parallel, on-chip interconnect (or, increasingly, through silicon interposer connections with similar benefits and properties). At the same time, by freeing up all the IO pins that were previously required to connect the FPGA and processor to each other, more of the package’s IO capacity can be applied to other useful tasks.
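The interconnect argument can also be sketched with rough arithmetic. The lane counts and energy-per-bit figures below are illustrative assumptions chosen to show the shape of the trade-off, not measured vendor data:

```python
# Rough sketch of why on-chip links beat board-level links for
# FPGA<->CPU traffic. Lane counts and pJ/bit energies are illustrative
# assumptions, not measured data.

def aggregate_bandwidth_gbps(lanes, gbps_per_lane):
    # Total bandwidth scales with the number of parallel connections
    return lanes * gbps_per_lane

def transfer_power_watts(bandwidth_gbps, pj_per_bit):
    # Power = bits per second * energy per bit
    return bandwidth_gbps * 1e9 * pj_per_bit * 1e-12

# Board-level: limited by package pins, each driven through IO buffers
off_chip_bw = aggregate_bandwidth_gbps(lanes=64, gbps_per_lane=1.0)
off_chip_pw = transfer_power_watts(off_chip_bw, pj_per_bit=10.0)

# On-chip: thousands of short wires at a small fraction of the energy/bit
on_chip_bw = aggregate_bandwidth_gbps(lanes=2048, gbps_per_lane=0.5)
on_chip_pw = transfer_power_watts(on_chip_bw, pj_per_bit=0.2)

print(off_chip_bw, off_chip_pw)  # 64.0 Gbps at 0.64 W
print(on_chip_bw, on_chip_pw)    # 1024.0 Gbps at ~0.2 W
```

Under these assumptions, the on-chip path delivers sixteen times the bandwidth for less power – and frees the package pins for other work besides.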
The FPGA and processor combined into a single SoC may indeed be the ultimate chip for a huge range of applications. While the wider engineering community still eyes the new devices with well-earned skepticism, the hardware architecture of the current generation seems more than capable of allaying those fears. These devices – from both major vendors – are remarkably well engineered, with a clever balance of resources that should work well together across a gamut of application spaces.
Apart from the perception hurdles, the only real barrier to widespread adoption and success of these amazing new devices is the tool chain. For teams accustomed to implementing their application in software running solely on traditional processors, the task of programming a heterogeneous chip like an FPGA SoC is daunting. Even though the tools are improving rapidly, the flow is still far less elegant, simple, and encapsulated than the well-known compiler-based approach to software development. When the day finally arrives that optimizing your application for an FPGA SoC is as simple as compiling and linking some C code, these chips will truly take over the world.