feature article
Subscribe Now

There’s a Processor in My FPGA!

Hey, There’s LUT Fabric in my SoC!

The idea of processors and FPGAs working together is exactly as old as the idea of FPGAs. Perhaps older, in fact, because even the prehistoric pre-FPGA PLDs often showed up on CPU boards – palling up with the hot processors of the day (which boasted 8 full bits of bone-crushing capability – at speeds of over a megahertz!) Of course, those programmable devices were mostly doing “glue logic” work – connecting up things that weren’t easy to connect otherwise.

Since those early days, processors and programmable logic have enjoyed a long and romantic partnership – spending long lazy days gazing lovingly into each other’s IO ports, exchanging data (and some control signals as well), and enriching each other’s lives through mutual cooperation. The partnership was never equal, though. Processors got all the glamour and recognition. Debutante CPUs would burst onto the red carpet with wider words and faster clocks, and they’d barely give a nod to their loyal FPGA companions who worked silently in the shadows, doing all the dirty work.

Over the years, both processors and FPGAs grew, matured, and prospered, thanks to the near-endless bounty of Moore’s Law. FPGAs became famous in their own right – earning their stripes as the primary enablers of the exploding information superhighway. Even though processors and FPGAs were pursuing different primary careers, they still continued their partnership on the PCB. Nearly every device made had at least one processing element and at least one programmable logic chip. It was just the way things were done.

Then, processors fell on hard times. Pesky-old power problems caused processors to hit the wall on Moore’s Law earlier than just about any other type of chip. After years of making a living by cranking up the clocks, processors reached a speed plateau and were forced to team up in order to deliver more performance. Multi-core architectures posed problems for programmers, as software that could truly take advantage of an arbitrary number of CPUs proved extremely difficult to write. 

FPGAs have now come to the rescue of their old processing pals, however. Besides bridging, FPGAs always had the potential to do some really efficient processing themselves. Processors had always been the go-to chips for running the primary applications, of course. It seemed that programming an FPGA to crunch out a complex algorithm was about a hundred times more difficult than dropping some code into a processor to accomplish the same thing. 

Now, circumstances have conspired to change that dynamic. With the proliferation of multi-core architectures, processors have become more difficult to program. At the same time, with the advent of new design methodologies and dramatically increased density, FPGAs have become easier to program and much more efficient and capable. Thus the playing field has leveled, and FPGAs have stepped into the role of algorithm accelerators – handling extreme processing tasks much faster, and at a tiny fraction of the power that the processor would require.

This time, though, the FPGA isn’t content to hide in the shadows and let the processor take all the credit. Both Xilinx and Altera are aggressively marketing devices that combine high-performance, low-power processors with FPGA fabric – all on the same chip. Xilinx’s Zynq devices came out of the chute with much acclaim, and Altera scrambled to counter with their own SoC FPGA families. Both vendors’ devices feature ARM-based processing subsystems on the same chip with FPGA fabric and hardened peripherals, as well as a generous helping of memory. Both companies have scrambled to adapt their tool flows to accommodate the new devices and to handle the anticipated influx of engineers who are not the typical FPGA-savvy engineers that the companies are accustomed to serving.

This “new” class of devices, while not yet taking the world by storm, is certainly setting the stage for the future – a future where we may no longer distinguish between software- and hardware-programmable elements in the same chip, any more than we currently segregate elements of a processor such as registers, ALUs, and cache. For a wide range of applications, these new chips bring superior performance and flexibility while consuming a small fraction of the power required by traditional approaches. For emerging high-demand applications such as embedded vision, these heterogeneous computing chips may be key enablers.

At first glance, this might seem like a simple case of integration sleight of hand. After all, people have been perfectly content to park their favorite FPGA next to their favorite CPU for decades now. Really, isn’t putting both of them on the same chip just a typical compromise where we buy more expensive chips just to save a little board space? And, while the 2-chip solution always allowed us to pick the perfect processor and the perfect FPGA for our needs, doesn’t this new class of device potentially force us into choosing less-than-ideal versions of each, just because they come only as a package deal?

The concern about sub-optimal pairings is a reasonable one, of course. After all, there are dozens of FPGAs on the market – in a wide range of sizes, and with a huge variety of features and capabilities. In contrast, there are very few FPGA configurations currently offered as part of these new hybrid FPGA/processor chips. You do have to hope that the factory made choices that are reasonable for your needs. The same thing is true of the processor portion. While processor catalogs boast thousands of CPUs and MCUs, there are a paltry few processors available as part of FPGA SoCs. If you built a matrix of all the processor choices and all the FPGA choices, these new one-chip solutions would occupy only a tiny fraction of that matrix.

But, that matrix may not be as sparse as it seems. It turns out that there are some very reasonable choices one can make on a processor – as long as you know that a particular FPGA will be in the picture. For example, you don’t need to have dozens of different versions of the chip with different collections and configurations of peripherals. The FPGA fabric gives you the option to equip your SoC with whatever collection of peripherals you require – no more and no less. With that one capability, literally hundreds of SoC options can be replaced by one.

Similarly, because the FPGA can accelerate certain high-demand computing tasks, the performance requirement on the processor may be able to be reduced dramatically. Normally, in a system design, you have to size the processor for the absolute most demanding task your system needs to perform. Even though your system may almost never be required to use that level of performance, your CPU has to be built to handle the peak, so you’ve got a lot of extra CPU capability sitting around most of the time drinking power, taking up silicon real-estate, and just waiting for that big moment. Now, if the FPGA can handle that peak-demand computing task, the processor can be designed much more modestly. Poof! Another plethora of processor options vanishes in a puff of smoke.

The more subtle advantages of the FPGA SoC may be even more important, however. When FPGAs are used as accelerators, the connection bandwidth between FPGA, processor, and memory becomes a bottleneck. When they occupy the same chip, the number of connections increases dramatically, and the power required to drive those connections drops as well. Now, instead of having to drive signals through IO buffers, across a PCB, and onto another chip, data passed between these essential elements can flow freely through short, efficient, highly-parallel, on-chip interconnect (or, increasingly, through silicon interposer connections with similar benefits and properties). At the same time, by freeing up all the IO pins that were previously required to connect the FPGA and processor to each other, more of the package’s IO capacity can be applied to other useful tasks.

The FPGA and processor combined into a single SoC may indeed be the ultimate chip for a huge range of applications. While the wider engineering community still eyes the new devices with a well-earned wall of wary skepticism, the hardware architecture of the current generation seems more than enough to allay the fears of skeptics. These devices – from both major vendors – are remarkably well engineered, with a clever balance of resources that will work well together in a gamut of application spaces.

Apart from the perception hurdles, the only real barrier to widespread adoption and success of these amazing new devices is the tool chain. For teams accustomed to implementing their application in software running solely on traditional processors, the task of programming a heterogeneous chip like an FPGA SoC is daunting. Certainly, even though the tools are improving rapidly, the flow is far less elegant, simple, and encapsulated than the well-known compiler-based approach to software application development. When the time finally arrives that optimizing your application for an FPGA SoC is as simple as compiling and linking some C code, these chips will truly take over the world.

16 thoughts on “There’s a Processor in My FPGA!”

  1. One potential option that would address the “sub-optimal pairings” issue would be through the use of KGD and 2.5/3D technology. When you think about the tried and true two chip solution where design teams have high flexibility integrating their choice of processor and FPGA, think about the interposer as replacing the PCB in this context and KGD replacing the typical device package. In certain scenarios you might pair bare die FPGAs from somebody like Lattice with a bare die processor from ARM – integrated via an interposer.

  2. Altera’s OpenCL SDK allows software programmers to use the FPGA’s fabric as easily as they use the processor cores. The same code that runs on an SoC also runs on the largest parts. Altera is also committed to having an SoC in each family so the Stratix 10 will have quad-core 64-bit ARM cores to match the GHz fabric and hardened floating point. My rule of thumb is go make a list of what you think FPGAs don’t do and you’ll find in a few years they’ll support those features. There is no stopping FPGAs from taking over the world!

  3. Indeed, Mentor Graphics had the ASAP (Application Specific Assistant Processor)tool 10 years ago. It automated the ability for the user to select a function and push it onto the FPGA fabric. It used an HLS tool of course for the synthesis. Though OpenCL gives the user has control over buffering and task control.

  4. Pingback: loker januari 2017
  5. Pingback: binaural
  6. Pingback: day trading
  7. Pingback: friv
  8. Pingback: Bdsm
  9. Pingback: check it out
  10. Pingback: bandar taruhan
  11. Pingback: Stix Events
  12. Pingback: satta matka

Leave a Reply

featured blogs
Jul 20, 2024
If you are looking for great technology-related reads, here are some offerings that I cannot recommend highly enough....

featured video

Larsen & Toubro Builds Data Centers with Effective Cooling Using Cadence Reality DC Design

Sponsored by Cadence Design Systems

Larsen & Toubro built the world’s largest FIFA stadium in Qatar, the world’s tallest statue, and one of the world’s most sophisticated cricket stadiums. Their latest business venture? Designing data centers. Since IT equipment in data centers generates a lot of heat, it’s important to have an efficient and effective cooling system. Learn why, Larsen & Toubro use Cadence Reality DC Design Software for simulation and analysis of the cooling system.

Click here for more information about Cadence Multiphysics System Analysis

featured paper

Navigating design challenges: block/chip design-stage verification

Sponsored by Siemens Digital Industries Software

Explore the future of IC design with the Calibre Shift left initiative. In this paper, author David Abercrombie reveals how Siemens is changing the game for block/chip design-stage verification by moving Calibre verification and reliability analysis solutions further left in the design flow, including directly inside your P&R tool cockpit. Discover how you can reduce traditional long-loop verification iterations, saving time, improving accuracy, and dramatically boosting productivity.

Click here to read more

featured chalk talk

Maximizing High Power Density and Efficiency in EV-Charging Applications
Sponsored by Mouser Electronics and Infineon
In this episode of Chalk Talk, Amelia Dalton and Daniel Dalpiaz from Infineon talk about trends in the greater electrical vehicle charging landscape, typical block diagram components, and tradeoffs between discrete devices versus power modules. They also discuss choices between IGBT’s and Silicon Carbide, the advantages of advanced packaging techniques in both power discrete and power module solutions, and how reliability is increasingly important due to demands for more charging cycles per day.
Dec 18, 2023