feature article
Subscribe Now

Flex Logix configurable hardware IP for AI and DSP workloads fuses FPGAs, tensor units, and software

Today, we’re going to talk about AI, DSP, FPGAs, IP, and SoCs. Normally, these things don’t all go together. Certainly, FPGAs have been used to implement AI and DSP algorithms, although AI and DSP algorithms generally involve different sorts of computations. (See “A Brief History of the Single-Chip DSP, Part II .”) DSP designs have largely stayed with FPGA implementations, thanks to the bounty of multiplier/accumulators (MACs) they deliver, while AI training has migrated to GPUs. These days, AI inference has been moving to multiple hardware implementations including augmented CPUs, dedicated chips, purpose-built AI engines for SoCs, and even FPGAs. Now, Flex Logix has introduced yet another alternative: configurable hardware IP that’s specifically designed to execute AI and DSP workloads.

Founded in 2014, Flex Logix is no newcomer to configurable hardware IP. The company has offered EFLEX embedded FPGA (eFPGA) IP tiles and software for nearly a decade. According to Geoff Tate, Flex Logix co-founder and CEO, the company now has several defense-related customers and more than half a dozen commercial customers with 23 different chips in high-volume production that incorporate EFLEX eFPGAs. Examples of such devices range from a high-end 5G MIMO SoC from an unnamed customer that integrates a 100K-LUT (lookup table) eFPGA along with multiple transmit and receive digital front ends for the MIMO antenna interface to a new line of small and inexpensive Renesas ForgeFPGAs with 1K to 4K LUTs that can cost less than 50 cents in volume.

 

Figure 1: Small, inexpensive Renesas ForgeFPGAs are based on Flex Logix FPGA IP and the development tools are based on the tools supplied by Flex Logix. Image credit: Flex Logix

Renesas ForgeFPGAs trace their lineage to Silego devices, which my friend and colleague Max Maxfield has written about previously. (See “Not Much Happened, or Did It?” and “Renesas Announces Fabulous ForgeFPGA Family.”) Dialog Semiconductor bought Silego in 2017 and was in turn purchased by Renesas in 2021, which is how Renesas became an FPGA vendor. Little fish get eaten by bigger fish, which in turn get eaten by even bigger fish.

The Renesas FPGAs provide a version of the Flex Logix FPGA software directly to developers for configuring ForceFlex devices, which indicates that Flex Logix can provide usable software directly to system developers. That’s an important capability when entering the AI market, where you want to allow data scientists and programmers with no hardware development background to use your products easily.

In many ways, these new Flex Logix InferX hardware tiles trace their lineage back to the company’s eFPGA tiles. InferX IP tiles are hardened IP, supplied by Flex Logix to a customer for a specific IC manufacturing node at a specific foundry. In this way, InferX IP is just like the company’s EFLEX FPGA IP. In the case of the InferX IP, Flex Logix is initially targeting TSMC’s N5 5nm process node, but the company says that any FinFET process is a candidate for this IP.

Further, AI and DSP algorithms execute a large number of multiply/accumulate operations. FPGAs, including the Flex Logix eFPGA tiles, generally provide many, many multiplier/accumulators to execute all of those operations in real time, but these MACs are embedded in the FPGA fabric like the raisins in my grandmother’s delicious Friday-night rice pudding recipe, which were relatively few and far between. In an FPGA, hardened MACs are surrounded by the FPGA’s interconnect fabric to enable the creation of any sort of execution pipeline. However, AI and DSP algorithms don’t need that sort of flexibility—their calculation pipelines are far more regular—so the silicon overhead of an FPGA’s routing structures are superfluous and can be removed. That silicon overhead is significant and can be as much as 80% of the silicon in the area in and around the MACs according to Tate.

Instead, Flex Logix took its MAC design, enhanced it for AI and DSP operations, and created a hardwired tensor unit that interconnects a large number of MACs with a hardened interconnect, just the way an SoC would be designed. The resulting InferX hardware IP tile combines four of these tensor units with individual L2 SRAM caches and a smallish eFPGA that’s used to configure and interconnect the tensor units, as shown in Figure 2 below.

 

Figure 2: The Flex Logix InferX hardware IP tile combines four of these tensor units with a small eFPGA used to configure and interconnect the tensor units. Image Credit: Flex Logix

According to Flex Logix, the computational capabilities of these InferX tiles scale more or less linearly, so that the SoC development team can incorporate one tile, four tiles, eight tiles, and possibly more to meet system processing requirements. Figure 3 below illustrates the performance capabilities of 1-, 2-, 4-, and 8-tile InferX instantiations for some commonly used AI models.

 

Figure 3: SoC developers can instantiate multiple InferX tiles to achieve the performance level required by their application. (Note: IPS = Inferences Per Second.) Image credit: Flex Logix

As Figure 3 shows, a single InferX tile can implement multiple AI image-recognition models with varying degrees of image resolution. A 2-tile InferX instantiation roughly doubles performance compared to a 1-tile implementation, and 4- and 8-tile instantiations further boost performance. As can be seen from Figure 3, sometimes the performance boost is linear, sometimes it’s superlinear; and sometimes it’s sublinear. The performance gained by increasing the number of InterX tiles depends on memory capacity (one or two LPDDR5 SDRAMs), I/O saturation, and other system-level bottlenecks.

The required subsystem performance varies by end application, of course. A race car traveling at 200 MPH has far more demanding image-processing performance requirements in terms of inferences per second (IPS) than does a delivery robot moving at a few miles per hour. Note that the performance specifications shown in Figure 3 are for images processed per second (IPS) rather than the TOPS (trillions of operations per second) numbers other vendors use as a proxy for actual AI performance. That’s because InferX tiles are designed and configured for dedicated, real-time tasks in embedded equipment, so they’re scaled and programmed/configured accordingly. Tate says that if the prospective customer starts asking about TOPS, then the salesperson is probably talking to the wrong prospect. According to Tate, it’s the performance for specific AI model(s) that’s important for SoCs that are candidates for InferX tiles.

The software that Flex Logix provides to InferX customers does not expose the FPGA configuration or tensor engine programming directly to developers. Instead, the developer starts with an AI model trained within a machine learning framework such as Caffe, TensorFlow, PyTorch, or mxnet. These trained AI models all employ floating-point arithmetic, which is not needed for inference, so the first step in converting the AI model for use with the InferX tile is a model quantizer, which converts the floating-point values into 8-bit integers. After that, a graph compiler and operator compiler map the model onto the InferX architecture to emit an InferX run-time configuration, which the company calls “Softlogic,” and driver software with API calls. Because the InferX hardware IP is configurable, AI models can be swapped out for new ones in microseconds, in a manner similar to the use of software overlays for microprocessors. Taken together, the company collectively calls the InferX tile and AI software “InferX AI.”

For now, AI applications may get the lion’s share of press and hype, but Flex Logix has found that the primary use for its InferX tiles at the moment is for DSP algorithms, which also need to execute a lot of MACs. So, the company has created a similar package of InferX tiles and DSP software, conveniently called “InferX DSP.” According to Tate, the primary market for InferX DSP is for the replacement of Xilinx devices, specifically Xilinx Versal FPGAs and Zynq SoCs and MPSoCs, which combine Arm CPUs with Xilinx FPGA fabric. If a company is already going to the effort and expense of designing an SoC to replace these off-the-shelf Xilinx devices, there’s a potential cost savings to be had by incorporating one or more InferX tiles into the SoC to perform the MACs.

Flex Logix provides high-level software that implements popular DSP algorithms such as FFTs, FIR filters, and complex matrix inversions. For both InferX AI and InferX DSP, the resulting IP block runs as a slave accelerator, connected over an AXI interface. Because of the nature of DSP, Flex Logix also can supply a dual-ported, streaming interface to the InferX tile. Because most DSP customers already know which algorithms their designs need, the overall SoC design cycle is currently much shorter for designs that use InferX DSP, compared to InferX AI, said Tate.

Leave a Reply

featured blogs
Mar 28, 2024
The difference between Olympic glory and missing out on the podium is often measured in mere fractions of a second, highlighting the pivotal role of timing in sports. But what's the chronometric secret to those photo finishes and record-breaking feats? In this comprehens...
Mar 26, 2024
Learn how GPU acceleration impacts digital chip design implementation, expanding beyond chip simulation to fulfill compute demands of the RTL-to-GDSII process.The post Can GPUs Accelerate Digital Design Implementation? appeared first on Chip Design....
Mar 21, 2024
The awesome thing about these machines is that you are limited only by your imagination, and I've got a GREAT imagination....

featured video

We are Altera. We are for the innovators.

Sponsored by Intel

Today we embark on an exciting journey as we transition to Altera, an Intel Company. In a world of endless opportunities and challenges, we are here to provide the flexibility needed by our ecosystem of customers and partners to pioneer and accelerate innovation. As we leap into the future, we are committed to providing easy-to-design and deploy leadership programmable solutions to innovators to unlock extraordinary possibilities for everyone on the planet.

To learn more about Altera visit: http://intel.com/altera

featured chalk talk

Electrical Connectors for Hermetically Sealed Applications
Sponsored by Mouser Electronics and Bel
Many hermetic chambers today require electrical pathways to provide internal equipment with power, data or signals, or to receive data and signals from equipment within the chamber. In this episode of Chalk Talk, Amelia Dalton and Brad Taras from Cinch Connectivity Solutions explore the role that seals and connectors play in the performance of hermetic chambers. They examine the methodologies to determine hermetic seal leaks, the benefits of epoxy hermetic seals, and how Cinch Connectivity’s epoxy-based seals and hermetic connectors can add value to your next design.
Aug 22, 2023
26,363 views