feature article
Subscribe Now

Flex Logix configurable hardware IP for AI and DSP workloads fuses FPGAs, tensor units, and software

Today, we’re going to talk about AI, DSP, FPGAs, IP, and SoCs. Normally, these things don’t all go together. Certainly, FPGAs have been used to implement AI and DSP algorithms, although AI and DSP algorithms generally involve different sorts of computations. (See “A Brief History of the Single-Chip DSP, Part II .”) DSP designs have largely stayed with FPGA implementations, thanks to the bounty of multiplier/accumulators (MACs) they deliver, while AI training has migrated to GPUs. These days, AI inference has been moving to multiple hardware implementations including augmented CPUs, dedicated chips, purpose-built AI engines for SoCs, and even FPGAs. Now, Flex Logix has introduced yet another alternative: configurable hardware IP that’s specifically designed to execute AI and DSP workloads.

Founded in 2014, Flex Logix is no newcomer to configurable hardware IP. The company has offered EFLEX embedded FPGA (eFPGA) IP tiles and software for nearly a decade. According to Geoff Tate, Flex Logix co-founder and CEO, the company now has several defense-related customers and more than half a dozen commercial customers with 23 different chips in high-volume production that incorporate EFLEX eFPGAs. Examples of such devices range from a high-end 5G MIMO SoC from an unnamed customer that integrates a 100K-LUT (lookup table) eFPGA along with multiple transmit and receive digital front ends for the MIMO antenna interface to a new line of small and inexpensive Renesas ForgeFPGAs with 1K to 4K LUTs that can cost less than 50 cents in volume.


Figure 1: Small, inexpensive Renesas ForgeFPGAs are based on Flex Logix FPGA IP and the development tools are based on the tools supplied by Flex Logix. Image credit: Flex Logix

Renesas ForgeFPGAs trace their lineage to Silego devices, which my friend and colleague Max Maxfield has written about previously. (See “Not Much Happened, or Did It?” and “Renesas Announces Fabulous ForgeFPGA Family.”) Dialog Semiconductor bought Silego in 2017 and was in turn purchased by Renesas in 2021, which is how Renesas became an FPGA vendor. Little fish get eaten by bigger fish, which in turn get eaten by even bigger fish.

The Renesas FPGAs provide a version of the Flex Logix FPGA software directly to developers for configuring ForceFlex devices, which indicates that Flex Logix can provide usable software directly to system developers. That’s an important capability when entering the AI market, where you want to allow data scientists and programmers with no hardware development background to use your products easily.

In many ways, these new Flex Logix InferX hardware tiles trace their lineage back to the company’s eFPGA tiles. InferX IP tiles are hardened IP, supplied by Flex Logix to a customer for a specific IC manufacturing node at a specific foundry. In this way, InferX IP is just like the company’s EFLEX FPGA IP. In the case of the InferX IP, Flex Logix is initially targeting TSMC’s N5 5nm process node, but the company says that any FinFET process is a candidate for this IP.

Further, AI and DSP algorithms execute a large number of multiply/accumulate operations. FPGAs, including the Flex Logix eFPGA tiles, generally provide many, many multiplier/accumulators to execute all of those operations in real time, but these MACs are embedded in the FPGA fabric like the raisins in my grandmother’s delicious Friday-night rice pudding recipe, which were relatively few and far between. In an FPGA, hardened MACs are surrounded by the FPGA’s interconnect fabric to enable the creation of any sort of execution pipeline. However, AI and DSP algorithms don’t need that sort of flexibility—their calculation pipelines are far more regular—so the silicon overhead of an FPGA’s routing structures are superfluous and can be removed. That silicon overhead is significant and can be as much as 80% of the silicon in the area in and around the MACs according to Tate.

Instead, Flex Logix took its MAC design, enhanced it for AI and DSP operations, and created a hardwired tensor unit that interconnects a large number of MACs with a hardened interconnect, just the way an SoC would be designed. The resulting InferX hardware IP tile combines four of these tensor units with individual L2 SRAM caches and a smallish eFPGA that’s used to configure and interconnect the tensor units, as shown in Figure 2 below.


Figure 2: The Flex Logix InferX hardware IP tile combines four of these tensor units with a small eFPGA used to configure and interconnect the tensor units. Image Credit: Flex Logix

According to Flex Logix, the computational capabilities of these InferX tiles scale more or less linearly, so that the SoC development team can incorporate one tile, four tiles, eight tiles, and possibly more to meet system processing requirements. Figure 3 below illustrates the performance capabilities of 1-, 2-, 4-, and 8-tile InferX instantiations for some commonly used AI models.


Figure 3: SoC developers can instantiate multiple InferX tiles to achieve the performance level required by their application. (Note: IPS = Inferences Per Second.) Image credit: Flex Logix

As Figure 3 shows, a single InferX tile can implement multiple AI image-recognition models with varying degrees of image resolution. A 2-tile InferX instantiation roughly doubles performance compared to a 1-tile implementation, and 4- and 8-tile instantiations further boost performance. As can be seen from Figure 3, sometimes the performance boost is linear, sometimes it’s superlinear; and sometimes it’s sublinear. The performance gained by increasing the number of InterX tiles depends on memory capacity (one or two LPDDR5 SDRAMs), I/O saturation, and other system-level bottlenecks.

The required subsystem performance varies by end application, of course. A race car traveling at 200 MPH has far more demanding image-processing performance requirements in terms of inferences per second (IPS) than does a delivery robot moving at a few miles per hour. Note that the performance specifications shown in Figure 3 are for images processed per second (IPS) rather than the TOPS (trillions of operations per second) numbers other vendors use as a proxy for actual AI performance. That’s because InferX tiles are designed and configured for dedicated, real-time tasks in embedded equipment, so they’re scaled and programmed/configured accordingly. Tate says that if the prospective customer starts asking about TOPS, then the salesperson is probably talking to the wrong prospect. According to Tate, it’s the performance for specific AI model(s) that’s important for SoCs that are candidates for InferX tiles.

The software that Flex Logix provides to InferX customers does not expose the FPGA configuration or tensor engine programming directly to developers. Instead, the developer starts with an AI model trained within a machine learning framework such as Caffe, TensorFlow, PyTorch, or mxnet. These trained AI models all employ floating-point arithmetic, which is not needed for inference, so the first step in converting the AI model for use with the InferX tile is a model quantizer, which converts the floating-point values into 8-bit integers. After that, a graph compiler and operator compiler map the model onto the InferX architecture to emit an InferX run-time configuration, which the company calls “Softlogic,” and driver software with API calls. Because the InferX hardware IP is configurable, AI models can be swapped out for new ones in microseconds, in a manner similar to the use of software overlays for microprocessors. Taken together, the company collectively calls the InferX tile and AI software “InferX AI.”

For now, AI applications may get the lion’s share of press and hype, but Flex Logix has found that the primary use for its InferX tiles at the moment is for DSP algorithms, which also need to execute a lot of MACs. So, the company has created a similar package of InferX tiles and DSP software, conveniently called “InferX DSP.” According to Tate, the primary market for InferX DSP is for the replacement of Xilinx devices, specifically Xilinx Versal FPGAs and Zynq SoCs and MPSoCs, which combine Arm CPUs with Xilinx FPGA fabric. If a company is already going to the effort and expense of designing an SoC to replace these off-the-shelf Xilinx devices, there’s a potential cost savings to be had by incorporating one or more InferX tiles into the SoC to perform the MACs.

Flex Logix provides high-level software that implements popular DSP algorithms such as FFTs, FIR filters, and complex matrix inversions. For both InferX AI and InferX DSP, the resulting IP block runs as a slave accelerator, connected over an AXI interface. Because of the nature of DSP, Flex Logix also can supply a dual-ported, streaming interface to the InferX tile. Because most DSP customers already know which algorithms their designs need, the overall SoC design cycle is currently much shorter for designs that use InferX DSP, compared to InferX AI, said Tate.

Leave a Reply

featured blogs
Sep 19, 2023
What's new with the latest Bluetooth mesh specification? Explore mesh 1.1 features that improve security and network efficiency, reduce power, and more....
Sep 20, 2023
Qualcomm FastConnect Software Suite for XR empowers OEMs with system-level optimizations for truly wireless XR....
Sep 20, 2023
The newest version of Fine Marine offers critical enhancements that improve solver performances and sharpen the C-Wizard's capabilities even further. Check out the highlights: γ-ReθTransition Model and Extension for Crossflow Modeling We have boosted our modeling capabi...
Sep 20, 2023
ESD protection analysis is a critical step in the IC design process; see how our full-chip PrimeESD tool accelerates ESD simulation and violation reporting.The post New Unified Electrostatic Reliability Analysis Solution Has Your Chip Covered appeared first on Chip Design...
Sep 10, 2023
A young girl's autobiography describing growing up alongside the creation of the state of Israel...

featured video

TDK PowerHap Piezo Actuators for Ideal Haptic Feedback

Sponsored by TDK

The PowerHap product line features high acceleration and large forces in a very compact design, coupled with a short response time. TDK’s piezo actuators also offers good sensing functionality by using the inverse piezo effect. Typical applications for the include automotive displays, smartphones and tablet.

Click here for more information about PowerHap Piezo Actuators

featured paper

Intel's Chiplet Leadership Delivers Industry-Leading Capabilities at an Accelerated Pace

Sponsored by Intel

We're proud of our long history of rapid innovation in #FPGA development. With the help of Intel's Embedded Multi-Die Interconnect Bridge (EMIB), we’ve been able to advance our FPGAs at breakneck speed. In this blog, Intel’s Deepali Trehan charts the incredible history of our chiplet technology advancement from 2011 to today, and the many advantages of Intel's programmable logic devices, including the flexibility to combine a variety of IP from different process nodes and foundries, quicker time-to-market for new technologies and the ability to build higher-capacity semiconductors

To learn more about chiplet architecture in Intel FPGA devices visit: https://intel.ly/47JKL5h

featured chalk talk

Optimize Performance: RF Solutions from PCB to Antenna
RF is a ubiquitous design element found in a large variety of electronic designs today. In this episode of Chalk Talk, Amelia Dalton and Rahul Rajan from Amphenol RF discuss how you can optimize your RF performance through each step of the signal chain. They examine how you can utilize Amphenol’s RF wide range of connectors including solutions for PCBs, board to board RF connectivity, board to panel and more!
May 25, 2023