
Flex Logix Fires Second Salvo

Challenging FPGAs on AI Applications

For decades now, FPGA companies have struggled to overcome their de-facto positioning as “ASIC alternatives.” Of course, FPGAs are great for prototyping your design, or for getting something into production much earlier than you could with a custom chip design. But, eventually, for designs that go into volume production, there comes a time when it’s worth designing an ASIC or ASSP to do the same thing, yielding better performance, lower power consumption, smaller area, and lower unit cost. This is bad for FPGA companies because, just when a design win should turn into higher volume and long-term revenue, the FPGA gets dropped off the board and replaced with a custom device.

For FPGA providers, the solution for this sad situation involves finding applications where reprogrammability itself is a core requirement. Software-defined radio, for example, is such a “killer app.” The application requires programmable fabric so modems can be created, loaded, and dispatched on the fly. It doesn’t make sense to replace FPGA-based logic with hardened gates, because in-system reprogrammability is a fundamental part of the application. The result? The FPGA company gets to keep the socket even when production volume rises, without fear of being yanked in favor of a custom chip.

Recently, neural network inferencing has emerged as another of these FPGA “killer apps.” FPGA LUT fabric (along with the fixed-point DSP resources) delivers spectacular performance/power characteristics on neural network inferencing, and in-system reprogrammability is a must. On top of that, there is an enormous number of applications that can take advantage of AI/neural network technology. For FPGA companies, it could be the proverbial bird’s nest on the ground – a high-volume, high-value key role in a wide variety of new applications. You can almost hear the champagne corks popping at FPGA headquarters…
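The fixed-point arithmetic at the heart of that performance/power advantage is easy to see in miniature: an inference layer boils down to huge numbers of multiply-accumulate (MAC) operations on quantized integers, exactly what FPGA DSP resources provide. A minimal sketch (illustrative only; the function and values are hypothetical, not from any vendor):

```python
# Why fixed-point MACs suit neural network inference: a layer's output is
# a dot product of quantized integer weights and activations, accumulated
# in a wide register and rescaled once at the end.

def quantized_dot(weights, activations, scale):
    """int8-style weights/activations, wide accumulator, final rescale."""
    acc = 0
    for w, a in zip(weights, activations):
        acc += w * a          # one fixed-point MAC per term
    return acc * scale        # rescale back toward a real-valued output

# Example: a 4-input neuron with small integer operands.
out = quantized_dot([3, -2, 5, 1], [10, 7, -4, 12], scale=0.01)
# out == 0.08
```

Because every term is an integer multiply-add, the whole computation maps directly onto hardened DSP blocks rather than general LUT logic.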

“Not quite so fast, there, cowboys,” says Flex Logix.

As we’ve discussed before, Flex Logix provides IP that allows designers to put FPGA LUT fabric on custom chips. Recently, they announced that their latest-generation EFLX cores allow embedded FPGA arrays up to 122.5K LUTs to be built on TSMC 16FF+ and 16FFC processes. This means that you can bring the benefits of FPGA-like reprogrammability to custom chips for applications such as neural network inferencing, for example. Flex Logix fabric comes in 2.5K LUT blocks, which can be arrayed to build the desired size – from 2.5K up to 122.5K.

What does this mean for the FPGA companies looking for that long-term socket? It means that programmability is no longer a competitive moat against ASIC/ASSP incursion. Design teams can build custom chips with all the benefits of reprogrammability rather than committing to off-the-shelf FPGAs for long-term, high-volume production. This doesn’t cut conventional FPGAs out of the picture, but it does put a damper on their aspirations to become the long-term, unchallenged solution for applications that want to cost-reduce for high volumes.

This is the second generation of the Flex Logix IP cores. These blocks are based on 6-input LUTs, which can also be configured as dual 5-input LUTs. These are similar to the logic cells used by mainstream FPGAs. Each block can be either a “logic” block or a “DSP” block, where the DSP version replaces some LUTs with 40 22×22-bit MACs. Each IP block uses CMOS I/Os to talk to the rest of the chip, so you gain considerable bandwidth and power efficiency compared with using a separate, stand-alone FPGA alongside your custom device. Flex Logix uses a proprietary, high-density routing architecture, which they claim has been further improved with this second generation, giving a very respectable 1.0 mm² footprint with only six routing layers for a 2.5K LUT block on TSMC 16FF+/16FFC.
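The 6-input LUT at the core of these blocks is conceptually just a 64-entry truth table indexed by the six input bits, so it can implement any Boolean function of six variables. A minimal behavioral sketch (the function name and encoding are illustrative, not Flex Logix's):

```python
# A 6-input LUT modeled as a 64-bit truth table: the six input bits form
# an index, and the LUT output is simply the bit stored at that index.

def lut6(truth_table, inputs):
    """truth_table: 64-bit int; inputs: list of six 0/1 values (LSB first)."""
    index = 0
    for i, bit in enumerate(inputs):
        index |= bit << i
    return (truth_table >> index) & 1

# "Program" the LUT as a 6-input AND: only index 0b111111 returns 1.
AND6 = 1 << 63
assert lut6(AND6, [1, 1, 1, 1, 1, 1]) == 1
assert lut6(AND6, [1, 1, 0, 1, 1, 1]) == 0
```

The dual 5-input mode follows the same idea: the 64-bit table is split into two 32-bit halves, each serving as an independent 5-input function when both share the same inputs.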

Flex Logix also says the new architecture has further improved on its novel interconnect to give higher performance for larger arrays. They have also structured the MAC blocks so that up to 10 can be pipelined in a row, which allows local interconnect to be used for highly chained datapaths such as FIR filters, improving performance and reducing the demand on external routing resources. A new test mode has been added for faster test times, along with additional miscellaneous DFT enhancements. For high-reliability (particularly aerospace) designs, a “readback” feature has been added that allows the configuration to be checked and scrubbed periodically. If radiation-induced single-event upsets or other environmental “soft” errors have damaged the configuration, the error is detected and a quick reconfigure can be triggered.
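The FIR filter case shows why chained MACs and local interconnect go together: each tap multiplies a delayed sample by a coefficient and adds the running sum from its neighbor, so data only ever moves one MAC to the next. A behavioral sketch of that cascade (illustrative only; names and structure are not Flex Logix's implementation):

```python
# One output sample of an N-tap FIR filter, computed as a MAC cascade:
# the accumulator "rides" from tap to tap, which is exactly the pattern
# that chained, locally-connected MAC blocks implement in hardware.

def fir_step(delay_line, coeffs, new_sample):
    """Shift one sample into the delay line, then run the MAC cascade."""
    delay_line.insert(0, new_sample)
    delay_line.pop()
    acc = 0
    for c, x in zip(coeffs, delay_line):
        acc += c * x          # one MAC in the chain; acc passes to the next
    return acc

# 4-tap moving-average filter driven with a short input sequence.
taps = [1, 1, 1, 1]
line = [0, 0, 0, 0]
outputs = [fir_step(line, taps, s) for s in [4, 8, 12, 16]]
# outputs == [4, 12, 24, 40]
```

In hardware, none of the intermediate sums ever touches the general routing fabric, which is where the performance and routing savings come from.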

One of the key barriers to embedded FPGAs has always been design tools for the FPGA fabric. While plopping down a bunch of LUTs is a fairly straightforward task, providing and supporting the complex set of design tools for synthesis, place-and-route, and bitstream generation is a much more demanding undertaking. The Flex Logix EFLX compiler addresses this need, and it is available for a no-cost evaluation. Speaking of evaluation, the new cores are being fabricated now, and evaluation boards will be available to customers under NDA. Flex Logix proves all new cores in silicon itself to assure that customers' experience with integration is smooth.

The key question is, will Flex Logix get traction with the new cores in strategic applications like AI? They’re off to a good start. This week, the company announced that its embedded array is part of a next-generation deep learning chip being developed by the research group of Professors David Brooks and Gu-Yeon Wei at Harvard’s John A. Paulson School of Engineering and Applied Sciences. The device has already gone to tape-out and is headed into fabrication, giving an early view into the practicality of EFLX integration for AI applications.

Flex Logix is only a couple of years old, but the company has already taken the embedded FPGA idea farther than any previous attempt we are aware of. Its tile-based structure gives chip designers a lot of flexibility in balancing the FPGA fabric against the other resources on the chip for their particular application. We have not yet used the tool suite or talked to customers who have, and the tools are likely the critical make-or-break factor for widespread success of the technology. It will be interesting to watch.
