
Flex Logix Accelerates Edge Inference

New InferX X1 Chips Show Promise

Let’s start with one hard fact. There are a lot of companies developing new AI inference processors right now, and most of them won’t survive. We have heard reports that as many as 80 startups are currently funded to some level, and all of them are working hard on building better mousetraps, or at least building better specialized processors that could execute neural network models for identifying mice with minimal latency and power. Conventional processors are not very efficient at AI inference, which requires convolution operations with billions, or even trillions, of multiplications to occur as fast as possible, and with minimal power consumption.
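To put a rough number on that, consider a single convolution layer: a naive implementation is nothing but nested multiply-accumulate (MAC) operations, and the counts multiply out quickly. A back-of-the-envelope sketch in Python (the layer dimensions here are illustrative, not taken from any particular network):

```python
# MAC count for one convolution layer: every output element costs
# k * k * C_in multiply-accumulates. Dimensions are illustrative only.
H, W = 416, 416          # output feature-map height and width
C_in, C_out = 64, 128    # input and output channels
k = 3                    # 3x3 kernel

macs = H * W * C_out * (k * k * C_in)
print(f"{macs / 1e9:.1f} billion MACs for one layer")  # ~12.8 billion
```

Stack a few dozen layers like that and the per-image total lands in the hundreds of billions of operations, which is exactly the regime the chips discussed below are chasing.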

And the field isn’t limited to startups, either. Intel just announced that their new second-generation Xeon processors have special extensions called “DL Boost” that accelerate AI inference performance by as much as 30x over the previous Xeons, and Intel has also acquired at least three companies – Movidius, Nervana, and, yep, Altera – that offer acceleration capabilities for AI inference. NVIDIA has built a substantial business in recent years accelerating AI processing with their GPUs and their CUDA programming platform. Xilinx has re-branded their company from FPGA-centric to acceleration-centric with a new “data center first” mantra – largely focused on accelerating AI tasks in the data center – and they recently acquired DeePhi Tech for their machine-learning IP. The Achronix Speedcore Gen4 eFPGA offering includes Machine Learning Processor (MLP) blocks that dramatically accelerate inference. And the list goes on and on.

AI acceleration is critical in every level of the compute infrastructure, from cloud/data center all the way to battery-powered edge devices. The edge, of course, is in many ways the most demanding, as designs tend to be cost-, area-, and power-constrained, but with little compromise in the performance and latency required. Many applications cannot take advantage of heavy-iron data center AI acceleration because the round-trip latency and reliability are just not good enough. For these types of applications, the race is on to develop low-power, low-latency, low-cost AI inference processors – and there is no shortage of competitors.

This week, Flex Logix – a supplier of eFPGA IP – announced that they were jumping into the fray and going into the chip business as well, using their own IP to develop what they call the InferX X1 chip, which “delivers high throughput in edge applications with a single DRAM.” The company claims that the new devices will deliver “up to 10 times the throughput compared to existing inference edge ICs.” And, according to the company’s benchmark data, it appears they may have a solid contender.

Flex Logix used the interconnect technology from their eFPGA offering and added what they call nnMAX inferencing clusters to create the new devices. The company claims much higher throughput/watt than existing solutions, particularly at low batch sizes – common in edge applications where there is typically only one camera/sensor. Flex Logix says the InferX X1’s performance at small batch sizes is “close to data center inference boards and is optimized for large models that need 100s of billions of operations per image. For YOLOv3 real time object recognition, InferX X1 processes 12.7 frames/second of 2 megapixel images at batch size = 1. Performance is roughly linear with image size, so frame rate approximately doubles for a 1 megapixel image. This is with a single DRAM.”
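Taking those figures at face value, the claimed scaling is easy to sanity-check, assuming (as the quote states) that frame rate is inversely proportional to pixel count:

```python
# Quoted figure: 12.7 fps on 2-megapixel images at batch size 1.
# If throughput is roughly linear in image size, frame rate scales
# inversely with pixel count.
def fps_at(megapixels, fps_ref=12.7, mp_ref=2.0):
    return fps_ref * (mp_ref / megapixels)

print(f"1 MP: {fps_at(1.0):.1f} fps")  # ~25.4 fps, roughly double
print(f"4 MP: {fps_at(4.0):.1f} fps")  # ~6.4 fps
```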

InferX X1 will be offered as chips for edge devices and on half-height, half-length PCIe cards for edge servers and gateways, as well as via IP for companies developing their own ASICs/SoCs. The X1 chip consists of four tiles, and each tile includes 1,024 MAC units operating at 1.067 GHz and 2MB of on-chip SRAM. Data feeds into the top of the nnMAX cluster stack via the ArrayLINX interconnect. The devices use partial reconfiguration to change the routing configuration layer-by-layer while pulling in new weights from DRAM. At speed, this takes only a microsecond.
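That microsecond matters less than it might sound. A quick estimate of the per-frame overhead, assuming roughly one hundred layers for YOLOv3 (our assumption; the announcement does not give a layer count) and the quoted 12.7 fps:

```python
# Per-frame cost of layer-by-layer reconfiguration.
layers = 100           # assumed; YOLOv3 has on the order of 100 layers
reconfig_s = 1e-6      # ~1 microsecond per layer, per Flex Logix
frame_s = 1 / 12.7     # quoted frame time at 2 MP, batch size 1

print(f"overhead: {layers * reconfig_s / frame_s:.2%} of each frame")  # ~0.13%
```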

Flex Logix says that they have hidden the complexity of the FPGA underpinnings from the user, and that end users will not need to learn or use any FPGA development tools.

Their software stack includes the “nnMAX Compiler,” which accepts models from TensorFlow Lite or ONNX, and the engine supports INT8, INT16, and bfloat16 data types. The software will automatically cast between these types to achieve the required precision and performance, and then automatically generate the required configuration bitstream. This means that models can be run at full precision to get up and running quickly, and optimized, quantized models can be swapped in later for better performance. Winograd transformations (with 12-bit accuracy) are used for INT8 mode, giving more than 2x speedup for those operations. Numerics can be mixed in each layer, allowing optimization with integer types to be combined with selective use of floating point for accuracy.
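The nnMAX Compiler’s own interface isn’t public, but the front half of that workflow can be sketched with stock TensorFlow: produce a full-precision TFLite model to get running quickly, then a post-training-quantized INT8 variant to swap in later. A minimal sketch, with the model and calibration data as placeholders:

```python
# Two TFLite variants of one model: full precision first, INT8 later.
# A compiler flow like nnMAX's would consume files like these.
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights=None)  # placeholder model

# Full-precision model: straight conversion, quick to get running.
float_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()

# INT8 model: post-training quantization with a calibration dataset.
def representative_data():
    for _ in range(100):
        yield [tf.random.uniform((1, 224, 224, 3))]  # stand-in for real images

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
int8_model = converter.convert()
```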

The goal of the Flex Logix architecture is to minimize power consumption by reducing movement of data via the FPGA-style interconnect. With on-chip SRAM, the demands on off-chip DRAM memory are much smaller, with corresponding improvements in performance and power consumption. The new chips are fabricated in TSMC 16nm FinFET technology, which has outstanding static and dynamic power characteristics, further boosting X1’s prowess in edge applications. A single InferX X1 chip, with 4 tiles totaling 4,096 MACs, could hit a theoretical peak throughput of 8.5 TOPS. X1 chips are designed to be efficiently daisy-chained for larger models and higher throughput.
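That headline figure follows from the MAC count and clock rate already quoted; counting each MAC as a multiply plus an add, the arithmetic lands right around the claimed number:

```python
# Peak throughput check: 4 tiles x 1,024 MACs, 2 ops per MAC per cycle.
macs = 4 * 1024
clock_hz = 1.067e9
print(f"{macs * 2 * clock_hz / 1e12:.1f} TOPS peak")  # ~8.7, vs. 8.5 quoted
```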

Flex Logix says nnMAX is in development now and will be available for integration in SoCs by Q3 2019. The stand-alone InferX X1 will tape out in Q3 2019, and samples of chips and PCIe boards will be available shortly after.

The move to selling silicon is a leap for Flex Logix, who have made their living so far as a supplier of FPGA IP, but the exercise of developing and shipping their own silicon is likely to complement the IP business well. It also opens an entirely new class of customer: the company has historically sold to organizations with the wherewithal to develop their own custom ICs, and stand-alone chips address a far larger domain of end users. The company’s sales and support structures will have to evolve to handle this new, larger customer pool.

While benchmark results are impressive and Flex Logix has a good reputation as a savvy startup, it is too early to predict any long-term winners in this fast-moving market. With so many companies developing AI inference solutions, we expect to be bombarded with competing claims over the coming months. It will be interesting to watch.
