
Who Will Win AI at the Edge?

Low-power Options Get Traction

We’ve written a lot about AI in the cloud, and we’ve discussed data center solutions such as GPUs, high-end FPGAs, and dedicated AI chips such as Intel’s Nervana. For many applications, training and inferencing CNNs and other AI-based systems in cloud data centers is the only way to get the compute power required to crunch the vast data sets and complex models. But, for perhaps an even larger set of applications, cloud-based inference is not practical. We may need latency that cannot be achieved by shipping data upstream to be analyzed. We may not have the ability to maintain a full-time network connection. We may have any of a number of other factors that preclude sending data to a data center and waiting for results to return.

For these applications, the only practical solution is to do the AI inferencing at the edge, right where the data is collected. But AI at the edge brings a host of challenges. The computation required for inferencing of even modest-sized models is enormous. Conventional applications processors cannot come close to the performance required, and their architecture is far from ideal for neural network inferencing tasks. GPUs are power hungry and expensive. Most edge devices are heavily constrained on cost, power, and form factor. Throwing in a real-time latency requirement brings the problem almost to the realm of “unsolvable” with current technology.

Ah, that “almost unsolvable” is the stuff fortunes are made of.

The challenge of conquering AI at the edge has attracted droves of brilliant minds to the task. Countless startups are slinging silicon at the problem, conjuring up novel processing architectures adapted to the particular requirements of CNN inferencing. What are those requirements? For starters, we need a lot of multiplication. The C in CNN stands for “convolution” and, as math nerds know, that means we are likely to be multiplying a lot of matrices.
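To get a feel for just how much multiplication that is, here is a minimal sketch in plain Python. The function names `conv2d_count_macs` and `layer_macs` and the layer dimensions are illustrative assumptions, not taken from any particular framework or model. The first function performs a single-channel "valid" convolution (strictly, the cross-correlation that CNN layers typically compute) and counts every multiply-accumulate (MAC); the second applies the usual closed-form count to a whole multi-channel layer.

```python
def conv2d_count_macs(image, kernel):
    """Single-channel 'valid' 2D convolution that also counts MAC operations.

    Illustrative sketch only: `image` and `kernel` are lists of lists of numbers.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0] * ow for _ in range(oh)]
    macs = 0
    for y in range(oh):
        for x in range(ow):
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    # One multiply-accumulate per kernel tap per output pixel.
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
                    macs += 1
            out[y][x] = acc
    return out, macs


def layer_macs(out_h, out_w, k_h, k_w, in_ch, out_ch):
    """MAC count for one convolutional layer (hypothetical helper)."""
    return out_h * out_w * k_h * k_w * in_ch * out_ch


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    kernel = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    out, macs = conv2d_count_macs(image, kernel)
    print(out, macs)
    # An assumed mid-sized layer: 224x224 output, 3x3 kernel, 64 -> 64 channels.
    print(layer_macs(224, 224, 3, 3, 64, 64))
```

Under these assumed dimensions, a single layer already needs roughly 1.85 billion MACs per input frame, and a real network stacks many such layers.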

But, for training and inference, the types of values we are multiplying are not the same. During training, we need floating point math, which is the reason GPUs have taken such a hold in the data center where most CNN models are created. However, once training is done and those coefficients are established, it is possible to do inferencing with much simpler values – fixed point numbers, sometimes as narrow as a single bit. If you consider the hardware and energy costs of doing full single- or double-precision floating point multiplication versus multiplication of small fixed-point quantities, the implications are obvious: custom hardware that can execute massive numbers of correct-width, fixed-point hardware multiplications in parallel could generate orders of magnitude better performance and efficiency on CNN inferencing than conventional CPUs or GPUs.
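As a rough illustration of why narrow fixed-point values are attractive, the sketch below quantizes floating-point weights and activations to signed integers, does the dot product entirely in integer arithmetic, and rescales once at the end. It assumes a simple symmetric, per-tensor scheme; the function names and sample values are hypothetical, and production toolchains use more elaborate calibration. Single-bit (binary) networks rely on a different, sign-based scheme that is not shown here.

```python
def quantize(values, bits=8):
    """Symmetric per-tensor quantization to signed integers (illustrative sketch)."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantized_dot(weights, activations, bits=8):
    """Dot product where all multiplies are integer; one float rescale at the end."""
    qw, sw = quantize(weights, bits)
    qa, sa = quantize(activations, bits)
    int_acc = sum(w * a for w, a in zip(qw, qa))   # integer-only MACs
    return int_acc * sw * sa


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25]
    activations = [1.0, 2.0, -4.0]
    exact = sum(w * a for w, a in zip(weights, activations))
    print("float result:    ", exact)
    print("8-bit quantized: ", quantized_dot(weights, activations, bits=8))
    print("4-bit quantized: ", quantized_dot(weights, activations, bits=4))
```

The inner loop contains only small-integer multiplies and adds, which is exactly the kind of operation that custom hardware can replicate cheaply in large parallel arrays. Narrower widths trade accuracy for even cheaper arithmetic.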

When it comes to creating custom hardware, there are two primary options, of course. If the hardware is static (and we have the budget, time, and talent), we create an ASIC. If the hardware configuration needs to change, we use FPGAs. If we’re hip and cool and trying to capture the best of both worlds, we may create an ASIC with embedded FPGA fabric (eFPGA). A custom ASIC, it turns out, is a good solution for only a tiny fraction of the AI edge applications out there. Seldom are we given a static, unchanging CNN model that we want to use forever. More often, we want an SoC optimized for every part of our design except the CNN inferencing part, and then we want a block (or chip) with FPGA fabric or another type of optimized neural processing unit or tensor processing CNN accelerator.

If you’re designing an SoC, have endpoint AI processing requirements, and need on-chip acceleration, there are a number of options available. Cadence Design Systems, for example, has developed versions of its Tensilica processor IP for this purpose. Cadence’s recently announced Tensilica DNA-100 processor IP capitalizes on the sparsity of many CNN models by avoiding the repeated loading and multiplication of zeroes inherent in the flow of most accelerator architectures. The company claims this yields much more efficient inference computation than other acceleration solutions with similar multiply-accumulate (MAC) array sizes.
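The general idea of exploiting sparsity can be sketched in a few lines of Python. This is a toy illustration of zero-skipping in a dot product, not a description of how the DNA-100 or any other product is implemented; the function names and the sample data are hypothetical.

```python
def dense_mac(weights, activations):
    """Baseline: multiply every weight, including zeros. Returns (sum, MAC count)."""
    acc, macs = 0, 0
    for w, a in zip(weights, activations):
        acc += w * a
        macs += 1
    return acc, macs


def compress(weights):
    """Keep only nonzero weights, paired with their original index."""
    return [(i, w) for i, w in enumerate(weights) if w != 0]


def sparse_mac(compressed_weights, activations):
    """Zero-skipping: only nonzero weights are ever loaded or multiplied."""
    acc, macs = 0, 0
    for i, w in compressed_weights:
        acc += w * activations[i]
        macs += 1
    return acc, macs


if __name__ == "__main__":
    weights = [0, 3, 0, 0, -2, 0, 0, 1]       # assumed sparse weight vector
    activations = [5, 1, 7, 2, 4, 9, 6, 3]
    print(dense_mac(weights, activations))                # same sum, 8 MACs
    print(sparse_mac(compress(weights), activations))     # same sum, 3 MACs
```

Both paths produce the same result, but the sparse path does work proportional to the number of nonzero weights. The compression step happens once, after training, so its cost is not paid at inference time.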

Synopsys has an entire portfolio of DesignWare IP aimed at the edge inferencing market, including vision-specific architectures, memory and datapath customization solutions, and customized versions of the venerable ARC processor architecture. Synopsys appears to cater more to the “roll-your-own” crowd when it comes to edge AI, which may result in more application-optimized systems, albeit with a steeper design and learning curve on the hardware architecture side.

FPGA and eFPGA companies appear to be committing heavy engineering to the edge inference problem as well. Mainstream FPGA companies Xilinx and Intel have thus far focused most of their attention on the high-end versions of FPGA acceleration, with bigger, more expensive chips aimed at the data center or other power-rich applications. This has given niche FPGA players like QuickLogic, Lattice Semiconductor, and Microchip/Microsemi an opportunity to capitalize on their low-power FPGA fabric technology to create various low-cost, low-power FPGA solutions. QuickLogic and Lattice have both also joined companies like Achronix and Flex Logix in the eFPGA movement, offering IP blocks that put FPGA fabric in your ASIC, along with pre-engineered FPGA IP and software stacks to facilitate the creation of application-specific AI accelerators with that fabric. Just last month, QuickLogic acquired SensiML Corporation, particularly for the SensiML Analytics Toolkit, which provides a streamlined flow for developing AI-based pattern-matching sensor algorithms optimized for ultra-low power consumption.

While the hardware and hardware IP suppliers battle each other with various claims on their combination of low power, low cost, tiny form factor, and inference throughput and latency, perhaps the bigger challenge faced by the industry is the infrastructure for creating and optimizing AI models in the first place. While languages and tool flows continue to evolve, the community of truly skilled AI experts with the cross-over capability to optimize those models for the wide variety of custom hardware environments competing for the crown in edge-based AI is almost vanishingly small. It is likely that the hardware architectures that win the battle may be the ones with the smoothest development flow, rather than the ones with the most compelling data sheets. We have seen time and again that novel hardware alone is not enough to dominate a market. The winning solution is often one that lacks luster in optimality, but excels in usability. It will definitely be interesting to watch.
