feature article
Subscribe Now

Who Will Win AI at the Edge?

Low-power Options Get Traction

We’ve written a lot about AI in the cloud, and we’ve discussed data center solutions such as GPUs, high-end FPGAs, and dedicated AI chips such as Intel’s Nervana. For many applications, training and inferencing CNNs and other AI-based systems in cloud data centers is the only way to get the compute power required to crunch the vast data sets and complex models. But, for perhaps an even larger set of applications, cloud-based inference is not practical. We may need latency that cannot be achieved by shipping data upstream to be analyzed. We may not have the ability to maintain a full-time network connection. We may have any of a number of other factors that preclude sending data to a data center and waiting for results to return.

For these applications, the only practical solution is to do the AI inferencing at the edge, right where the data is collected. But AI at the edge brings a host of challenges. The computation required for inferencing of even modest-sized models is enormous. Conventional applications processors cannot come close to the performance required, and their architecture is far from ideal for neural network inferencing tasks. GPUs are power hungry and expensive. Most edge devices are heavily constrained on cost, power, and form factor. Throwing in a real-time latency requirement brings the problem almost to the realm of “unsolvable” with current technology.

Ah, that “almost unsolvable” is the stuff fortunes are made of.

The challenge of conquering AI at the edge has attracted droves of brilliant minds to the task. Countless startups are slinging silicon at the problem, conjuring up novel processing architectures adapted to the particular requirements of CNN inferencing. What are those requirements? For starters, we need a lot of multiplication. The C in CNN stands for “convolution” and, as math nerds know, that means we are likely to be multiplying a lot of matrices.

But, for training and inference, the types of values we are multiplying are not the same. During training, we need floating point math, which is the reason GPUs have taken such a hold in the data center where most CNN models are created. However, once training is done and those coefficients are established, it is possible to do inferencing with much simpler values – fixed point numbers, sometimes as narrow as a single bit. If you consider the hardware and energy costs of doing full single- or double-precision floating point multiplication versus multiplication of small fixed-point quantities, the implications are obvious: custom hardware that can execute massive numbers of correct-width, fixed-point hardware multiplications in parallel could generate orders of magnitude better performance and efficiency on CNN inferencing than conventional CPUs or GPUs.

When it comes to creating custom hardware, there are two primary options, of course. If the hardware is static (and we have the buget, time, and talent) we create an ASIC. If the hardware configuration needs to change, we use FPGAs. If we’re hip and cool and trying to capture the best of both worlds, we may create an ASIC with embedded FPGA fabric (eFPGA). Custom ASIC, it turns out, is a good solution for only a tiny fraction of the AI edge applications out there. Seldom are we given a static, unchanging CNN model that we want to use forever. More often, we want an SoC optimized for every part of our design except the CNN inferencing part, and then we want a block (or chip) with FPGA fabric or another type of optimized neural processing unit or tensor processing CNN accelerator.

If you’re designing an SoC, have endpoint AI processing requirements, and need on-chip acceleration, there are a number of options available. Cadence Design Systems, for example, has developed versions of their Tensilica processor IP. Cadence’s recently-announced Tensilica DNA-100 processor IP capitalizes on the sparsity of many CNN models by avoiding the repeated loading and mulitplication of zeroes inherent in the flow of most accelerator architectures. The company claims this yields much more efficient inference computation than other acceleration solutions with similar multiply-accumulate (MAC) array sizes.

Synopsys has an entire portfolio of DesignWare IP aimed at the edge inferencing market, including vision-specific architectures, memory and datapath customization solutions, and customized versions of the venerable ARC processor architecture. Synopsys appears to cater more to the “roll-your-own” crowd when it comes to edge AI, which may result in more application-optimized systems, albeit with a steeper design and learning curve on the hardware architecture side.

FPGA and eFPGA companies appear to be committing heavy engineering to the edge inference problem as well. Mainstream FPGA companies Xilinx and Intel have thus-far focused most of their attention on the high-end versions of FPGA acceleration, with bigger, more expensive chips aimed at the data center or other power-rich applications. This has given niche FPGA players like QuickLogic, Lattice Semiconductor, and Microchip/Microsemi an opportunity to capitalize on their low-power FPGA fabric technology to create various low-cost, low-power FPGA solutions, and QuickLogic and Lattice have both also joined companies like Achronix and FlexLogix in the eFPGA movement, offering IP blocks that put FPGA fabric in your ASIC and pre-engineered FPGA IP and software stacks to facilitate the creation of application-specific AI accelerators with that fabric. Just last month, QuickLogic acquired SensiML corporation, particularly for the SensiML Analytics Toolkit, which provides a streamlined flow for developing AI-based pattern-matching sensor algorithms optimized for ultra-low power consumption.

While the hardware and hardware IP suppliers battle each other with various claims on their combination of low power, low cost, tiny form factor, and inference throughput and latency, perhaps the bigger challenge faced by the industry is the infrastructure for creating and optimizing AI models in the first place. While languages and tool flows continue to evolve, the community of truly skilled AI experts with the cross-over capability to optimize those models for the wide variety of custom hardware environments competing for the crown in edge-based AI is almost vanishingly small. It is likely that the hardware architectures that win the battle may be the ones with the smoothest development flow, rather than the ones with the most compelling data sheets. We have seen time and again that novel hardware alone is not enough to dominate a market. The winning solution is often one that lacks luster in optimality, but excels in usability. It will definitely be interesting to watch.

Leave a Reply

featured blogs
May 6, 2026
Hollywood has struck gold with The Lord of the Rings and Dune'”so which sci-fi and fantasy books should filmmakers tackle next?...

featured paper

Want early design analysis without simulation?

Sponsored by Siemens Digital Industries Software

Traditional verification methods are failing today's complex IC designs, which require a proactive, early-stage analysis approach. A shift-left methodology addresses IP block integration challenges and the limitations of traditional simulation and ERC tools. Insight Analyzer detects hard-to-find leakage issues across power domains, enabling early analysis without full simulation. Identify inefficiencies earlier to reduce rework, improve reliability, and enhance power performance.

Click to read more!

featured chalk talk

Designing Scalable IoT Mesh Networks with Digi XBee® for Wi-SUN
Sponsored by Mouser Electronics and Digi and Silicon Labs
In this episode of Chalk Talk, Quinn Jones from Digi, Chad Steider from Silicon Labs and Amelia Dalton explore how Wi-SUN Micro-Mesh can reduce cost and simplify deployment for your next IoT mesh network. They also investigate the benefits that Digi XBee solutions bring to these types of networks and how you can jump start your next IoT mesh network design with Silicon Labs and Digi.
May 4, 2026
23,786 views