We are all focusing a lot of attention on neural network inference these days. There is lengthy debate about the relative merits of GPUs, high-end FPGAs, and other specialized solutions for generating the most and best inferencing per coulomb. Most of that debate centers on data center or high-end edge designs, where companies like NVIDIA, Intel, and Xilinx are engaged in all-out war, with Xilinx announcing a new ultra-high-performance accelerator board just this week (more on that in a future article). But the real volume for inferencing chips will clearly be in cost-, power-, and space-constrained edge systems, often running on batteries, and usually far out of the realm of conventional GPUs and FPGAs.
In this edge inferencing space, the competition is completely different, and somewhat more interesting. Lattice Semiconductor, which has carved out a solid niche in ultra-low-power programmable logic devices, has just announced a set of enhancements and upgrades to their sensAI stack offering. sensAI stack includes modular hardware platforms, neural network IP cores, software tools, reference designs, and custom design services from ecosystem partners. The company says sensAI stack is aimed at “deployment of always-on, on-device AI into a range of edge applications including mobile, smart home, smart city, smart factory, and smart car products.” In particular, they are aiming at applications that can use an inferencing engine that consumes between 1 mW and 1 W and costs between $1 and $10 USD.
We wrote about sensAI stack back when it was first announced, and this latest set of enhancements brings new capability and increased performance to the table. Specifically, Lattice is adding new IP cores, reference designs, demos, and hardware development kits. Or, in other words, exactly the kind of stuff that makes the “stack” concept interesting and useful in the first place.
First up is a new convolutional neural network (CNN) compact accelerator IP block for the tiny, ultra-low-power iCE40 UltraPlus FPGAs. The new IP supports 16-bit and 1-bit quantization for improved performance, power, and accuracy tradeoffs. Choosing the right bit-width can have a dramatic impact on the accuracy, power, and (to a lesser degree) performance of CNN implementations. Typically, those tradeoffs are made at implementation time, and this is where the flexibility of FPGA-based inferencing engines really shines, because the logic itself can be tuned at a fine grain for the chosen quantization.
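To make the bit-width tradeoff concrete, here is a minimal sketch in plain Python that quantizes the same set of weights two ways. It is an illustration only, not Lattice's IP or compiler: the function names, the symmetric 16-bit scheme, the sign-based 1-bit scheme, and the random stand-in weights are all assumptions chosen for the example.

```python
import random

def quantize_fixed(values, bits=16):
    """Symmetric uniform quantization to signed `bits`-bit levels, mapped back to floats.
    (Hypothetical helper for illustration; not a Lattice API.)"""
    max_abs = max(abs(v) for v in values) or 1.0
    levels = 2 ** (bits - 1) - 1          # e.g. 32767 for 16 bits
    scale = max_abs / levels
    return [round(v / scale) * scale for v in values]

def quantize_binary(values):
    """1-bit quantization: keep only the sign, scaled by the mean magnitude.
    (One common binarization scheme, assumed here for illustration.)"""
    alpha = sum(abs(v) for v in values) / len(values)
    return [alpha if v >= 0 else -alpha for v in values]

def mean_abs_error(original, quantized):
    """Average absolute difference between original and quantized values."""
    return sum(abs(x - y) for x, y in zip(original, quantized)) / len(original)

# Stand-in "weights": random values, not taken from any real network.
random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(1000)]

err16 = mean_abs_error(weights, quantize_fixed(weights, bits=16))
err1 = mean_abs_error(weights, quantize_binary(weights))

print(f"16-bit mean abs error: {err16:.6f}")
print(f" 1-bit mean abs error: {err1:.6f}")
print("weight storage: 16 bits vs 1 bit per weight")
```

The 1-bit version needs one sixteenth of the weight storage but introduces far more error per weight, which is the kind of accuracy-versus-resources decision a designer has to make for each network.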
The iCE40 UltraPlus family would be the solution of choice for, say, an “always-on processor that detects key-phrases or objects, and wakes up a high performance AP SoC / ASIC for further analytics only when required, reducing overall system power consumption.” Lattice originally obtained the iCE technology through their acquisition of SiliconBlue a few years ago, and the current devices are unique in the industry, with tiny form factors and absurdly low power consumption that are unmatched by any other type of FPGA. Lattice has been quite successful with these devices in mobile applications such as smartphones, and it is now opening up new markets for the technology with sensAI stack.
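The power argument behind that wake-on-detect arrangement is simple duty-cycle arithmetic, sketched below in Python. Every number here is a hypothetical placeholder rather than a measured or vendor-supplied figure, and `average_power_mw` is a helper invented for this illustration.

```python
def average_power_mw(always_on_mw, host_active_mw, host_sleep_mw, wake_fraction):
    """Average system power when a small always-on detector gates a larger host.

    always_on_mw   -- power of the always-on detector (runs 100% of the time)
    host_active_mw -- power of the host processor while awake
    host_sleep_mw  -- power of the host processor while asleep
    wake_fraction  -- fraction of time the host is awake (0.0 to 1.0)
    """
    if not 0.0 <= wake_fraction <= 1.0:
        raise ValueError("wake_fraction must be between 0 and 1")
    host_avg = wake_fraction * host_active_mw + (1.0 - wake_fraction) * host_sleep_mw
    return always_on_mw + host_avg

# Hypothetical figures, chosen only to show the shape of the calculation.
DETECTOR_MW = 1.0       # assumed always-on detector power
HOST_ACTIVE_MW = 1000.0 # assumed host power while running analytics
HOST_SLEEP_MW = 5.0     # assumed host sleep power
WAKE_FRACTION = 0.02    # assumed: host is needed 2% of the time

gated = average_power_mw(DETECTOR_MW, HOST_ACTIVE_MW, HOST_SLEEP_MW, WAKE_FRACTION)
# Baseline: no detector, host stays awake all the time.
baseline = average_power_mw(0.0, HOST_ACTIVE_MW, HOST_SLEEP_MW, 1.0)

print(f"host always awake : {baseline:.1f} mW")
print(f"wake-on-detect    : {gated:.1f} mW")
```

Under these assumed numbers the average is dominated by how often the host wakes, so the benefit depends heavily on the detector producing few false wake-ups.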
Also new is enhanced CNN accelerator IP for the low-power, low-cost ECP5 FPGAs. ECP5 is the second tier in Lattice’s offering and is closer to what one would think of as a “conventional” low-end FPGA in terms of cost, performance, and power consumption. The company claims up to a 2x increase in DRAM memory bandwidth with the updated IP. This should give a significant performance improvement, particularly in smaller devices. With ECP5 doing neural network inference acceleration, we get a vastly increased range of IO flexibility, which facilitates interfacing to a wide variety of devices such as sensors or low-end MCUs.
Of course, you can’t design in what you can’t develop, and Lattice is rolling out a whole new array of software upgrades and development platforms. An updated neural network compiler tool improves ease-of-use, adding support for both Caffe and TensorFlow in CNN implementation on iCE40 UltraPlus FPGAs. New kits/dev boards include the Himax HM01B0 UPduino Shield which (as one might guess from the name) is an Arduino form factor kit based on the UPduino 2.0 board for implementing AI with Lattice devices (specifically the iCE40 UltraPlus FPGA). It includes the Himax HM01B0 low power image sensor module and two I2S microphones, in order to take advantage of vision and sound as sensory inputs.
With partner DPControl, Lattice is also announcing the iCEVision Board, which is a vision-focused development board using the iCE40 UltraPlus FPGA. It allows users to interface to image sensors using standard sensor connectors, enabling CNN-driven smart vision. Lattice says that “multiple interface connectors allow users to quickly implement solutions and confirm that their designs work as expected.” The iCEVision board is compatible with the most common camera interfaces including ArduCam, CSI, and PMOD.
On top of those, the company is also releasing new reference/demo designs, including new human presence detection and hand gesture recognition reference designs and demos. Handling this kind of sensor data with FPGAs versus MCUs should result in dramatic power savings and performance improvements and should enable many systems that would not otherwise be able to offer “always-on” detection capabilities for presence and gesture recognition.
Lattice is one of the few vendors actually delivering and shipping edge-class AI solutions now. Most of the other announcements we’ve seen are work-in-progress, so Lattice seems to have a decent head start in capturing this rapidly expanding market. The combination of Lattice’s low-cost, low-power FPGA technology, along with the tools, boards, IP, and other contributions of their partner ecosystem, should make sensAI a force in the race to enable AI in new IoT edge applications.