
Edge AI On The Cheap and Deep

Startup Deep Vision Emphasizes Programmability, Efficiency, Cost

There’s an old salesman’s adage that “confused customers never buy.” That’s why glossy sales brochures don’t have a lot of technical information, and why car salesmen don’t delve too deeply into features and benefits. Too much information can lead to analysis paralysis, and, while that might be fun for engineers, it’s bad for business. 

There’s a separate but related effect in engineering. A new technology might be interesting and impressive, but if you don’t immediately grasp how to use it, it won’t catch on. Sometimes the biggest hurdle to adoption is the learning curve. 

We saw this with the rise of DSPs (digital signal processors) in the 1990s: They were perfect for a range of applications, but few developers knew how to program one or where it was supposed to fit in their hardware block diagram. Consequently, DSP uptake was slow. The exception was at Texas Instruments, which spent a lot of corporate resources on software tools, training courses, and front-line technical support. DSP newcomers gravitated to TI, and the company converted a lot of early adopters into big customers. 

A similar strategy is playing out with AI and machine learning. We’re told that it’s the Next Big Thing, but few of us understand what it is, how it works, or where it fits in the block diagram. And confused engineers don’t design-in. 

One little company that hopes to knock down the Wall of Confusion is Deep Vision, a California startup that makes — and sells! — low-cost chips for AI-at-the-edge applications. At first blush, it’s a fabless chip company, but their real expertise is in the software tools. Deep Vision emphasizes accessibility and ease of use as much as ML performance and power efficiency. 

“Most customers don’t care what’s inside the chip,” says VP of Business Development Markus Levy. “It’s more about what you can do with it.” Indeed, most customers wouldn’t understand what’s inside since it’s probably their first AI/ML-related project. For the record, Deep Vision describes its ARA-1 chip as having a “polymorphic dataflow architecture” with a “neural-ISA core.” So now you know. 

Apart from its attitude toward tools, the company also takes a different approach to multitasking and context switching. Edge applications, they say, often continually switch between distinct ML models. For every video frame, a face-recognition app may start out searching for faces in a complex scene, then switch to identifying facial landmarks (eyes, noses, etc.), then switch again to determine an individual’s gaze, mood, drowsiness, or other characteristics. These all require different models, and switching models takes a lot of time on typical AI accelerators. 
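To make the per-frame cascade concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `ToyAccelerator`, the three stage functions, and the idea of representing a frame as a list of face names are stand-ins for illustration, not Deep Vision's API. The point is only to count how often the active model changes.

```python
class ToyAccelerator:
    """Hypothetical accelerator that holds one active model at a time."""

    def __init__(self):
        self.active_model = None
        self.switch_count = 0

    def run(self, model_name, fn, data):
        # A switch happens whenever the requested model is not the loaded one.
        if self.active_model != model_name:
            self.active_model = model_name
            self.switch_count += 1
        return fn(data)


# Stand-ins for three separate ML models (assumed, not real networks).
def detect_faces(frame):
    # Pretend a frame is simply a list of the faces visible in it.
    return list(frame)


def find_landmarks(face):
    return {"face": face, "eyes": 2, "nose": 1}


def estimate_gaze(landmarks):
    return {"face": landmarks["face"], "gaze": "forward"}


def process_frame(acc, frame):
    # Stage 1: find faces. Stage 2: landmarks per face. Stage 3: gaze per face.
    faces = acc.run("face_detect", detect_faces, frame)
    landmarks = [acc.run("landmarks", find_landmarks, f) for f in faces]
    return [acc.run("gaze", estimate_gaze, lm) for lm in landmarks]


def process_video(frames):
    acc = ToyAccelerator()
    results = [process_frame(acc, f) for f in frames]
    return results, acc.switch_count


if __name__ == "__main__":
    results, switches = process_video([["alice", "bob"], ["alice"]])
    print(switches)
```

Even with the work batched stage by stage, each frame containing a face costs three model switches in this toy, so the switch count grows linearly with the frame count.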

Big datacenter applications don’t have this problem because they dedicate entire CPUs or GPUs (or instances, in cloud parlance) to each model. But an edge device doesn’t have that luxury. It must switch from one model to another, while also keeping an eye on power, memory requirements, and cost. Deep Vision’s ARA-1 chip is designed to excel in those applications with its zero-overhead task-switching capability.
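A rough way to see why switching overhead matters is to subtract it from the time available per frame. The sketch below is back-of-the-envelope arithmetic only. The frame rate, switch count, and per-switch cost are made-up illustrative numbers, not measurements of ARA-1 or any other part.

```python
def inference_time_left_ms(fps, switches_per_frame, switch_cost_ms):
    """Time left for actual inference in one frame after model switches.

    All inputs are hypothetical illustration values, not vendor figures.
    """
    frame_budget_ms = 1000.0 / fps
    return frame_budget_ms - switches_per_frame * switch_cost_ms


if __name__ == "__main__":
    # Assumed: 25 frames/s, three model switches per frame.
    print(inference_time_left_ms(25, 3, 5.0))  # with a 5 ms switch penalty
    print(inference_time_left_ms(25, 3, 0.0))  # with zero-overhead switching
```

With these assumed numbers, a 5 ms penalty per switch consumes 15 ms of a 40 ms frame budget, while zero-overhead switching leaves the whole 40 ms for inference.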

The chip has eight identical 8/16-bit integer cores, called DLPs (deep learning processors). Each core has its own L1 cache, and they share a large 4MB L2. There’s also a control processor and a task manager, which manage the host communication and resource allocation. A 32-bit LPDDR4 interface handles external DRAM for models too large to fit in the on-chip memory. PCIe and USB interfaces connect to the host processor, which uses them to pass commands and input data for the models (e.g., video frames). 
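As a rough illustration of when a model would spill into external DRAM, the sketch below estimates weight storage at 8-bit and 16-bit precision and compares it with the 4MB L2. It makes simplifying assumptions that are not from Deep Vision: only weights are counted (no activations, code, or buffers), the whole L2 is available to one model, and 4MB means 4 × 1024 × 1024 bytes.

```python
L2_BYTES = 4 * 1024 * 1024  # shared L2 size from the article, assuming binary megabytes


def weight_bytes(num_params, bits):
    """Bytes needed to store num_params integer weights at the given width."""
    if bits not in (8, 16):
        raise ValueError("the DLP cores are described as 8/16-bit integer")
    return num_params * bits // 8


def fits_in_l2(num_params, bits, l2_bytes=L2_BYTES):
    """True if the weights alone fit in on-chip L2 (a simplifying assumption)."""
    return weight_bytes(num_params, bits) <= l2_bytes


if __name__ == "__main__":
    # A hypothetical 3-million-parameter model at both precisions.
    print(fits_in_l2(3_000_000, 8))
    print(fits_in_l2(3_000_000, 16))
```

Under these assumptions, a 3-million-parameter model fits on chip at 8 bits but not at 16, which is the kind of case where the LPDDR4 interface comes into play.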

Like most AI/ML processors, ARA-1 is designed to be used in tandem with a conventional host processor (think ARM, x86, or RISC-V) running Linux. It’s a coprocessor or accelerator, designed to offload complex ML tasks from a CPU that’s not really designed for such workloads. (And that probably has enough to do already.) The two processors communicate over USB or PCIe, your choice. 

As an example, Deep Vision’s chip can be partnered with an i.MX 8M Nano host, a $10 part from NXP with multiple ARM Cortex-A53 cores and a gaggle of peripherals. A camera might feed data to the NXP processor for preprocessing, and the host then offloads the heavy lifting to a $15–$25 ARA-1. Once the offload is done, the NXP device can go back to its own business. Deep Vision touts the fact that its processor requires less babysitting than other edge-AI parts, leaving more host CPU cycles free for other tasks. 

On the software side, Deep Vision’s toolchain accepts ML models in all the usual formats: ONNX, PyTorch, TensorFlow, Caffe2, and MXNet. The compiler’s output can go straight onto the ARA-1 chip or to the company’s bit-accurate simulator, profiler, and power optimizer. 

ARA-1’s internal microarchitecture is completely software programmable, and Deep Vision can extend its compiler to add new operators to support new models and/or satisfy customer requirements. That helps future-proof ARA-1 and its siblings as the family tree grows. 

There is definitely an ARA-2 coming, says Deep Vision’s Levy. It’ll have more on-chip memory, significant enhancements to the DLP cores, additional compute functions, and much higher on- and off-chip bandwidth, while retaining the current chip’s basic architecture and DDR, PCIe, and USB interfaces. Software will transfer from one to the other. 

Basic von Neumann processors were scary and unusual at some point. DSPs and GPUs and FPGAs were weird and unfamiliar, too. Now we’re all riding the ML accelerator wave while trying to maintain our balance. A low-cost, power-miserly chip with a friendly toolchain seems like a good place to start.
