
Edge AI On The Cheap and Deep

Startup Deep Vision Emphasizes Programmability, Efficiency, Cost

There’s an old salesman’s adage that “confused customers never buy.” That’s why glossy sales brochures don’t have a lot of technical information, and why car salesmen don’t delve too deeply into features and benefits. Too much information can lead to analysis paralysis, and, while that might be fun for engineers, it’s bad for business. 

There’s a separate but related effect in engineering. A new technology might be interesting and impressive, but if you don’t immediately grasp how to use it, it won’t catch on. Sometimes the biggest hurdle to adoption is the learning curve. 

We saw this with the rise of DSPs (digital signal processors) in the 1990s: They were perfect for a range of applications, but few developers knew how to program one or where it was supposed to fit in their hardware block diagram. Consequently, DSP uptake was slow. The exception was at Texas Instruments, which spent a lot of corporate resources on software tools, training courses, and front-line technical support. DSP newcomers gravitated to TI, and the company converted a lot of early adopters into big customers. 

A similar strategy is playing out with AI and machine learning. We’re told that it’s the Next Big Thing, but few of us understand what it is, how it works, or where it fits in the block diagram. And confused engineers don’t design-in. 

One little company that hopes to knock down the Wall of Confusion is Deep Vision, a California startup that makes — and sells! — low-cost chips for AI-at-the-edge applications. At first blush, it’s a fabless chip company, but its real expertise is in the software tools. Deep Vision emphasizes accessibility and ease of use as much as ML performance and power efficiency. 

“Most customers don’t care what’s inside the chip,” says VP of Business Development Markus Levy. “It’s more about what you can do with it.” Indeed, most customers wouldn’t understand what’s inside since it’s probably their first AI/ML-related project. For the record, Deep Vision describes its ARA-1 chip as having a “polymorphic dataflow architecture” with a “neural-ISA core.” So now you know. 

Apart from its attitude toward tools, the company also takes a different approach to multitasking and context switching. Edge applications, they say, often continually switch between distinct ML models. For every video frame, a face-recognition app may start out searching for faces in a complex scene, then switch to identifying facial landmarks (eyes, noses, etc.), then switch again to determine an individual’s gaze, mood, drowsiness, or other characteristics. These all require different models, and switching models takes a lot of time on typical AI accelerators. 

Big datacenter applications don’t have this problem because they dedicate entire CPUs or GPUs (or, in cloud parlance, instances) to each model. But an edge device doesn’t have that luxury. It must switch from one model to another, all while keeping an eye on power, memory requirements, and cost. Deep Vision’s ARA-1 chip is designed to excel in those applications with its zero-overhead task switching capability.
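The arithmetic above is worth making concrete. The sketch below is purely illustrative: the per-model inference times and the weight-reload penalty are hypothetical numbers, not Deep Vision measurements, and the model names are placeholders. It shows how a fixed switch penalty dominates a per-frame pipeline that runs several small models back to back.

```python
# Hypothetical illustration: per-frame cost of cycling through three ML
# models on an accelerator that reloads weights on every model switch,
# versus one that switches with zero overhead. All numbers are made up.

INFER_MS = {"face_detect": 4.0, "landmarks": 2.0, "gaze": 2.0}

def frame_time_ms(models, switch_ms):
    """Total time for one frame: run each model in order, paying a
    switch penalty whenever the active model changes."""
    total, active = 0.0, None
    for m in models:
        if m != active:
            total += switch_ms
            active = m
        total += INFER_MS[m]
    return total

pipeline = ["face_detect", "landmarks", "gaze"]
print(frame_time_ms(pipeline, 10.0))  # with a 10 ms reload penalty: 38.0
print(frame_time_ms(pipeline, 0.0))   # zero-overhead switching: 8.0
```

At 30 frames per second the budget is about 33 ms per frame, so in this toy example the reload penalties alone blow the budget, while zero-overhead switching leaves plenty of headroom.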

The chip has eight identical 8/16-bit integer cores, called DLPs (deep learning processors). Each core has its own L1 cache, and all eight share a large 4MB L2. There’s also a control processor and a task manager, which handle host communication and resource allocation. A 32-bit LPDDR4 interface connects external DRAM for models too large to fit in the on-chip memory, while PCIe and USB ports link to the host processor for passing commands and input data (e.g., video frames). 
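That 4MB L2 is the dividing line between models that stay on chip and models that spill to LPDDR4. A quick back-of-the-envelope check, with hypothetical model sizes (the parameter counts below are illustrative, not taken from any Deep Vision benchmark):

```python
# Does a model's weight footprint fit in a 4 MB shared L2, or does it
# spill to external LPDDR4? Model sizes here are hypothetical examples.

L2_BYTES = 4 * 1024 * 1024  # ARA-1's shared on-chip L2

def fits_on_chip(num_params, bytes_per_weight):
    """True if the raw weight storage fits within the L2."""
    return num_params * bytes_per_weight <= L2_BYTES

# A small 3M-parameter detector fits at 8-bit precision, the same
# network at 16 bits does not, and a 25M-parameter backbone needs
# DRAM either way.
print(fits_on_chip(3_000_000, 1))   # True  (INT8)
print(fits_on_chip(3_000_000, 2))   # False (INT16)
print(fits_on_chip(25_000_000, 1))  # False
```

This also hints at why the cores support both 8- and 16-bit integer math: halving the weight width can be the difference between running entirely from on-chip memory and paying for DRAM traffic.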

Like most AI/ML processors, ARA-1 is designed to be used in tandem with a conventional host processor (think ARM, x86, or RISC-V) running Linux. It’s a coprocessor or accelerator, designed to offload complex ML tasks from a CPU that’s not really designed for such workloads. (And that probably has enough to do already.) The two processors communicate over USB or PCIe, your choice. 

As an example, Deep Vision’s chip can be partnered with an i.MX 8M Nano host, a $10 part from NXP with multiple ARM Cortex-A53 cores and a gaggle of peripherals. A camera feeds data to the NXP processor, which handles preprocessing and then offloads the heavy lifting to a $15–$25 ARA-1. Once the offload is done, the NXP device can go about its business. Deep Vision touts the fact that its processor requires less babysitting than other edge-AI parts, leaving more host CPU cycles free for other tasks. 
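The division of labor can be sketched as a producer-consumer loop: the host preprocesses each frame, hands it off, and is immediately free for other work while inference runs. To be clear, none of the names below come from Deep Vision’s SDK; this is a generic stand-in for any host-plus-accelerator arrangement.

```python
# Hypothetical host-side offload loop. The "accelerator" here is just a
# worker thread standing in for an attached ARA-1; every function and
# queue name is illustrative, not a real Deep Vision API.
import queue
import threading

def preprocess(frame):
    """Host-side prep (resize, normalize, etc.). Stub: pass through."""
    return frame

def accelerator_worker(inbox, results):
    """Consumes frames and produces inference results, like the
    accelerator working independently of the host CPU."""
    while True:
        frame = inbox.get()
        if frame is None:                  # shutdown sentinel
            break
        results.append(("inference", frame))

inbox, results = queue.Queue(), []
worker = threading.Thread(target=accelerator_worker, args=(inbox, results))
worker.start()

for frame in range(3):                     # stand-in for camera frames
    inbox.put(preprocess(frame))           # offload; host is now free
    # ... host handles other tasks here while inference runs ...

inbox.put(None)                            # tell the worker to stop
worker.join()
print(results)                             # one result tuple per frame
```

The “less babysitting” claim maps onto the gap between `inbox.put()` and the next loop iteration: the less the host has to poll or hand-hold the accelerator, the more of that gap it keeps for itself.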

On the software side, Deep Vision’s toolchain accepts ML models in all the usual formats: ONNX, PyTorch, TensorFlow, Caffe2, and MXNet. The compiler’s output can go straight onto the ARA-1 chip or to the company’s bit-accurate simulator, profiler, and power optimizer. 

ARA-1’s internal microarchitecture is completely software programmable, and Deep Vision can extend its compiler to add new operators to support new models and/or satisfy customer requirements. That helps future-proof ARA-1 and its siblings as the family tree grows. 

There is definitely an ARA-2 coming, says Deep Vision’s Levy. It’ll have more on-chip memory, significant enhancements to the DLP cores, additional compute functions, and much higher on- and off-chip bandwidth, while retaining the current chip’s basic architecture and DDR, PCIe, and USB interfaces. Software will carry over from one chip to the other. 

Even basic von Neumann processors seemed scary and unusual once. DSPs, GPUs, and FPGAs were weird and unfamiliar, too. Now we’re all riding the ML accelerator wave while trying to maintain our balance. A low-cost, power-miserly chip with a friendly toolchain seems like a good place to start.
