Edge AI On The Cheap and Deep

Startup Deep Vision Emphasizes Programmability, Efficiency, Cost

There’s an old salesman’s adage that “confused customers never buy.” That’s why glossy sales brochures don’t have a lot of technical information, and why car salesmen don’t delve too deeply into features and benefits. Too much information can lead to analysis paralysis, and, while that might be fun for engineers, it’s bad for business. 

There’s a separate but related effect in engineering. A new technology might be interesting and impressive, but if you don’t immediately grasp how to use it, it won’t catch on. Sometimes the biggest hurdle to adoption is the learning curve. 

We saw this with the rise of DSPs (digital signal processors) in the 1990s: They were perfect for a range of applications, but few developers knew how to program one or where it was supposed to fit in their hardware block diagram. Consequently, DSP uptake was slow. The exception was at Texas Instruments, which spent a lot of corporate resources on software tools, training courses, and front-line technical support. DSP newcomers gravitated to TI, and the company converted a lot of early adopters into big customers. 

A similar strategy is playing out with AI and machine learning. We’re told that it’s the Next Big Thing, but few of us understand what it is, how it works, or where it fits in the block diagram. And confused engineers don’t design-in. 

One little company that hopes to knock down the Wall of Confusion is Deep Vision, a California startup that makes (and sells!) low-cost chips for AI-at-the-edge applications. At first blush, it’s a fabless chip company, but its real expertise is in the software tools. Deep Vision emphasizes accessibility and ease of use as much as ML performance and power efficiency.

“Most customers don’t care what’s inside the chip,” says VP of Business Development Markus Levy. “It’s more about what you can do with it.” Indeed, most customers wouldn’t understand what’s inside since it’s probably their first AI/ML-related project. For the record, Deep Vision describes its ARA-1 chip as having a “polymorphic dataflow architecture” with a “neural-ISA core.” So now you know. 

Apart from its attitude toward tools, the company also takes a different approach to multitasking and context switching. Edge applications, they say, often continually switch between distinct ML models. For every video frame, a face-recognition app may start out searching for faces in a complex scene, then switch to identifying facial landmarks (eyes, noses, etc.), then switch again to determine an individual’s gaze, mood, drowsiness, or other characteristics. These all require different models, and switching models takes a lot of time on typical AI accelerators. 

Big datacenter applications don’t have this problem because they dedicate entire CPUs or GPUs (or “instances,” in cloud parlance) to each model. But an edge device doesn’t have that luxury. It must switch from one model to the other, while also keeping an eye on power, memory requirements, and cost. Deep Vision’s ARA-1 chip is designed to excel in those applications with its zero-overhead task-switching capability.
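To see why switching cost matters at the edge, here is a back-of-the-envelope sketch in Python. It models the three-stage face pipeline described above and charges a fixed penalty each time the accelerator changes models. Every number in it (the per-stage inference times and the 5 ms switch penalty) is a made-up placeholder, not a Deep Vision or competitor measurement, and the names `Stage`, `frame_time_ms`, and `frames_per_second` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    infer_ms: float  # hypothetical inference time per frame, in milliseconds


# Hypothetical three-model pipeline, run once per video frame.
PIPELINE = [
    Stage("face_detection", 8.0),
    Stage("facial_landmarks", 3.0),
    Stage("gaze_estimation", 2.0),
]


def frame_time_ms(stages, switch_ms):
    """Time to push one frame through every stage, including model switches."""
    # With more than one model, each stage costs one switch per frame:
    # the first model must be swapped back in after the previous frame's last stage.
    switches = len(stages) if len(stages) > 1 else 0
    return sum(s.infer_ms for s in stages) + switches * switch_ms


def frames_per_second(stages, switch_ms):
    return 1000.0 / frame_time_ms(stages, switch_ms)


if __name__ == "__main__":
    for switch_ms in (5.0, 0.0):
        fps = frames_per_second(PIPELINE, switch_ms)
        print(f"switch cost {switch_ms} ms: {fps:.1f} frames/s")
```

With these placeholder figures, the switch penalty accounts for more than half of each frame's 28 ms, and removing it more than doubles the frame rate. The real ratio depends entirely on the models and the hardware.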

The chip has eight identical 8/16-bit integer cores, called DLPs (deep learning processors). Each core has its own L1 cache, and they share a large 4MB L2. There’s also a control processor and a task manager, which manage the host communication and resource allocation. A 32-bit LPDDR4 interface handles external DRAM for models too large to fit in the on-chip memory. PCIe and USB ports connect the chip to the host processor, which uses them to pass commands and input data (e.g., video frames) for the models.
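A quick way to get a feel for the on-chip versus external-DRAM question is to compare a model's weight footprint with the 4MB L2. The Python sketch below does only that arithmetic. It assumes that weights are what must fit in the L2, it ignores activations and any compression, it reads 4MB as 4 x 1024 x 1024 bytes, and the 3.5-million-parameter model is hypothetical. How ARA-1 actually partitions its memory is not described here.

```python
L2_BYTES = 4 * 1024 * 1024  # shared L2 size from the article, read as 4 MiB


def weight_bytes(num_params, bits):
    """Bytes needed to store num_params integer weights at the given width."""
    if bits not in (8, 16):
        raise ValueError("the DLP cores are described as 8/16-bit integer")
    return num_params * bits // 8


def fits_in_l2(num_params, bits):
    """True if the weights alone fit in the shared L2 (activations ignored)."""
    return weight_bytes(num_params, bits) <= L2_BYTES


if __name__ == "__main__":
    params = 3_500_000  # hypothetical model size
    for bits in (8, 16):
        verdict = "fits on chip" if fits_in_l2(params, bits) else "needs external DRAM"
        print(f"{params} params at {bits}-bit: {weight_bytes(params, bits)} bytes, {verdict}")
```

Under these assumptions, the same model fits on chip when quantized to 8 bits and spills to LPDDR4 at 16 bits, which is one reason integer precision matters so much for edge parts.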

Like most AI/ML processors, ARA-1 is designed to be used in tandem with a conventional host processor (think ARM, x86, or RISC-V) running Linux. It’s a coprocessor or accelerator, designed to offload complex ML tasks from a CPU that’s not really designed for such workloads. (And that probably has enough to do already.) The two processors communicate over USB or PCIe, your choice. 

As an example, Deep Vision’s chip can be partnered with an i.MX 8M Nano host, a $10 part from NXP with multiple ARM Cortex-A53 cores and a gaggle of peripherals. A camera might feed data to the NXP processor, which preprocesses it and then offloads the heavy lifting to a $15–$25 ARA-1. Once the offload is done, the NXP device can go back to its business. Deep Vision touts the fact that its processor requires less babysitting than other edge-AI parts, leaving more host CPU cycles free for other tasks.

On the software side, Deep Vision’s toolchain accepts ML models in all the usual formats: ONNX, PyTorch, TensorFlow, Caffe2, and MXNet. The compiler’s output can go straight onto the ARA-1 chip or to the company’s bit-accurate simulator, profiler, and power optimizer. 

ARA-1’s internal microarchitecture is completely software programmable, and Deep Vision can extend its compiler to add new operators to support new models and/or satisfy customer requirements. That helps future-proof ARA-1 and its siblings as the family tree grows. 

There is definitely an ARA-2 coming, says Deep Vision’s Levy. It’ll have more on-chip memory, significant enhancements to the DLP cores, additional compute functions, and much higher on- and off-chip bandwidth, while retaining the current chip’s basic architecture and DDR, PCIe, and USB interfaces. Software will transfer from one to the other. 

Basic von Neumann processors were scary and unusual at some point. DSPs and GPUs and FPGAs were weird and unfamiliar, too. Now we’re all riding the ML accelerator wave while trying to maintain our balance. A low-cost, power-miserly chip with a friendly toolchain seems like a good place to start.
