Edge AI On The Cheap and Deep

Startup Deep Vision Emphasizes Programmability, Efficiency, Cost

There’s an old salesman’s adage that “confused customers never buy.” That’s why glossy sales brochures don’t have a lot of technical information, and why car salesmen don’t delve too deeply into features and benefits. Too much information can lead to analysis paralysis, and, while that might be fun for engineers, it’s bad for business. 

There’s a separate but related effect in engineering. A new technology might be interesting and impressive, but if you don’t immediately grasp how to use it, it won’t catch on. Sometimes the biggest hurdle to adoption is the learning curve. 

We saw this with the rise of DSPs (digital signal processors) in the 1990s: They were perfect for a range of applications, but few developers knew how to program one or where it was supposed to fit in their hardware block diagram. Consequently, DSP uptake was slow. The exception was at Texas Instruments, which spent a lot of corporate resources on software tools, training courses, and front-line technical support. DSP newcomers gravitated to TI, and the company converted a lot of early adopters into big customers. 

A similar strategy is playing out with AI and machine learning. We’re told that it’s the Next Big Thing, but few of us understand what it is, how it works, or where it fits in the block diagram. And confused engineers don’t design-in. 

One little company that hopes to knock down the Wall of Confusion is Deep Vision, a California startup that makes — and sells! — low-cost chips for AI-at-the-edge applications. At first blush, it’s a fabless chip company, but their real expertise is in the software tools. Deep Vision emphasizes accessibility and ease of use as much as ML performance and power efficiency. 

“Most customers don’t care what’s inside the chip,” says VP of Business Development Markus Levy. “It’s more about what you can do with it.” Indeed, most customers wouldn’t understand what’s inside since it’s probably their first AI/ML-related project. For the record, Deep Vision describes its ARA-1 chip as having a “polymorphic dataflow architecture” with a “neural-ISA core.” So now you know. 

Apart from its attitude toward tools, the company also takes a different approach to multitasking and context switching. Edge applications, they say, often continually switch between distinct ML models. For every video frame, a face-recognition app may start out searching for faces in a complex scene, then switch to identifying facial landmarks (eyes, noses, etc.), then switch again to determine an individual’s gaze, mood, drowsiness, or other characteristics. These all require different models, and switching models takes a lot of time on typical AI accelerators. 

Big datacenter applications don’t have this problem because they dedicate entire CPUs or GPUs (or instances, in cloud parlance) to each model. But an edge device doesn’t have that luxury. It must switch from one model to another, while also keeping an eye on power, memory requirements, and cost. Deep Vision’s ARA-1 chip is designed to excel in those applications with its zero-overhead task-switching capability.
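To see why switching cost matters so much at the edge, here is a back-of-the-envelope sketch in Python. The three-stage cascade mirrors the face-recognition example, but every number in it is invented for illustration, and the names `Stage`, `frame_latency_ms`, and `switch_share` are hypothetical, not part of any Deep Vision tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    infer_ms: float  # hypothetical per-frame inference time for this model


def frame_latency_ms(stages, switch_ms):
    """Latency of one frame when every stage runs a different model.

    A switch is charged before each stage, because the previous frame
    ended on the last model and the next frame starts on the first one.
    """
    return sum(stage.infer_ms + switch_ms for stage in stages)


def switch_share(stages, switch_ms):
    """Fraction of per-frame time spent switching rather than inferring."""
    total = frame_latency_ms(stages, switch_ms)
    return (switch_ms * len(stages)) / total if total else 0.0


# Illustrative numbers only, not measurements of any real chip.
cascade = [
    Stage("face_detect", 6.0),
    Stage("landmarks", 2.0),
    Stage("gaze", 2.0),
]

for switch_ms in (0.0, 5.0):
    latency = frame_latency_ms(cascade, switch_ms)
    print(f"switch={switch_ms} ms -> {latency} ms/frame, "
          f"{1000 / latency:.1f} fps, "
          f"{switch_share(cascade, switch_ms):.0%} spent switching")
```

With these made-up numbers, a 5 ms switch drags the cascade from 100 frames per second down to 40, and 60% of each frame's time goes to switching rather than inference. The real figures will differ, but the shape of the problem is the same: short models make switch overhead dominate.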

The chip has eight identical 8/16-bit integer cores, called DLPs (deep learning processors). Each core has its own L1 cache, and they share a large 4MB L2. There’s also a control processor and a task manager, which manage host communication and resource allocation. A 32-bit LPDDR4 interface handles external DRAM for models too large to fit in the on-chip memory. PCIe and USB interfaces connect to the host processor, which passes commands and input data for the models (e.g., video frames). 
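Whether a model spills into external DRAM is, to a first approximation, simple arithmetic. The Python sketch below checks only whether the weights would fit in a 4MB L2 at 8 or 16 bits per weight. It assumes "4MB" means 4 MiB and that the whole L2 is available for weights, and it ignores activations and everything else the chip must hold, so treat it as a rough upper bound. The function names and the 3.5-million-parameter model are hypothetical.

```python
L2_BYTES = 4 * 1024 * 1024  # the article's 4MB shared L2, read as 4 MiB (assumption)


def weight_bytes(param_count, bits_per_weight):
    """Storage needed for the weights alone at the given integer width."""
    if bits_per_weight not in (8, 16):
        raise ValueError("this sketch only covers the 8- and 16-bit cases")
    return param_count * bits_per_weight // 8


def fits_in_l2(param_count, bits_per_weight, l2_bytes=L2_BYTES):
    """True if the weights alone fit; activations and code are ignored."""
    return weight_bytes(param_count, bits_per_weight) <= l2_bytes


# A hypothetical 3.5-million-parameter model, chosen only for illustration.
params = 3_500_000
for bits in (8, 16):
    verdict = "fits on chip" if fits_in_l2(params, bits) else "needs external DRAM"
    print(f"{bits}-bit: {weight_bytes(params, bits):,} bytes -> {verdict}")
```

In this example the same model fits at 8 bits and spills at 16, which is one reason quantization width matters as much as raw parameter count on small edge parts.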

Like most AI/ML processors, ARA-1 is designed to be used in tandem with a conventional host processor (think ARM, x86, or RISC-V) running Linux. It’s a coprocessor or accelerator, designed to offload complex ML tasks from a CPU that’s not really designed for such workloads. (And that probably has enough to do already.) The two processors communicate over USB or PCIe, your choice. 

As an example, Deep Vision’s chip can be partnered with an i.MX 8M Nano host, a $10 part from NXP with multiple ARM Cortex-A53 cores and a gaggle of peripherals. A camera might feed data to the NXP processor, which preprocesses it and then offloads the heavy lifting to a $15–$25 ARA-1. Once the offload is done, the NXP device can go back to its business. Deep Vision touts the fact that its processor requires less babysitting than other edge-AI parts, leaving more host CPU cycles free for other tasks. 
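The host-plus-accelerator split can be sketched in a few lines of Python. This is a generic offload pattern, not Deep Vision's API: `run_on_accelerator` is a stand-in that a real system would replace with calls into the vendor's driver, and `preprocess` and `process_stream` are likewise hypothetical names. The point is only that the host prepares the next frame while the previous one is still being processed elsewhere.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess(frame):
    """Host-side step (stand-in): scale 8-bit pixel values to the range 0..1."""
    return [pixel / 255 for pixel in frame]


def run_on_accelerator(tensor):
    """Stand-in for offloaded inference: returns the index of the largest value."""
    return max(range(len(tensor)), key=tensor.__getitem__)


def process_stream(frames):
    """Preprocess frame N on the host while frame N-1 runs on the 'accelerator'."""
    results = []
    # A single worker thread plays the part of the one attached accelerator.
    with ThreadPoolExecutor(max_workers=1) as accelerator:
        pending = None
        for frame in frames:
            tensor = preprocess(frame)  # overlaps with the in-flight job
            if pending is not None:
                results.append(pending.result())
            pending = accelerator.submit(run_on_accelerator, tensor)
        if pending is not None:
            results.append(pending.result())
    return results


# Two tiny made-up "frames" of three pixels each.
print(process_stream([[0, 255, 10], [9, 1, 2]]))
```

The less attention the accelerator needs per frame, the more of the host's time is left for everything else it has to do.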

On the software side, Deep Vision’s toolchain accepts ML models in all the usual formats: ONNX, PyTorch, TensorFlow, Caffe2, and MXNet. The compiler’s output can go straight onto the ARA-1 chip or to the company’s bit-accurate simulator, profiler, and power optimizer. 

ARA-1’s internal microarchitecture is completely software programmable, and Deep Vision can extend its compiler to add new operators to support new models and/or satisfy customer requirements. That helps future-proof ARA-1 and its siblings as the family tree grows. 

There is definitely an ARA-2 coming, says Deep Vision’s Levy. It’ll have more on-chip memory, significant enhancements to the DLP cores, additional compute functions, and much higher on- and off-chip bandwidth, while retaining the current chip’s basic architecture and DDR, PCIe, and USB interfaces. Software will transfer from one to the other. 

Basic von Neumann processors were scary and unusual at some point. DSPs and GPUs and FPGAs were weird and unfamiliar, too. Now we’re all riding the ML accelerator wave while trying to maintain our balance. A low-cost, power-miserly chip with a friendly toolchain seems like a good place to start.
