
The Internet of Seeing Things

Is Embedded Vision the Ultimate Killer App?

There is no question that the Internet of Things (IoT) is exploding. There are estimates that, in the near future, trillions of devices will be connected by the IoT. This could equate to hundreds or even thousands of connected devices for every single human being on Earth, all talking to each other over the largest communications infrastructure ever imagined. 

But what makes a good “thing”?

In order for a “thing” to be a useful contributing member of that internet of theirs, it needs to bring something to the party. Nobody cares about a “thing” that just sits around consuming data and doesn’t have anything to contribute. Like us humans, in order for a “thing” to be interesting, it needs perception — a way to sense what’s going on around it. Only then is it qualified to be an interesting participant in the greater conversation.

It is more or less agreed that we humans have five basic senses, and it’s no surprise that those five senses are among the first that we bestow on the machines and the “things” that we create. Just about every mobile phone today can hear and understand selected English phrases. A wide range of devices can smell and sense various gases in the environment. Haptic sensors give our machines a sense of touch. And of course, cameras endow our creations with eyes. (Apparently, giving machines a sense of taste hasn’t been very high on the priority list yet. We don’t want them developing a penchant for caviar and truffles and running up our bill on Amazon Prime now, do we?)

We could divide each one of these senses into three levels. First is the simple ability to perceive, or gather the data. The camera, for example, simply captures pixels and nothing more. The second is the ability to interpret. The rudimentary embedded vision system can identify the nouns and verbs. “That object is a man.” “That object is a car.” “That man is walking.” “That car is moving.” The third level is to understand in context. “That car is about to hit that man and corrective action needs to be taken.” 
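As a rough illustration of the gap between the second and third levels, here is a toy sketch in Python. It assumes the first two levels have already happened: pixels were captured and something upstream has labeled each object and estimated its position and velocity. The `TrackedObject` type, the ground-plane coordinates, the thresholds, and the sample values are all hypothetical, and constant-velocity extrapolation is just one simple way to stand in for "understanding in context."

```python
from dataclasses import dataclass


@dataclass
class TrackedObject:
    """Hypothetical level-2 output: a labeled object with an estimated motion."""
    label: str   # the noun: "man", "car"
    x: float     # position in metres on an assumed ground plane
    y: float
    vx: float    # velocity in metres per second
    vy: float


def is_moving(obj, min_speed=0.1):
    """Level 2, the verb: is this object moving faster than min_speed (m/s)?"""
    return (obj.vx ** 2 + obj.vy ** 2) ** 0.5 > min_speed


def closest_approach(a, b):
    """Return (time_s, distance_m) of closest approach, assuming constant velocities."""
    rx, ry = b.x - a.x, b.y - a.y          # relative position
    vx, vy = b.vx - a.vx, b.vy - a.vy      # relative velocity
    speed_sq = vx * vx + vy * vy
    # Only look forward in time; if nothing moves, the closest approach is now.
    t = 0.0 if speed_sq == 0 else max(0.0, -(rx * vx + ry * vy) / speed_sq)
    dx, dy = rx + vx * t, ry + vy * t
    return t, (dx * dx + dy * dy) ** 0.5


def collision_risk(a, b, horizon_s=3.0, radius_m=1.5):
    """Level 3, context: will the two objects come too close, too soon?"""
    t, d = closest_approach(a, b)
    return t <= horizon_s and d <= radius_m


# Made-up scene: a man walking at 1 m/s and a car driving at 10 m/s.
man = TrackedObject("man", 0.0, 0.0, 0.0, 1.0)
car = TrackedObject("car", -20.0, 2.0, 10.0, 0.0)

if is_moving(car) and collision_risk(man, car):
    print("That car is about to hit that man.")
```

Even this toy version needs labels, positions, and velocities for every object before it can say anything about context, which is why each level is so much harder than the one before it.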

Of all these senses, vision is by far the most demanding to automate. First, the sheer amount of data involved is enormous. Second, the algorithms for vision are extremely complex. Third, the amount of computational power required to run those algorithms on such massive amounts of data is beyond even the cutting edge of what today’s technology can accomplish. As a result, the amount of energy needed to perform that computation is also prohibitive.
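To put a rough number on "enormous," here is a back-of-the-envelope calculation in Python. The camera parameters (1080p, 24-bit color, 30 frames per second) are assumed for illustration and do not come from any particular system.

```python
def raw_video_rate(width, height, bytes_per_pixel, fps):
    """Return the uncompressed data rate of a video stream in bytes per second."""
    return width * height * bytes_per_pixel * fps


# Assumed example: one 1080p camera, 3 bytes per pixel, 30 frames per second.
rate = raw_video_rate(1920, 1080, 3, 30)
print(f"{rate / 1e6:.1f} MB/s")  # prints "186.6 MB/s"
```

That is for a single modest camera before any processing at all; more cameras, higher resolution, or higher frame rates scale the figure up proportionally.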

When we look at embedded vision in the context of the IoT, things become even more challenging. The generally accepted architecture for the IoT is a distributed computing model where increasingly heavy computation loads are passed up the chain to more capable processors, ultimately leading to the extreme heavy lifting being done by cloud servers. Embedded vision’s demands are exactly the opposite. In order to avoid enormous data pipes and unacceptable latency, intense computation needs to be performed at the leaf node, as close to the camera as possible. It’s a lot simpler to send a message up the line that says “Hey! A man is about to be hit by a car!” than it is to send a gigabyte or so of unprocessed video up to the cloud where it will wait in line for processing, analysis, and interpretation. By the time all that could occur, it may well be too late. To look at it another way, the IoT is designed around the premise that each node filters out useless data and sends just the good stuff up the chain. In embedded vision, however, it’s a lot more difficult to identify which data is important and which isn’t: there’s a lot more data to deal with, and the penalty for failure is a lot higher than with other types of sensory information.
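The difference between sending the conclusion and sending the raw video is easy to estimate. The Python sketch below assumes a 10 Mbit/s uplink and a made-up JSON alert format; both are illustrative choices, and the calculation ignores protocol overhead and any time spent waiting in line at the server.

```python
import json


def transfer_seconds(num_bytes, link_bits_per_second):
    """Time to push num_bytes over a link, ignoring protocol overhead and queuing."""
    return num_bytes * 8 / link_bits_per_second


# Hypothetical alert produced at the leaf node after local analysis.
alert = json.dumps(
    {"event": "collision_risk", "objects": ["man", "car"]}
).encode("utf-8")

UPLINK_BPS = 10_000_000    # assumed 10 Mbit/s uplink
RAW_VIDEO_BYTES = 10 ** 9  # "a gigabyte or so" of unprocessed video

video_time = transfer_seconds(RAW_VIDEO_BYTES, UPLINK_BPS)
alert_time = transfer_seconds(len(alert), UPLINK_BPS)

print(f"raw video: {video_time:.0f} s")
print(f"alert ({len(alert)} bytes): {alert_time * 1e6:.0f} microseconds")
```

Under these assumptions the raw video takes 800 seconds just to upload, while the alert takes well under a millisecond, and that is before the cloud has done any of the analysis the leaf node has already finished.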

So, here we have an IoT that’s designed to favor pushing processing toward the cloud and embedded vision that demands supercomputing at the leaf node.

Unfortunately, it’s not even THAT simple. Embedded vision is also a Big Data problem, because the only way to understand what’s really going on in a scene is to have a huge database of objects and scenarios to compare against. In order for any meaningful understanding to take place, embedded vision algorithms need direct access to at least a subset of this context information.

Because of these challenges, embedded vision is just in its infancy. Numerous engineering compromises are required to enable today’s very rudimentary embedded vision applications. You and I use the same eyes whether we’re reading a book, driving a car, catching a baseball, or watching a sunset. But at present, our “things” require a specialized vision system from the camera on down for each separate task they have to perform. Probably one of the most widely known and challenging embedded vision applications today is autonomous driving. This application requires a fusion of cameras with an array of other sensors like IR, radar, and sonar, combined with specialized algorithms for doing things like recognizing traffic signs, locating humans, other cars, and other obstacles in the scene, and detecting lane markers. All of this is supported by the biggest, fastest computers that we can manage to jam into an automobile.

If we were trying to build an embedded vision system to inspect bottles on an assembly line, or one to do facial recognition in security cameras, we’d be starting almost from scratch each time. A general-purpose analog for human-like vision is clearly far, far away. 

One thing that’s interesting about embedded vision is that it challenges every single aspect of our modern technological capability at the same time. Image sensors struggle to match the dynamic range, resolution, effective frame rate, and gamut of human vision. Every image sensor chosen today is an engineering compromise: a sensor that happens to have the right attributes for the particular problem we’re solving. The connections within a vision system push the limits of even today’s fastest multi-gigabit interconnect. The algorithms for vision analytics are highly specialized and selective and exist at the very edge of modern-day computer science. The processing required for embedded vision algorithms is beyond what today’s fastest conventional processors can do, and it typically calls for compute acceleration with hardware such as FPGAs, GPUs, and ASICs that is faster and more power-efficient than what we have today. As we mentioned in discussing IoT, embedded vision also imposes a serious system architecture challenge. And, finally, because of these other challenges, controlling factors like cost, size, power consumption, and reliability is also extremely difficult.

In short, every part of the engineering buffalo — optical, mechanical, software, electronic, reliability, manufacturing, verification — is pushed to the limit by embedded vision. If you’re looking for an engineering field with some legs, it’ll be a long time before embedded vision runs out of interesting and challenging problems to solve, and the potential upside for humanity is enormous. Machines that can truly “see” will have a dramatic impact on just about every aspect of our lives. 

2 thoughts on “The Internet of Seeing Things”

  1. Kevin:

    Intelligent Vision or “Embedded Vision” is the killer app that I had in mind when we started developing ZYNQ at Xilinx in 2008! In 2011, Jeff Bier and I started the Embedded Vision Alliance to help educate and teach developers how to add Intelligent Vision to their products. Four years later and over 60 companies strong, EVA and its annual Summit is THE best place to learn about new applications and advances in the foundational technology (sensors, silicon, software, services). As the Executive Director of the Alliance and co-founder of a Vision startup, Auviz Systems, I ‘see’ a bright future for Embedded Vision and am glad to see you talking about it.

    Vin Ratford

  2. I 2nd Vin’s sentiments! Coincidentally, we’re working on a Zynq project as well (the hardware side – snickerdoodle.io) and we see the unique combo of ARM+FPGAs providing a ridiculously compelling path forward in the ‘connected computer vision’ arena.
