
The Internet of Seeing Things

Is Embedded Vision the Ultimate Killer App?

There is no question that the Internet of Things (IoT) is exploding. There are estimates that, in the near future, trillions of devices will be connected by the IoT. This could equate to hundreds or even thousands of connected devices for every single human being on Earth, all talking to each other over the largest communications infrastructure ever imagined. 

But, what makes a good “thing?” 

In order for a “thing” to be a useful contributing member of that internet of theirs, it needs to bring something to the party. Nobody cares about a “thing” that just sits around consuming data and doesn’t have anything to contribute. Like us humans, in order for a “thing” to be interesting, it needs perception — a way to sense what’s going on around it. Only then is it qualified to be an interesting participant in the greater conversation.

It is more or less agreed that we humans have five basic senses, and it’s no surprise that those five senses are among the first we bestow upon the machines and “things” that we create. Just about every mobile phone today can hear and understand selected English phrases. A wide range of devices can smell and sense various gases in the environment. Haptic sensors give our machines a sense of touch. And of course, cameras endow our creations with eyes. (Apparently, giving machines a sense of taste hasn’t been very high on the priority list yet. We don’t want them developing a penchant for caviar and truffles and running up our bill on Amazon Prime now, do we?)

We could divide each one of these senses into three levels. First is the simple ability to perceive, or gather the data. The camera, for example, simply captures pixels and nothing more. The second is the ability to interpret. The rudimentary embedded vision system can identify the nouns and verbs. “That object is a man.” “That object is a car.” “That man is walking.” “That car is moving.” The third level is to understand in context. “That car is about to hit that man and corrective action needs to be taken.” 
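
To make those three levels concrete, here is a minimal sketch in Python. Everything in it — the camera, the detector, and the crude rule at the third level — is a hypothetical stand-in for illustration, not the API of any real vision library.

```python
# Illustrative sketch of the three levels of machine perception.
# The camera and detector objects are hypothetical placeholders.

def perceive(camera):
    """Level 1: simply capture pixels, nothing more."""
    return camera.read()

def interpret(frame, detector):
    """Level 2: identify the nouns and verbs in the scene."""
    return detector.detect(frame)  # e.g. [("man", "walking"), ("car", "moving")]

def understand(detections):
    """Level 3: put the detections in context and decide what matters."""
    labels = {obj for obj, _ in detections}
    if {"man", "car"} <= labels:
        return "warning: car may be about to hit the man -- corrective action needed"
    return "nothing noteworthy"

# With mocked level-2 output, only the contextual level actually "decides":
print(understand([("man", "walking"), ("car", "moving")]))
```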

Of all these senses, vision is by far the most demanding to automate. First, the sheer amount of data involved is enormous. Second, the algorithms for vision are extremely complex. Third, the computational power required to run those algorithms on such massive amounts of data pushes beyond the cutting edge of what even today’s technology can accomplish. As a result, the energy required to perform that computation is often prohibitive as well.
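
To give a feel for the first point, here is a back-of-the-envelope calculation for a single uncompressed camera stream. The figures (1080p, 30 frames per second, 24-bit color) are assumptions chosen purely for illustration.

```python
# Rough data rate of ONE uncompressed 1080p / 30 fps / 24-bit camera stream.
width, height = 1920, 1080
bytes_per_pixel = 3            # 24-bit RGB
fps = 30

bytes_per_frame = width * height * bytes_per_pixel      # ~6.2 MB per frame
bytes_per_second = bytes_per_frame * fps                # ~187 MB/s

print(f"{bytes_per_second / 1e6:.0f} MB/s raw, "
      f"{bytes_per_second * 60 / 1e9:.1f} GB per minute")
# -> roughly 187 MB/s, about 11.2 GB per minute, before any analysis at all
```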

When we look at embedded vision in the context of the IoT, things become even more challenging. The generally accepted architecture for the IoT is a distributed computing model where increasingly heavy computation loads are passed up the chain to more capable processors, ultimately leading to the extreme heavy lifting being done by cloud servers. Embedded vision’s demands are exactly the opposite. In order to avoid enormous data pipes and unacceptable latency, intense computation needs to be performed at the leaf node, as close to the camera as possible. It’s a lot simpler to send a message up the line that says “Hey! A man is about to be hit by a car!” than it is to send a gigabyte or so of unprocessed video up to the cloud, where it will wait in line for processing, analysis, and interpretation. By the time all that could occur, it may well be too late. To look at it another way, the IoT is designed around the premise that each node filters out useless data and sends just the good stuff up the chain. In embedded vision, however, it’s a lot more difficult to identify which data is important and which isn’t: there’s a lot more data to deal with, and the penalty for failure is a lot higher than with other types of sensory information.
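
As a sketch of that “send the conclusion, not the pixels” idea, the loop below keeps the heavy video processing at the leaf node and pushes only a tiny event message upstream. The camera, detector, and publish() transport are hypothetical placeholders rather than any specific IoT stack.

```python
import json
import time

def leaf_node_loop(camera, detector, publish):
    """Process frames locally; send only small event messages up the chain."""
    while True:
        frame = camera.read()                # gigabytes per minute stay local
        detections = detector.detect(frame)  # heavy compute happens at the edge
        if is_hazard(detections):
            # A few hundred bytes go upstream instead of raw video.
            publish("vision/alerts", json.dumps({
                "event": "pedestrian_collision_risk",
                "timestamp": time.time(),
            }))

def is_hazard(detections):
    """Crude contextual rule: a person and a car in the same scene."""
    labels = {d["label"] for d in detections}
    return {"person", "car"} <= labels
```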

So, here we have an IoT that’s designed to favor pushing processing toward the cloud and embedded vision that demands supercomputing at the leaf node.

Unfortunately, it’s not even THAT simple. Embedded vision is also a Big Data problem, because the only way to understand what’s really going on in a scene is to have a huge database of objects and scenarios to compare against. For any meaningful understanding to take place, embedded vision algorithms need direct access to at least a subset of this context information.
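
One way to picture that compromise is a small local slice of the context database with a cloud fallback, as in the sketch below. All of the names are hypothetical, and cloud_lookup() stands in for whatever service actually holds the full database of objects and scenarios.

```python
# Keep a small subset of the context database at the edge; go to the
# cloud only when the local subset doesn't have an answer.

local_context = {
    "stop_sign": {"shape": "octagon", "action": "halt"},
    "pedestrian": {"priority": "highest"},
}

def lookup_context(label, cloud_lookup):
    """Prefer the local subset; fall back to the cloud only when necessary."""
    if label in local_context:
        return local_context[label]   # fast path: no network round trip
    info = cloud_lookup(label)        # slow path: latency and bandwidth cost
    local_context[label] = info       # cache the answer for next time
    return info
```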

Because of these challenges, embedded vision is just in its infancy. Numerous engineering compromises are required to enable today’s very rudimentary embedded vision applications. You and I use the same eyes whether we’re reading a book, driving a car, catching a baseball, or watching a sunset. But at present, our “things” require a specialized vision system from the camera on down for each separate task they have to perform. Probably one of the most widely known and challenging embedded vision applications today is autonomous driving. This application requires a fusion of cameras with an array of other sensors like IR, radar, and sonar, combined with specialized algorithms for doing things like recognizing traffic signs, locating humans, other cars, and other obstacles in the scene, and detecting lane markers. All of this is supported by the biggest, fastest computers that we can manage to jam into an automobile.

If we were trying to build an embedded vision system to inspect bottles on an assembly line, or one to do facial recognition in security cameras, we’d be starting almost from scratch each time. A general-purpose analog for human-like vision is clearly far, far away. 

One thing that’s interesting about embedded vision is that it challenges every single aspect of our modern technological capability at the same time. Image sensors struggle to match the dynamic range, resolution, effective frame rate, and gamut of human vision. Every image sensor chosen today is an engineering compromise — a sensor that happens to have the right attributes for the particular problem we’re solving. The connections within a vision system push the limits of even today’s fastest multi-gigabit interconnect. The algorithms for vision analytics are highly specialized and selective, sitting at the very edge of modern-day computer science. The processing required for embedded vision algorithms is beyond what today’s fastest conventional processors can deliver, so it typically relies on acceleration by faster, more power-efficient hardware such as FPGAs, GPUs, and ASICs. As we mentioned in discussing the IoT, embedded vision also imposes a serious system architecture challenge. And finally, because of all these other challenges, controlling factors like cost, size, power consumption, and reliability is also extremely difficult.

In short, every part of the engineering buffalo — optical, mechanical, software, electronic, reliability, manufacturing, verification — is pushed to the limit by embedded vision. If you’re looking for an engineering field with some legs, it’ll be a long time before embedded vision runs out of interesting and challenging problems to solve, and the potential upside for humanity is enormous. Machines that can truly “see” will have a dramatic impact on just about every aspect of our lives. 

2 thoughts on “The Internet of Seeing Things”

  1. Kevin:

    Intelligent Vision or “Embedded Vision” is the killer app that I had in mind when we started developing ZYNQ at Xilinx in 2008! In 2011, Jeff Bier and I started the Embedded Vision Alliance to help educate and teach developers how to add Intelligent Vision to their products. Four years later and over 60 companies strong, the EVA and its annual Summit are THE best place to learn about new applications and advances in the foundational technology (sensors, silicon, software, services). As the Executive Director of the Alliance and co-founder of a Vision startup, Auviz Systems, I ‘see’ a bright future for Embedded Vision and am glad to see you talking about it.

    Vin Ratford

  2. I 2nd Vin’s sentiments! Coincidentally, we’re working on a Zynq project as well (the hardware side – snickerdoodle.io) and we see the unique combo of ARM+FPGAs providing a ridiculously compelling path forward in the ‘connected computer vision’ arena.

