
Embedding Vision

Creating Devices That See

Remember sitting down at a DEC VT52 terminal?  The screen held 24 lines of text, at 80 characters each.  The font was built in.  The VT52 proudly boasted support for all 95 printable ASCII characters, including the desirable but somewhat superfluous lower case letters.  Some special graphics characters were available as well, but the terminal did not support graphics per se.  There was no mouse, no windows, and text editing was only marginally WYSIWYG – mostly using the “vi” text editor.

Today, it’s hard to think of interacting with a computer, or even a smartphone, without a GUI and some sort of pointing device.  Even for those of us old enough to remember working on the VT52, trying to communicate with a machine exclusively via a keyboard would seem awkward and arcane at best.

Since most of our mobile and tablet devices have used touchscreen interfaces for a while, it is interesting to watch a young person poke expectantly at the screen of a desktop or laptop computer and then become confused when the machine ignores their input.  Once you’ve grown accustomed to a level of sophistication in human-machine interface, it’s hard to go back.

As engineers, most of us know what the next steps are, and we know they’re really difficult problems.  Our machines need to be able to see and hear us and to understand what they’re seeing and hearing.  Voice interaction has been with us for a while now, but it really hasn’t caught on in the mainstream.  The accuracy of voice recognition/understanding is still low, and the idea of a work or office environment with a sea of cubes – where everyone is talking aloud to their computers simultaneously – sounds a bit chaotic at best.  The primary problem here, of course, is that deriving meaning from spoken language is much harder than simply recognizing words and phrases in an audio stream.  The secondary problem is that people seem to want private interactions with their devices – even in public places.  Spoken communication doesn’t facilitate that very well.

Likewise, there has been a gigantic amount of research into machine vision.  Video has been commoditized to the point that only relatively inexpensive hardware is required to add video cameras, storage, and playback capability to an embedded device.  However, making a machine understand what is going on in that video stream is a significant challenge – one that researchers have been grappling with for decades.  The limiting factors for machine vision have always been computing power (massive amounts of computing power are required to run most of the machine vision algorithms out there in real time on a video stream) and the algorithms themselves.  While it’s true that there is a vast repository of research available on machine vision algorithms, those algorithms tend to be tailored for very specific problems.  A number of sophisticated algorithms exist for facial recognition, for example, but those algorithms are different from those required for locating people in a scene, those required for understanding human gestures and movement, and so forth.  Since the algorithms are so specific to the type of information being extracted from the scene, creating fixed-hardware accelerators to solve the computing problem becomes impractical.  Programmable hardware like FPGAs and/or huge amounts of parallelism in conventional processors (such as with graphics processors) is required.
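
To put a rough number on that computing load, here is a back-of-the-envelope sketch in Python.  The resolution, frame rate, and filter size are assumptions chosen purely for illustration, not figures from any particular product, and `ops_per_second` is a hypothetical helper name.

```python
# Back-of-the-envelope estimate; every figure here is an illustrative assumption.

def ops_per_second(width, height, fps, kernel_size, ops_per_tap=2):
    """Arithmetic work for one square convolution pass over a grayscale stream.

    ops_per_tap=2 counts one multiply and one add per kernel tap.
    """
    pixels_per_second = width * height * fps
    taps_per_pixel = kernel_size * kernel_size
    return pixels_per_second * taps_per_pixel * ops_per_tap


# Hypothetical VGA camera at 30 frames per second, one 3x3 filter.
vga_3x3 = ops_per_second(640, 480, 30, 3)
print(f"{vga_3x3:,} operations per second")  # prints "165,888,000 operations per second"
```

That is roughly 166 million operations per second for a single small filter, which is typically only an early step.  A real pipeline stacks many such passes with heavier stages on top, so the total climbs quickly.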

This year, however, machine vision went mass market with the introduction of the Kinect interface for Microsoft’s Xbox 360.  In case you’ve been under a rock for the past year (which many of us working on complex engineering problems tend to be from time to time), Microsoft built a low-cost device that enables a video game console to get its input from watching the players move, rather than from a dedicated controller.  The system can locate people in the scene, interpret their gestures, and even recognize which individuals it is “seeing.”  With the retail cost of the system being less than $150 USD, one can imagine that the bill of materials cost must be very low.  Granted, Kinect cheats a bit by borrowing some of the Xbox 360’s massively parallel processing power to accomplish its magic, but even with that, Kinect sets a new bar for cost-effective machine vision.

Kinect has kicked off a virtual revolution in hacking, which apparently has been warmly welcomed by Microsoft.  There are websites and forums dedicated to sharing information on using and adapting the Kinect hardware for a huge variety of applications.  With the broad-based adoption of Kinect, the door to machine vision has been blown open, and the next few years should see remarkable progress in adding a sense of sight to our intelligent devices.

Unfortunately, adding machine vision to your next embedded design isn’t as simple as dropping in WiFi or USB.  You can’t just add a camera and a piece of machine vision IP to your embedded device and end up with a functional machine vision interface.  Vision, as we mentioned, is an incredibly complex problem that has already seen decades of research, and the average (or even the far-above-average) electronic designer isn’t going to just pick it up with some spare weekend reading.  To get our intelligent devices to see and understand the world around them, we’re going to need some serious help.

Fortunately, a new group has been formed with the intent of doing just that.  The Embedded Vision Alliance was founded with the goal of “Inspiring and empowering engineers to design systems that see and understand.”  Jeff Bier, President of Berkeley Design Technology (BDTi) and founder of the Embedded Vision Alliance, sees huge market potential for embedded vision applications in the near future in areas like consumer electronics, automotive, gaming, retail, medical, industrial, defense, and many others.  Embedded vision systems will be doing things like gesture-based control of devices, active driver safety and situational awareness, active digital signage, and point-of-sale transaction assistance – just to name a few.  

“The engineer who wants to add vision to his or her embedded design will have both great news and bad news,” explains Bier.  “First, they will discover that there are hundreds of papers, books, and other resources with volumes of research on the topic.  Then, they will discover that the vast majority of that work is not particularly useful for real-world engineering applications.  Much of the material is heavily theoretical – books with 800 pages filled with multi-variable calculus – and very little of it is in a form that engineers could use, like block diagrams and code.”  One of the goals of the Embedded Vision Alliance is to sift through that mountain of information and extract that which will be practically useful for adding vision to embedded designs.

The Embedded Vision Alliance already has over a dozen companies participating – from semiconductor suppliers to distributors to software companies – all of whom see a big future for embedded vision and who have products or technology that they feel will play a significant role in deployment of that capability.  The Alliance is already building a website (www.embedded-vision.com) with resources and community to assist engineers in development of embedded vision capabilities.  With efforts like this, the path to embedded vision will be far less treacherous.

Embedded vision is one of the most significant and exciting engineering challenges to come along in decades, and it will happen.  There will be a time when interacting with a machine that can’t see you will seem as strange as trying to compute with a VT52 would today.  Once our intelligent devices gain a proper set of senses, a vast range of new applications and capabilities will emerge.  If we want to be part of that revolution, we’d better start catching up now.  If embedded vision were easy, everybody would already have it.
