
Embedding Vision

Creating Devices That See

Remember sitting down at a DEC VT52 terminal?  The screen held 24 lines of text, at 80 characters each.  The font was built in.  The VT52 proudly boasted support for all 95 ASCII characters including the desirable but somewhat superfluous lower case letters.  Some special graphics characters were available as well, but the terminal did not support graphics per se.  There was no mouse, no windows, and text editing was only marginally WYSIWYG – mostly using the “vi” text editor.

Today, it’s hard to think of interacting with a computer, or even a smartphone, without a GUI and some sort of pointing device.  Even for those of us old enough to remember working on the VT52, trying to communicate with a machine exclusively via a keyboard would seem awkward and arcane at best.

Since most of our mobile and tablet devices have used touchscreen interfaces for a while, it is interesting to watch a young person poke at the screen of a desktop or laptop computer, expecting a touch response, and then become confused when the machine ignores the input.  Once you’ve grown accustomed to a level of sophistication in human-machine interface, it’s hard to go back.

As engineers, most of us know what the next steps are, and we know they’re really difficult problems.  Our machines need to be able to see and hear us and to understand what they’re seeing and hearing.  Voice interaction has been with us for a while now, but it really hasn’t caught on in the mainstream.  The accuracy of voice recognition/understanding is still low, and the idea of a work or office environment with a sea of cubes – where everyone is talking aloud to their computers simultaneously – sounds chaotic at best.  The primary problem here, of course, is that deriving meaning from spoken language is much harder than simply recognizing words and phrases in an audio stream.  The secondary problem is that people seem to want private interactions with their devices – even in public places.  Spoken communication doesn’t facilitate that very well.

Likewise, there has been a gigantic amount of research into machine vision.  Video has been commoditized to the point that only relatively inexpensive hardware is required to add video cameras, storage, and playback capability to an embedded device.  However, making a machine understand what is going on in that video stream is a significant challenge – one that researchers have been grappling with for decades.  The limiting factors for machine vision have always been computing power (running most machine vision algorithms in real time on a video stream requires massive amounts of it) and the algorithms themselves.  While it’s true that there is a vast repository of research available on machine vision algorithms, those algorithms tend to be tailored for very specific problems.  A number of sophisticated algorithms exist for facial recognition, for example, but those algorithms are different from those required for locating people in a scene, those required for understanding human gestures and movement, and so forth.  Since the algorithms are so specific to the type of information being extracted from the scene, creating fixed-hardware accelerators to solve the computing problem becomes impractical.  Instead, programmable hardware like FPGAs and/or huge amounts of parallelism in conventional processors (such as graphics processors) are required.

This year, however, machine vision went mass market with the introduction of the Kinect interface for Microsoft’s Xbox 360.  In case you’ve been under a rock for the past year (which many of us working on complex engineering problems tend to be from time to time), Microsoft built a low-cost device that enables a video game console to get its input from watching the players move, rather than from a dedicated controller.  The system can locate people in the scene, interpret their gestures, and even recognize which individuals it is “seeing.”  With the retail cost of the system being less than $150 USD, one can imagine that the bill of materials cost must be very low.  Granted, Kinect cheats a bit by borrowing some of the Xbox 360’s massively parallel processing power to accomplish its magic, but even with that, Kinect sets a new bar for cost-effective machine vision.

Kinect has kicked off a virtual revolution in hacking, which apparently has been warmly welcomed by Microsoft.  There are websites and forums dedicated to sharing information on using and adapting the Kinect hardware for a huge variety of applications.  With the broad-based adoption of Kinect, the door to machine vision has been blown open, and the next few years should see remarkable progress in adding a sense of sight to our intelligent devices.

Unfortunately, adding machine vision to your next embedded design isn’t as simple as dropping in WiFi or USB.  You can’t just add a camera and a piece of machine vision IP to your embedded device and end up with a functional machine vision interface.  Vision, as we mentioned, is an incredibly complex problem that has already experienced decades of research, and the average – or even the far-above-average – electronic designer isn’t going to just pick it up with some spare weekend reading.  To get our intelligent devices to see and understand the world around them, we’re going to need some serious help.

Fortunately, a new group has been formed with the intent of providing just that.  The Embedded Vision Alliance was founded with the goal of “Inspiring and empowering engineers to design systems that see and understand.”  Jeff Bier, President of Berkeley Design Technology, Inc. (BDTI) and founder of the Embedded Vision Alliance, sees huge market potential for embedded vision applications in the near future in areas like consumer electronics, automotive, gaming, retail, medical, industrial, defense, and many others.  Embedded vision systems will be doing things like gesture-based control of devices, active driver safety and situational awareness, active digital signage, and point-of-sale transaction assistance – just to name a few.

“The engineer who wants to add vision to his or her embedded design will have both great news and bad news,” explains Bier.  “First, they will discover that there are hundreds of papers, books, and other resources with volumes of research on the topic.  Then, they will discover that the vast majority of that work is not particularly useful for real-world engineering applications.  Much of the material is heavily theoretical – books with 800 pages filled with multi-variable calculus – and very little of it is in a form that engineers could use, like block diagrams and code.”  One of the goals of the Embedded Vision Alliance is to sift through that mountain of information and extract that which will be practically useful for adding vision to embedded designs.

The Embedded Vision Alliance already has over a dozen companies participating – from semiconductor suppliers to distributors to software companies – all of whom see a big future for embedded vision and who have products or technology that they feel will play a significant role in deployment of that capability.  The Alliance is already building a website (www.embedded-vision.com) with resources and community to assist engineers in development of embedded vision capabilities.  With efforts like this, the path to embedded vision will be far less treacherous.

Embedded vision is one of the most significant and exciting engineering challenges to come along in decades, and it will happen.  There will be a time when interacting with a machine that can’t see you will seem as strange as trying to compute with a VT52 would today.  Once our intelligent devices gain a proper set of senses, a vast range of new applications and capabilities will emerge.  If we want to be part of that revolution, we’d better start catching up now.  If embedded vision were easy, everybody would already have it.
