
Embedding Vision

Creating Devices That See

Remember sitting down at a DEC VT52 terminal? The screen held 24 lines of text, at 80 characters each. The font was built in. The VT52 proudly boasted support for all 95 printable ASCII characters, including the desirable but somewhat superfluous lowercase letters. Some special graphics characters were available as well, but the terminal did not support graphics per se. There was no mouse, no windows, and text editing was only marginally WYSIWYG, accomplished mostly with the “vi” text editor.

Today, it’s hard to think of interacting with a computer, or even a smartphone, without a GUI and some sort of pointing device.  Even for those of us old enough to remember working on the VT52, trying to communicate with a machine exclusively via a keyboard would seem awkward and arcane at best.

Since most of our mobile and tablet devices have used touchscreen interfaces for a while, it is interesting to watch a young person poke at the screen of a desktop or laptop computer, expecting a touch response, only to be confused when the machine ignores the input. Once you’ve grown accustomed to a level of sophistication in human-machine interfaces, it’s hard to go back.

As engineers, most of us know what the next steps are, and we know they’re really difficult problems. Our machines need to be able to see and hear us and to understand what they’re seeing and hearing. Voice interaction has been with us for a while now, but it really hasn’t caught on in the mainstream. The accuracy of voice recognition and understanding is still low, and the idea of an office environment with a sea of cubes, where everyone is talking aloud to their computers simultaneously, sounds a bit chaotic at best. The primary problem, of course, is that deriving meaning from spoken language is a much harder problem than simply recognizing words and phrases in an audio stream. The secondary problem is that people seem to want private interactions with their devices, even in public places, and spoken communication doesn’t facilitate that very well.

Likewise, there has been a gigantic amount of research into machine vision. Video has been commoditized to the point that adding video cameras, storage, and playback capability to an embedded device requires only relatively inexpensive hardware. However, making a machine understand what is going on in that video stream is a significant challenge, one that researchers have been grappling with for decades. The limiting factors for machine vision have always been computing power (most machine vision algorithms demand massive amounts of computation to run in real time on a video stream) and the algorithms themselves. While it’s true that there is a vast repository of research available on machine vision algorithms, those algorithms tend to be tailored for very specific problems. A number of sophisticated algorithms exist for facial recognition, for example, but they are different from the algorithms required for locating people in a scene, those required for understanding human gestures and movement, and so forth. Since the algorithms are so specific to the type of information being extracted from the scene, creating fixed-hardware accelerators to solve the computing problem becomes impractical. Programmable hardware like FPGAs, or huge amounts of parallelism in conventional processors (such as graphics processors), is required.
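To put the computing-power point in perspective, here is a quick back-of-the-envelope sketch. The VGA resolution, 30-frames-per-second rate, and 5x5 filter kernel below are illustrative assumptions rather than figures from any particular system, but they show how even one trivial filtering stage consumes hundreds of millions of multiply-accumulate operations per second, and a realistic vision pipeline chains many such stages.

    // Back-of-the-envelope estimate of the arithmetic needed to run a single
    // 5x5 convolution filter over a VGA video stream in real time.
    // The resolution, frame rate, and kernel size are assumptions chosen
    // purely for illustration.
    #include <cstdio>

    int main() {
        const double width  = 640.0;      // assumed VGA width
        const double height = 480.0;      // assumed VGA height
        const double fps    = 30.0;       // assumed frame rate
        const double kernel = 5.0 * 5.0;  // multiply-accumulates per pixel for a 5x5 filter

        const double pixels_per_second = width * height * fps;       // ~9.2 Mpixel/s
        const double macs_per_second   = pixels_per_second * kernel; // ~230 million MAC/s

        std::printf("Pixel throughput: %.1f Mpixel/s\n", pixels_per_second / 1e6);
        std::printf("One 5x5 filter:   %.0f million MAC/s\n", macs_per_second / 1e6);
        // A full vision pipeline (image pyramids, feature extraction, matching,
        // classification) stacks many stages like this one, which is why FPGAs
        // or GPU-class parallelism are usually needed for real-time operation.
        return 0;
    }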

This year, however, machine vision went mass market with the introduction of the Kinect interface for Microsoft’s Xbox 360. In case you’ve been under a rock for the past year (which many of us working on complex engineering problems tend to be from time to time), Microsoft built a low-cost device that enables a video game console to get its input from watching the players move, rather than from a dedicated controller. The system can locate people in the scene, interpret their gestures, and even recognize which individuals it is “seeing.” With the retail cost of the system at less than $150 USD, one can imagine that the bill-of-materials cost must be very low. Granted, Kinect cheats a bit by borrowing some of the Xbox 360’s massively parallel processing power to accomplish its magic, but even so, Kinect sets a new bar for cost-effective machine vision.

Kinect has kicked off a virtual revolution in hacking, which apparently has been warmly welcomed by Microsoft.  There are websites and forums dedicated to sharing information on using and adapting the Kinect hardware for a huge variety of applications.  With the broad-based adoption of Kinect, the door to machine vision has been blown open, and the next few years should see remarkable progress in adding a sense of sight to our intelligent devices.

Unfortunately, adding machine vision to your next embedded design isn’t as simple as dropping in WiFi or USB. You can’t just add a camera and a piece of machine vision IP to your embedded device and end up with a functional machine vision interface. Vision, as we mentioned, is an incredibly complex problem that has already seen decades of research, and the average, or even far-above-average, electronics designer isn’t going to just pick it up with some spare weekend reading. To get our intelligent devices to see and understand the world around them, we’re going to need some serious help.

Fortunately, a new group has been formed with the intent of providing just that. The Embedded Vision Alliance was founded with the goal of “Inspiring and empowering engineers to design systems that see and understand.” Jeff Bier, president of Berkeley Design Technology, Inc. (BDTI) and founder of the Embedded Vision Alliance, sees huge market potential for embedded vision applications in the near future in areas like consumer electronics, automotive, gaming, retail, medical, industrial, defense, and many others. Embedded vision systems will be doing things like gesture-based control of devices, active driver safety and situational awareness, active digital signage, and point-of-sale transaction assistance, just to name a few.

“The engineer who wants to add vision to his or her embedded design will have both great news and bad news,” explains Bier.  “First, they will discover that there are hundreds of papers, books, and other resources with volumes of research on the topic.  Then, they will discover that the vast majority of that work is not particularly useful for real-world engineering applications.  Much of the material is heavily theoretical – books with 800 pages filled with multi-variable calculus – and very little of it is in a form that engineers could use, like block diagrams and code.”  One of the goals of the Embedded Vision Alliance is to sift through that mountain of information and extract that which will be practically useful for adding vision to embedded designs.
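The kind of practical, code-level resource Bier describes might look something like the sketch below: detecting faces in frames from a webcam. The example uses the open-source OpenCV library, which is an assumption on our part rather than something named in the article or by the Alliance, and the cascade-model file is simply the stock classifier that ships with OpenCV.

    // A minimal, illustrative sketch of face detection on a live video stream
    // using the open-source OpenCV library (OpenCV is an assumed choice here,
    // not one named by the Embedded Vision Alliance).
    #include <opencv2/opencv.hpp>
    #include <vector>

    int main() {
        // Stock Haar-cascade face model that ships with OpenCV (assumed path).
        cv::CascadeClassifier face_cascade;
        if (!face_cascade.load("haarcascade_frontalface_default.xml"))
            return 1;

        cv::VideoCapture camera(0);            // default camera
        if (!camera.isOpened())
            return 1;

        cv::Mat frame, gray;
        while (camera.read(frame)) {
            cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
            cv::equalizeHist(gray, gray);      // reduce sensitivity to lighting

            std::vector<cv::Rect> faces;
            face_cascade.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(30, 30));

            for (const cv::Rect& face : faces)
                cv::rectangle(frame, face, cv::Scalar(0, 255, 0), 2); // outline each detection

            cv::imshow("faces", frame);
            if (cv::waitKey(1) == 27)          // press Esc to quit
                break;
        }
        return 0;
    }

Even a small example like this hints at the gap Bier points to: the detector itself is a black box built on years of research, and tuning it into a reliable product feature is where the hard engineering begins.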

The Embedded Vision Alliance already has over a dozen participating companies, from semiconductor suppliers to distributors to software companies, all of whom see a big future for embedded vision and have products or technology that they feel will play a significant role in the deployment of that capability. The Alliance is building a website (www.embedded-vision.com) with resources and a community to assist engineers in the development of embedded vision capabilities. With efforts like this, the path to embedded vision will be far less treacherous.

Embedded vision is one of the most significant and exciting engineering challenges to come along in decades, and it will happen.  There will be a time when interacting with a machine that can’t see you will seem as strange as trying to compute with a VT52 would today.  Once our intelligent devices gain a proper set of senses, a vast range of new applications and capabilities will emerge.  If we want to be part of that revolution, we’d better start catching up now.  If embedded vision were easy, everybody would already have it.
