
Signals and Swats

The Promise and Limitations of Gesture and Motion Technology

You can almost imagine an I Love Lucy caper. Lucy and Ricky are trying to catch someone in the act of something nefarious. They dress up in fake private-eye clothes, with a PI hat, turned-up collar (pre-bro), and a fake moustache for her. They’re on opposite sides of the room in stealth mode, with only hand gestures to communicate. They’ve worked out an intricate set of signals, including “right hand to the nose means we go in 3…2…1…” and “left hand to the nose means something’s not right; hold off.”

And as they stand there poised for action, a fly lands on Lucy’s nose. And she swats it with her right hand, and Ricky gets ready to launch. But then she swipes it with her left hand and he panics and backs up, unsure of what to do. She, of course, is completely oblivious to her mixed messages.

I have no idea how this scene ends, because it wasn’t a real episode, and everyone knows it’s easy to think of funny situations but hard to figure out how (or when) they end. So I’ll leave that for the pros.

But it introduces us to the tenuous world of gesture and motion, day two of the conference on touch, gesture, and motion put on by IMS Research. Unlike touch technology, which has so much to do with the technology needed to sense touches, gestures and motion aren’t that complicated from a sensing standpoint: you’ve got either inertial measurement units (IMUs) that sense motion or cameras that see what’s happening. You might have 2D or 3D vision (made possible by stereoscopic vision or some other kind of depth sensor).

But most of this is not about sensing; it’s about software. It takes a lot of processing to take a visual scene and overlay meaning on top of it. But the level of meaning depends strongly on the goal. Which brings us to the central question: what’s the difference between gesture and motion? After all, gestures are motion.

My early thinking – which was supported (selectively?) by various things I’d seen and read – was that motion had to do with things that used IMUs and that gesture had to do with things that used vision. In other words, Wii was motion and Kinect was gesture.

But the further we got into the presentations, the clearer it became – eventually stated explicitly – that this is completely wrong. Gestures are a limited, pre-defined set of motions, each acting as a single token of information. They’re discrete and limited in number, and they have specific meaning. They’re oriented towards command and control, and they’re event-oriented, with a specific machine response expected after a gesture.

Motion, on the other hand, is anything that moves. It may or may not have meaning, but it’s definitely not discrete – it’s continuous. Obviously motion has to be detected in order to identify a gesture, so gesture recognition lies over the top of motion, but from an application standpoint, they’re considered separate. It’s like sound and speech: there’s an infinite range of sounds, and a microphone, amp, and speakers can faithfully render them. Identifying and interpreting those sounds that are speech, however, is much different – and harder.
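To make that distinction concrete, here’s a minimal sketch in Python – not from any of the presentations, and with made-up thresholds and token names like “swipe_right” – of a recognizer that consumes a continuous stream of motion samples and emits a discrete gesture token only when a pre-defined pattern appears:

    from collections import deque

    # Hypothetical illustration of the gesture-vs-motion distinction: motion is
    # the continuous stream of samples; a gesture is a discrete token emitted
    # only when a pre-defined pattern shows up in that stream.

    class SwipeRecognizer:
        """Turns continuous x-position samples (e.g., a hand tracked by a camera
        or derived from IMU data) into occasional swipe tokens."""

        def __init__(self, window=10, threshold=0.5):
            self.samples = deque(maxlen=window)  # rolling window of raw motion
            self.threshold = threshold           # net displacement that counts as a swipe

        def feed(self, x):
            """Consume one motion sample; return a gesture token or None."""
            self.samples.append(x)
            if len(self.samples) < self.samples.maxlen:
                return None
            net = self.samples[-1] - self.samples[0]
            if net > self.threshold:
                self.samples.clear()      # a gesture is an event: fire once, then reset
                return "swipe_right"
            if net < -self.threshold:
                self.samples.clear()
                return "swipe_left"
            return None                   # still just motion; no meaning assigned

    # The motion stream flows in continuously; gesture tokens come out rarely.
    recognizer = SwipeRecognizer()
    for x in [0.0, 0.05, 0.1, 0.2, 0.35, 0.5, 0.6, 0.65, 0.7, 0.72, 0.75]:
        token = recognizer.feed(x)
        if token:
            print("gesture:", token)      # -> gesture: swipe_right

The stream keeps arriving whether or not anyone is gesturing; the tokens appear only occasionally – which is exactly the gesture-over-motion layering described above.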

Gestures are an important part of new approaches to human/machine interfaces. One company spoke to the “primacy” of touchless interfaces, which struck me as one of those over-the-top “if gestures are good for some things they must be the best for everything!” comments – where the technology drives the solution.

A different presentation noted that voice control hasn’t really taken over because people aren’t comfortable with it, especially in public. Rather than accepting this as reality, the take-away was that “social re-engineering” was needed to get people comfortable with it – again, technology forcing a solution.

So, as with touch, we have some work to do to make sure that we maintain the ability to select the right tool for the right job rather than applying one tool to everything.

Philips noted some other challenges for gestures, not the least of which is the fact that gestures are cultural – they’re not universally intuitive. In addition, if you want to control something complex entirely using gestures, then you’ll likely have a very large gesture vocabulary to memorize – which is not likely to appeal to the masses. There are also issues with ambiguity: when you gesture “turn on,” does that mean the light or the TV?
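To illustrate the vocabulary and ambiguity problems, here’s a hypothetical sketch – the gesture names, commands, and devices are invented for the example – showing how a “turn on” gesture can’t become a command until the system also knows which device it applies to:

    # Every new function grows the vocabulary the user has to memorize, and the
    # same gesture still has to be resolved against context (here, whatever
    # device the user last pointed at) before it means anything.

    GESTURE_COMMANDS = {
        "turn_on":  "power_on",
        "turn_off": "power_off",
        "swipe_up": "volume_up",
    }

    def resolve(gesture, pointed_at=None):
        command = GESTURE_COMMANDS.get(gesture)
        if command is None:
            return None                   # unrecognized motion: ignore it
        if pointed_at is None:
            return None                   # "turn on"... the light, or the TV?
        return (pointed_at, command)

    print(resolve("turn_on"))                       # None -- ambiguous without a target
    print(resolve("turn_on", "living_room_tv"))     # ('living_room_tv', 'power_on')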

It actually occurs to me as I write this that many gestural issues could be solved if systems could recognize – and if everyone learned – sign language. Of course… there are different sign languages… lots of them… but still…

Motion raises a separate set of questions, especially when it comes to realistic applications. The most obvious motion applications we have now are activity-related. Simulated golf or football or whatever – great fun for the family in the comfort of your own living room – and up off the couch.

But videos glorifying the future of motion also show some alluring – and entirely improbable (imho) – scenes. For example, using motion to control, say, your phone. This isn’t a gesture app; it’s a camera watching your fingers dance through the air as if you’re dialing on a macro-phone. Such scenes typically depict a standard phone interface blithely following all the hand motions in mid-air. Really?

It’s hard enough to get a touch-screen to interpret the right location of my fat finger; in the air, if I’m selecting an app from a 4×8 matrix of little icons, I’m simply going to point in the air and hit the right one? I don’t think so. If the app provides visual feedback by, say, tracking where your finger is on the screen, then maybe (those details are never part of such videos… perhaps I need to relax and let go and assume they’ll figure that out with the first prototypes).
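For what it’s worth, the kind of feedback I’m imagining might look something like this rough sketch – the tracker, screen resolution, and smoothing factor are all assumptions – where a normalized fingertip position from some camera or depth sensor is mapped to an on-screen cursor, with smoothing so a jittery finger doesn’t skate across that 4×8 grid of little icons:

    # Hypothetical mid-air cursor: map a tracked fingertip (normalized 0..1 from
    # some camera/depth sensor) onto screen coordinates, with simple exponential
    # smoothing so the user can see and correct where they're about to "tap".

    class AirCursor:
        def __init__(self, width=1920, height=1080, smoothing=0.8):
            self.x, self.y = width / 2, height / 2
            self.w, self.h = width, height
            self.alpha = smoothing            # higher = steadier, but laggier

        def update(self, finger_x, finger_y):
            """finger_x, finger_y are normalized [0, 1] tracker outputs."""
            target_x, target_y = finger_x * self.w, finger_y * self.h
            self.x = self.alpha * self.x + (1 - self.alpha) * target_x
            self.y = self.alpha * self.y + (1 - self.alpha) * target_y
            return int(self.x), int(self.y)   # draw this so the user can correct

    cursor = AirCursor()
    for fx, fy in [(0.52, 0.48), (0.55, 0.47), (0.61, 0.50)]:   # noisy finger samples
        print(cursor.update(fx, fy))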

Even more ludicrous, if I may wax Ludditic, are scenes of people playing air-violin or air piano. Must have been done by people who think that being a master at air guitar makes you able to really and truly play guitar. Or people who think that Autotune turns them into great singers.

Ask any musician whether they get their sounds simply by putting their finger or hand at the right place at the right time, and they’ll tell you that that’s only the start. Pressure matters. Bending notes matters. Attack and decay matter. And those are incredibly subtle – not easily discernible macro motions – and, on a wind instrument, not even visible, since they happen inside the mouth.

Anyway… before this turns into a full-blown rant over something that, at its core, is a promotional video rather than actual technology… (Perhaps someone will take this as a challenge to create a real air-motion-only musical instrument with all the subtlety and nuance of a real instrument…) Moving along…

An organization has been formed to assemble vision technology information in one place: the Embedded Vision Alliance. Started and run by the folks at BDTi, its website seems to have quite a bit of information on the industry, applications, and technology. This includes, of course, both gesture and motion.

In general, developments in gesture and motion are proceeding briskly, and one of the main challenges will be figuring out where they work best and where other modalities work better. It’s also a subtle world, and discriminating subtlety from noise will be a challenge. Frankly, combining some of this with what seem to be some dramatic advances in reading brains might help to establish intent and thereby filter noise. (On the other hand, if brain reading gets that good, we won’t need to gesture at all.)

Whatever way we solve it, Lucy and Ricky would definitely benefit from a technology that helps them to decide whether or not a particular gesture really means, “Let’s roll.”

 

More info:

The Embedded Vision Alliance

