Anyone who thinks humans primarily communicate verbally hasn’t spent much time misinterpreting emails or text messages. Much of our communication load may be carried over the audio channel, but the accompanying video carries more than a few hints as to how to react to what’s been said.
Part of that is simply stance and expression. But, depending on culture, more or less of what is being said may be expressed by gesture. For highly expressive people, you can see them at a distance and know, if not the detail of what’s being said, at least whether it’s time to prepare for fight or flight.
But there are actually two parts to a gesture. There’s the motion of a body part – most typically involving some or many parts of the upper limbs; call that the syntax. Then there’s the meaning associated with that movement – call it the semantics. That may sound like parsing things a bit finely, but, in fact, the same gesture may mean different things in different places.
An American in Mexico may experience a situation where, upon approaching a local, the local may appear to be waving him or her off, as if there were some danger. Turns out that that “hand-in-the-air, push it away from you, palm out and fingers curling” gesture means “come hither” in Mexico. Americans use the reverse gesture to invite someone to approach, bringing the hand towards them, palm inwards. Very confusing.
This partition between gesture syntax and semantics is at the heart of technology developed by a company called Movea (their Tim Kelliher, with whom I spoke, pronounces it “movie-uh”). Movea got their start in body-area networks, and they address a number of motion application areas, but they have specifically launched MoveTV to address gestures aimed at your TV. And no, that doesn’t mean what you probably hope it means.
Their focus is not so much on, shall we say, speech recognition: they see an opportunity in gaming and TV navigation. What enables this is that motion-sensing remote controls have come down to a price range that makes them acceptable for consumer applications. So this is about gestures that you express via a remote.
One of the big markets they see is in hotel in-room entertainment – sort of like Wii-by-the-hour. Or pay-per-Wii, or… ok I should just move along. This ties to the set-top box, which comes with a remote. The remote must, of course, be equipped with inertial measurement units (IMUs) so that it can tell how it’s being moved. It also needs a radio to communicate the movements.
With those in place, you can do more than just play games: you can use the remote to help navigate through the set-top box screens, scrolling through programming and making selections more easily than by chunking away on remote buttons.
The low-level motion information from the remote is fed to the set-top box (or any other suitable computing platform), where decisions are made as to what the user is doing. Everything up to that decision is a matter of gesture syntax: what was the movement?
And that can be a tricky thing. So much so that this is where Movea has focused its value. They license what they call their SmartMotion Server, which is a gesture recognition engine. At a fundamental level, it contains a library of nine basic out-of-the-box gestures:
- Tick or check, according to your positioning vis-à-vis the Atlantic (√)
- Close (×)
They can also create custom gestures, and their GestureBuilder tool lets developers put together their own libraries of gestures.
While, to some extent, it might not seem like rocket science to figure out a gesture, getting enough of the nuance right for a game is tougher – they have to be able to discriminate gestures and yet accommodate a wide range of skill levels. So it’s not just motion tracking – they have to abstract a bit above that. It’s as if each gesture must represent a broad distribution around a “mean,” while still maintaining a clean separation from other gestures.
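To make that "broad distribution around a mean" idea a bit more concrete, here is a minimal sketch in Python. It is not Movea's algorithm, which isn't described here; the two-number feature vectors, the `radius` tolerance, and the `margin` parameter are all invented for illustration. Each gesture gets a mean and a tolerance, and a motion is accepted only if it lands near one mean and clearly away from the others.

```python
import math

# Hypothetical gesture models: a "mean" feature vector plus a tolerance radius.
# The numbers are made up for illustration; a real engine would use richer features.
GESTURE_MODELS = {
    "tick": {"mean": (1.0, 0.5), "radius": 0.6},
    "close": {"mean": (-1.0, -0.5), "radius": 0.6},
}


def classify(features, models=GESTURE_MODELS, margin=0.3):
    """Return the matching gesture name, or None if the motion is too far
    from every mean or too close to a second gesture."""
    scored = sorted(
        (math.dist(features, model["mean"]), name)
        for name, model in models.items()
    )
    best_dist, best_name = scored[0]
    if best_dist > models[best_name]["radius"]:
        return None  # outside the tolerance: probably not a deliberate gesture
    if len(scored) > 1 and scored[1][0] - best_dist < margin:
        return None  # ambiguous: not cleanly separated from the runner-up
    return best_name


if __name__ == "__main__":
    print(classify((0.9, 0.6)))   # a slightly sloppy tick
    print(classify((0.0, 0.0)))   # halfway between the two gestures
```

A wider radius forgives less skilled players; a larger margin keeps gestures from bleeding into each other. The tension between those two knobs is the tricky part.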
The remote itself can be a determinant of what’s possible. At the simplest level, and for most of the gestures, pretty much any remote can be used: the remote will look like a mouse. The engine gets dx and dy information and works with that. Some of the gestures, however, require raw data from the remote. That may or may not be available, tends to be proprietary, and varies by remote.
Partnering with remote control companies has therefore been important. With those partnerships in place, Movea no longer has to characterize individual remotes and sensors in order to calibrate for each one.
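As a rough illustration of what "works with dx and dy" might involve, the Python sketch below turns a stream of mouse-style deltas into a path and scales it into a unit box so that a big flourish and a small one look alike. The function names and the normalization step are assumptions for the sake of the example, not a description of the SmartMotion Server.

```python
def deltas_to_path(deltas):
    """Integrate (dx, dy) reports into absolute points, starting at the origin."""
    x = y = 0.0
    path = [(x, y)]
    for dx, dy in deltas:
        x += dx
        y += dy
        path.append((x, y))
    return path


def normalize(path):
    """Shift and scale a path into a unit box so gesture size does not matter."""
    xs = [point[0] for point in path]
    ys = [point[1] for point in path]
    min_x, min_y = min(xs), min(ys)
    # Use the larger extent so the aspect ratio is preserved; avoid dividing by zero.
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [((px - min_x) / span, (py - min_y) / span) for px, py in path]


if __name__ == "__main__":
    # Two steps to the right, then one long step along the y axis.
    stroke = deltas_to_path([(1, 0), (1, 0), (0, 2)])
    print(normalize(stroke))
```

A normalized path like this is the sort of thing a recognizer could compare against its gesture library; the raw-data gestures would need more than this.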
In fact, they’ve gotten to the point where, for the most part, doing new deals is a business arrangement, and less technical work is required. It’s all about the licensing, and most deals seem to be custom. Typically, a system builder will decide what remote and set-top box to use and will then ask Movea to work with them.
On standard platforms like Android, they can trigger standard events. But they also do a lot with proprietary systems. As an example, a Comcast setup uses a Motorola set-top box with a Broadcom processor; the remote comes from Universal Electronics. In this case, Motorola puts together the whole package, including licensing the gesture engine from Movea, and then resells it to Comcast.
Of course, all we’ve dealt with here is the gesture syntax. What about the semantics? What do the gestures mean? And that, of course, depends on the application and is the further domain of the system developer. Movea delivers up the physical gesture; someone else interprets that in the context of whatever’s happening at the moment.
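The split can be sketched in a few lines of Python. Assume the engine hands over nothing but a gesture name; the contexts, action names, and lookup table below are hypothetical, standing in for whatever the system developer actually builds.

```python
# Hypothetical semantics table: the same physical gesture (the syntax)
# maps to a different action depending on what is on screen.
SEMANTICS = {
    ("program_guide", "tick"): "select_program",
    ("program_guide", "close"): "exit_guide",
    ("game", "tick"): "confirm_move",
    ("game", "close"): "pause_game",
}


def interpret(context, gesture):
    """Look up what a recognized gesture means in the current context.
    Unknown combinations are ignored rather than guessed at."""
    return SEMANTICS.get((context, gesture), "ignore")


if __name__ == "__main__":
    print(interpret("program_guide", "tick"))
    print(interpret("game", "tick"))
```

Same tick, two meanings: much like that wave in Mexico.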
This may be a long way from interpreting communication gestures, but think about it… a thin, unnoticeable glove that you put on as you start to type your emails. And that glove has sensors that are used to interpret your gestures as you’re typing away, and it embeds gesture information into the email itself so the reader can know just a bit more about how to interpret your email.
I know, it’s hard to gesture and type at the same time, but with a little practice… ok… maybe not…
3 thoughts on “Recognizable Gestures”
Are you using multiple motion sensors together to detect more sophisticated motion than simply the six cardinal directions? Is it as hard as it sounds?
How do we avoid the “auction house effect” of random gestures being misinterpreted as commands? I’d hate to think that scratching my nose would change the channel in the middle of Wheel of Fortune.
This involves a remote with sensors; it’s not like Kinect, where it’s just watching your limbs. So… don’t scratch your nose with the remote.