Phone/Body Decoupling

The Holidays can be a challenging time in the US. Particularly for people who don’t like shopping, this is not a favorite time of year. Count a big portion of the male population in that group of hapless souls who gird up their courage and wade into the miasma that is the local mall.

So here you sit, in your car in the mall parking lot, watching the steady stream of people going in empty-handed, coming out loaded with booty. That’s your competition. If they buy more than you do, you lose. And if you don’t hurry, they’re going to buy the cool stuff – and you lose some more.

Your thermos – dutifully left unopened until you reached the parking lot – contains more than a soupçon of something you hoped would straighten your spine and help to launch you out of your car, but it didn’t really have that effect. Even though you kept trying. At some point, you groggily realize you need to man up and get in there and do your duty. You grope around the side of the car, finally locating the handle that will open the door and release you to the predations of commerce.

Convincing yourself three times that, yes, you locked the car, you plod uneasily towards an entrance. Like most guys in this situation, you’ve come prepared. No “shopping as entertainment” for you: you know exactly what you’re going to get and which stores will serve up those goods. You’re just not sure where those stores are in the mall.

But you know that your phone can track this stuff, and you’ve got an app that can help to lead you to your destination. You go through the door and triumphantly bring out your phone, engage the app, and start walking forward under its confident direction.

About halfway down the corridor, you summon the presence of mind to notice that you look like a dork walking around with your phone in your outstretched hand. Your next turn is still a ways off, and maybe, probably, you can remember it, so you put your phone in your back pocket for future reference.

When you get to where you think you should turn, you decide to pull your phone back out and double-check. After all, the motion sensors in your phone have been tracking your motion, so it should show you whether you’re in the right spot.

Except… it doesn’t have you anywhere near the right spot. It has you back by the mall entrance. That’s not what you needed to see in this frame of mind. You have willingly placed yourself in the belly of the beast that is the mall with nothing but liquid inspiration and a phone app to guide you. And the phone has clearly failed you. The liquid inspiration can’t be far behind… Completing purchases is no longer the goal: mere survival is. How are you going to get out of here now that your only frame of reference has abandoned you? Your vision starts to narrow and a white noise starts to close in and…

… you wake up in a cold sweat knowing that you’ve still got issues with your pedestrian navigation algorithms.

What went wrong in this most trivially simple of examples? I postulate here a scenario where we use the phone as a proxy for ourselves – where we go, our phones go, so we and they are one and the same.

Except we’re not. When our beleaguered shopper put the phone into his back pocket, he did so in a way that had the screen facing his… natural seating cushion. In other words, the phone was facing backwards. The guy had walked halfway down the hall with the phone in front, then put the phone in his pocket facing the other way – which the phone, sporting what was clearly a crude algorithm, took to mean, “Ah, I’ve turned around.” Continuing forward for the remaining half of that hallway would then put him right back near the entrance where he started – at least in the view of the phone, which thinks he’s retracing his steps after turning around.
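To make the failure concrete, here is a minimal sketch of the kind of crude dead reckoning the story imagines. It is purely illustrative: the function name, the one-unit step length, and the ten-steps-each-way corridor are all assumptions, not anything a real navigation product is known to do.

```python
import math

def dead_reckon(headings_deg, step_length=1.0):
    """Naive dead reckoning: each detected step moves the position estimate
    one step_length along whatever heading the phone reports (in degrees)."""
    x = y = 0.0
    for heading_deg in headings_deg:
        heading = math.radians(heading_deg)
        x += step_length * math.cos(heading)
        y += step_length * math.sin(heading)
    return x, y

# The shopper walks 20 steps straight down the corridor (heading 0).
true_headings = [0.0] * 20

# After 10 steps the phone goes into the back pocket facing backwards, so the
# phone's own heading reads 180 even though the shopper never turned.
phone_headings = [0.0] * 10 + [180.0] * 10

actual = dead_reckon(true_headings)      # 20 steps down the corridor
estimated = dead_reckon(phone_headings)  # back at the entrance
```

The phone's estimate lands back at the starting point while the shopper is actually twenty steps away, which is exactly the nightmare scenario.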

Which illustrates that we are not our phones. And yet we still want to use the phones as a proxy. The motion your phone experiences is a combination of (a) any motion that your body makes, carrying the phone with you, and (b) changes in phone position and orientation relative to your body. Heck, you can even have one cancel the other: stand with your phone in your outstretched arm and step forward one foot – while pulling your arm in by that same one foot. In theory, the phone hasn’t moved, and yet that masks a lot of information – the fact that your personal position changed and that the phone’s position relative to you has changed.
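The cancellation example can be written down in a few lines. This is only a sketch of the bookkeeping, with made-up names and 2D displacements measured in feet; it assumes the simplest possible model, where the phone's motion is just the sum of the two contributions.

```python
def phone_motion(body_motion, relative_motion):
    """What the phone experiences: body displacement plus the phone's
    displacement relative to the body (simple additive model, an assumption)."""
    return tuple(b + r for b, r in zip(body_motion, relative_motion))

# Step forward one foot while pulling the outstretched arm in by one foot.
step_forward = (1.0, 0.0)
arm_pulled_in = (-1.0, 0.0)
observed = phone_motion(step_forward, arm_pulled_in)

# Standing perfectly still with the arm unmoved looks identical to the phone.
standing_still = phone_motion((0.0, 0.0), (0.0, 0.0))
```

Two very different situations produce the same observed displacement, so the sum alone cannot be inverted without extra information.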

This is a problem I call “phone/body decoupling” – taking the overall motion of a smartphone and inferring from it what your personal motion is and what the independent phone motion is. It’s but a small portion of that overall hot topic “context awareness.” Unlike basic motion, which is “simply” a math problem, this, like other contextual problems, lends itself to many approaches.

Now, you might, as I did, casually start to muse on how you could approach some rules to help pry apart phone and body motion. There are some simple ideas – like, when the phone turns around, look at the center of rotation to decide if it’s the body or the phone that did the turning. But the problem is, people are messy, and they don’t like to walk around like robots (even if you’re ok with looking like a dork, your arm will get tired), and there are lots of simplifying assumptions that humans routinely refuse to comply with. Someone’s always got to be an outlier. And you can’t just write off the outliers.
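As an illustration of how such a center-of-rotation rule might look (and how fragile it is), here is a minimal sketch. It assumes steady circular motion, where the radius is the centripetal acceleration divided by the square of the angular rate, and it uses an invented 0.2 m threshold to separate "the body turned" from "the phone turned." Neither the function names nor the threshold come from any of the companies discussed here.

```python
def rotation_radius(centripetal_accel, angular_rate):
    """Radius of a steady circular motion: r = a_c / omega^2.
    centripetal_accel in m/s^2, angular_rate in rad/s."""
    return centripetal_accel / angular_rate ** 2

def who_turned(centripetal_accel, angular_rate, threshold_m=0.2, min_rate=0.1):
    """Guess the source of a turn from the radius of rotation.
    threshold_m and min_rate are hypothetical tuning values."""
    if abs(angular_rate) < min_rate:
        return "none"
    radius = rotation_radius(centripetal_accel, angular_rate)
    return "body" if radius >= threshold_m else "phone"

# Body turning at 1.5 rad/s with the phone held out 0.5 m from the spine.
body_turn = who_turned(centripetal_accel=1.125, angular_rate=1.5)

# Phone flipped in the hand at 6 rad/s, about 3 cm from its pivot.
phone_flip = who_turned(centripetal_accel=1.08, angular_rate=6.0)
```

A phone in a hip pocket during a body turn would sit well under that threshold, which is one small example of why simple rules like this fall apart once real people get involved.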

So I wanted to survey a few folks who have been thrashing the pedestrian navigation beast to see how they do it. Obviously, there’s only so much they’re going to say, given that this is a hotly competitive area, but it’s interesting to compare approaches.

There are really two questions here: how does your phone deal with convolved phone/body motion, and how are those algorithms derived?

Let’s start with the first one first. I talked with four companies, three of which are familiar here – Movea, Hillcrest Labs, and Sensor Platforms – and one that we haven’t covered as often: Trusted Positioning. And I heard differences and similarities from all of them.

Let’s start with one similarity: this is not trivial stuff. All of the companies are trying to gather all the clues and cues possible to discriminate different sources of motion. And no one is trying to rely solely on the phone. For instance, Trusted Positioning will be doing a presentation at CES on having a smart watch and phone communicate with each other to improve results. It gives you two points of view that not only help to ward off drift errors (Trusted Positioning’s Chris Goodall says that each added sensor improves drift performance by a factor of 1.5 to 1.8), but that also help to distinguish confusing context signals. These extra sensors help both in action in the field and, as we’ll see, when deriving the algorithms in the lab.

Collecting constraints

One word comes up over and over again in this discussion: constraints. The way you tease two sets of motion data (yours and your phone’s relative to you) from one measured set (raw phone) is to constrain the solution space.

One obvious such constraint can be summed up as “body mechanics.” There are only so many ways a human body can move; there are only so many ways a phone can be held. In fact, you might assume that a phone would be held only in someone’s hand – which would leave out those rare individuals who, lacking hands, have acquired dexterity with their feet. So body mechanics are important, but you have to be careful with over-simplistic assumptions.

Both Movea and Trusted Positioning turn the mechanics into a simple “map.” Trusted Positioning has three basic position classes: up at your ear (on a call), in your pocket, and one that Chris Goodall called “general.” Movea has four such buckets: ear, pocket, swinging in your hand, and held out in front of you. They also initialize with the assumption that the phone is starting in front.

Obviously, these are gross simplifications of much more complex motion, but for some problems, they can suffice. Once they’ve decided on one of these buckets, then they may be able to apply filters and algorithms that are specific to the chosen bucket and that might not have been useful were the phone position in a different bucket.

Which leads to another observation: in a sense, we’re looking at the Mother of All Decision Trees here. Actually, it’s really the Mother of All State Machines, since context in general implies state, but each transition from state to state can involve a massive set of decisions.

So the key to efficiency here is in pruning this state machine so that the problem is tractable in real time on small, power-stingy devices. Which is why having coarse buckets is useful: the decision trees in that “state” can now be pruned of anything not relevant to that state, simplifying things and possibly making room for further refinement. I’ll just make up an example here: once the phone has decided it’s in a pocket, it might be able to refine that decision to determine specifically which pocket it’s in.
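Here is a minimal sketch of what a pruned state machine over coarse buckets could look like. The four states are Movea's buckets as described above, starting in front; everything else (the feature names, the thresholds, the transition rules) is invented for illustration and is not anyone's actual algorithm.

```python
from enum import Enum

class Carry(Enum):
    FRONT = "held out in front"
    EAR = "at the ear"
    POCKET = "in a pocket"
    SWING = "swinging in the hand"

# Each state lists only the checks worth running from that state. Anything
# irrelevant to the current state is never evaluated: that is the pruning.
# All feature names and thresholds are hypothetical.
TRANSITIONS = {
    Carry.FRONT: [
        (lambda f: f["proximity_near"] and f["on_call"], Carry.EAR),
        (lambda f: f["proximity_near"] and f["light_lux"] < 5, Carry.POCKET),
        (lambda f: f["swing_amplitude"] > 0.5, Carry.SWING),
    ],
    Carry.EAR: [(lambda f: not f["proximity_near"], Carry.FRONT)],
    Carry.POCKET: [
        (lambda f: f["light_lux"] >= 5 and not f["proximity_near"], Carry.FRONT),
    ],
    Carry.SWING: [(lambda f: f["swing_amplitude"] <= 0.5, Carry.FRONT)],
}

def next_state(state, features):
    """Run only the current state's checks; stay put if none fires."""
    for check, target in TRANSITIONS[state]:
        if check(features):
            return target
    return state

state = Carry.FRONT  # initialize with the phone assumed to be in front
pocketed = {"proximity_near": True, "on_call": False,
            "light_lux": 1.0, "swing_amplitude": 0.0}
state = next_state(state, pocketed)  # moves to POCKET
```

Once the state is POCKET, the swing check simply isn't on the menu any more, which leaves room in the compute budget for finer questions about that one state.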

Hillcrest Labs, who also use body mechanics, mentioned another factor that they consider. It comes from their heritage in remote controls, where, early on, they realized they had to figure out how to reject hand tremor. And they found that “intentional” motion tends to be lower frequency than “unintentional” motion.

In our case, of course, we’re trying to discriminate between various different types of intentional motion, so it’s not as simple as all that. But Hillcrest Labs’ Chuck Gritton was willing to allow that frequency is an important consideration for them.
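To show what using frequency as a discriminator can mean in practice, here is a minimal sketch using a single-pole low-pass filter to split a signal into a slow part and a fast residual. The 2 Hz cutoff, the 100 Hz sample rate, and the synthetic sine-wave "motion" are all assumptions made up for this example; Hillcrest Labs did not describe their actual filtering.

```python
import math

def low_pass(samples, cutoff_hz, sample_rate_hz):
    """Single-pole (exponential smoothing) low-pass filter."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out = []
    y = samples[0]
    for x in samples:
        y += alpha * (x - y)  # move a fraction of the way toward the new sample
        out.append(y)
    return out

def sine(freq_hz, seconds, sample_rate_hz):
    n = int(seconds * sample_rate_hz)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n)]

RATE = 100                    # samples per second (assumed)
slow = sine(0.5, 4, RATE)     # stand-in for slow, deliberate motion
tremor = sine(10.0, 4, RATE)  # stand-in for fast, unintended jitter
mixed = [s + 0.3 * t for s, t in zip(slow, tremor)]

smooth = low_pass(mixed, cutoff_hz=2.0, sample_rate_hz=RATE)
residual = [m - s for m, s in zip(mixed, smooth)]  # mostly the fast part
```

The slow component passes through nearly intact while the 10 Hz component is cut to roughly a fifth of its amplitude. Telling one kind of intentional motion from another takes far more than one cutoff, but the same machinery applies.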

Phone features

So we have body mechanics and frequency as examples of constraints or parameters. Another word for them is “feature,” and, in fact, this is really where the gold is. My conversation with Sensor Platforms’ Kevin Shaw was more about how these features are identified than specifically what features they rely on. In fact, those features are the crown jewels, and anything useful that one company has that others haven’t thought of is valuable. They’re certainly not going to tell me about any features that aren’t obvious.

I can’t say specifically that the process that follows is one that Movea, Hillcrest Labs, and Trusted Positioning use, but, at least in concept, it’s hard to imagine that they’re not doing something similar to this.

The real question here (for this and for other similar context studies) is how to take sensor signals and identify syndromes and signatures that are specific enough to nail down, with high fidelity, a particular motion or position (or whatever), while being general and robust enough to work with a wide range of people of different sizes, ages, and cultures, plus whatever else might trip up an unsuspecting algorithm. No hard numbers work here – you can’t specify how long an arm is; you can only show statistically how long it’s likely to be and, given that length, how the joints typically work. “Typically” means there are no hard rules for that either.

So this becomes a massive data-gathering process. Let’s say you recruit a large cohort of data gatherers. Kevin used the number 10,000; I don’t know if that’s realistic or just a number, but the point is, you need lots of data. And, as with any experiment, you need to control the things that can be controlled and randomize the things that can’t be. Which means you need a broad sample to capture all those corner cases.

Kevin suggests there are three possible ways to approach this:

1. Have people go about their daily lives, doing what they do, streaming data to you for analysis. Obvious weakness: you can see all the signals, but you can’t correlate them to activities because you have no information on what people were doing when the data was taken.
2. Have those people come to you and do specified things like sit down, stand up, turn around, etc. Now you have specific activities and the data associated with them. The problem is that, apparently, when you do this, people don’t act natural. So they sit down for the experiment in a way that’s different from when they normally sit down.
3. Use an in-between approach, where you have people simply go about their business while using video or other means of observing what they’re doing so as not to make them self-conscious.

Obviously there’s still some trade secret involved in that third one in order to make it practical, but that’s the general approach.

What you get from this is a ton of data that streams to a server. And then those data are analyzed to find “clusters” – which sounds similar to what we called “factor analysis” back in my marketing days. With no human intervention, clustering algorithms pore over the data to find correlations.

The good thing about it being automated, other than its ability to handle lots of data, is that the algorithm doesn’t start out with any semantic biases. We might not think that air pressure has anything to do with which direction we’re facing, so we probably wouldn’t look for clues there. But the computer doesn’t care – it’s just another variable, and if it finds a signal, it will unabashedly let you know.
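For a feel of what automated cluster-finding involves, here is a bare-bones k-means sketch. K-means is just one common clustering technique, chosen here for brevity; nothing in my conversations indicated which algorithms these companies actually use. The two-feature data points are invented and deliberately unlabeled, since the algorithm neither knows nor cares what the columns mean.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, and repeat."""
    def nearest(p):
        return min(range(len(centroids)),
                   key=lambda i: math.dist(p, centroids[i]))

    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p)].append(p)
        # Keep the old centroid if a group ends up empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else old
            for g, old in zip(groups, centroids)
        ]
    labels = [nearest(p) for p in points]
    return centroids, labels

# Made-up feature vectors; the columns could be anything, air pressure included.
samples = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
           (2.0, 2.1), (2.1, 1.9), (1.9, 2.0)]

centroids, labels = kmeans(samples, centroids=[samples[0], samples[3]])
```

With data this tidy the two groups separate immediately. Real sensor data is nowhere near this cooperative.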

Of course, it’s not all data crunching. The output is an n-dimensional matrix, if you will, that’s mostly sparse, with clusters of data points. Ideally, each cluster would be relatively tight, with clear boundaries and ample separation from neighboring clusters so that there’s no confusion. But if it were that easy…

One risk you run into is that of confounded features: intersecting clusters that represent different features but that you can’t tell apart. In the old days, the answer to that was simple: add some more sensors. That’s not doable with a phone, however: you get what you get. With added wearables, like a smart watch, it becomes tenable.

Barring another sensor, this is where actual humans can enter the process: analysts who look at the data and try to sharpen the clusters. Sometimes that means re-analyzing existing data to extract new discriminating features. But this optimization must also take into account two important practical matters: computation and power. That necessitates frugality.

Each of these clusters represents some combination of factors and perhaps even state. But each cluster can probably be defined by far fewer than all of the possible features, and the more these features can be trimmed without smearing out the clusters, the simpler the computation. Reminds me of eliminating don’t-care variables in a Karnaugh map. The decisions also have to be achievable on battery power, potentially in an always-on mode.
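One simple way to picture that trimming is to score each feature by how well it separates two clusters and drop the ones that contribute nothing. The sketch below uses a crude gap-over-spread score with an arbitrary cutoff of 1.0; the scoring rule, the cutoff, and the sample data are all assumptions for illustration, not a description of what any of these companies do.

```python
from statistics import mean, pstdev

def separation_scores(cluster_a, cluster_b):
    """Per-feature score: distance between the two cluster means divided by
    the sum of their spreads. Higher means the feature separates better."""
    scores = []
    for a_vals, b_vals in zip(zip(*cluster_a), zip(*cluster_b)):
        gap = abs(mean(a_vals) - mean(b_vals))
        spread = pstdev(a_vals) + pstdev(b_vals)
        scores.append(gap / (spread + 1e-9))  # epsilon avoids division by zero
    return scores

def features_to_keep(cluster_a, cluster_b, min_score=1.0):
    """Indices of features worth computing; the rest are 'don't-cares'."""
    scores = separation_scores(cluster_a, cluster_b)
    return [i for i, s in enumerate(scores) if s >= min_score]

# Two made-up clusters with two features each. Feature 0 differs between
# the clusters; feature 1 looks the same in both.
cluster_a = [(1.0, 5.0), (1.2, 3.0), (0.8, 4.0)]
cluster_b = [(3.0, 4.0), (3.2, 5.0), (2.8, 3.0)]

kept = features_to_keep(cluster_a, cluster_b)
```

Here only the first feature survives, so the second never needs to be computed on the device, which is computation and battery saved.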

The interesting thing about this is that, especially thanks to this last human step, four different companies might do four similar experiments and get four different answers. Obviously there will have to be similarities, but in many cases, an algorithm won’t win based on the obvious stuff – it will be the corners that determine whose algorithm is best.

So it’s unlikely that any of these four companies would have done so poorly as to get your holiday mall excursion completely backwards. Nothing is that crude. (Is it?) And there’s more to pedestrian navigation than just motion – maps and beacons and other signals can help. But for the motion portion, the real trick is taking someone who’s not quite “normal” (I hate that word) and sending them in. Will he or she emerge truly triumphant, bearing gifts and escaping unscathed?