FPGA Fone Fusion

Here’s one possible way to not get a job. Let’s say the interviewer asks you about things inside a cell phone. And you, having an FPGA background, suggest that there’s an opportunity for some great FPGA usage inside said phone.

Yeah… you’re likely to get a polite, “That’s interesting…” and a note, “FPGA guy with an FPGA hammer… and every system is his nail, even if it has a Phillips head on it… Pass…”

No one puts baby in the corner, and no one puts FPGAs in phones or small mobile devices! Why? Well, duh… First, they’re ginormous. And second, they’d suck the battery dry by the time the thing booted up. I mean, the only device less likely to be considered would be a real-deal, old-school vacuum tube. Right? (Although audiophiles would probably totally dig that…)

Well, that’s the received wisdom, anyway. And yeah, you don’t see many FPGAs in there, and most FPGA makers don’t even ask to be included. There are lower-energy-expense meals to be hunted than that one.

But the interviewer’s judgment may be a bit rash: there is one company that has been sneaking FPGAs into mobile devices, Lattice, thanks to its SiliconBlue acquisition. SiliconBlue actively pursued mobile as a market, driving its device specs to make the chips viable (rather than trying to stuff existing devices into the socket). There’s also a more recent entrant, QuickLogic: its device isn’t marketed explicitly as an FPGA, but it’s powered internally by an FPGA structure.

Whether or not the FPGA innards are visible has a dramatic impact on how you might design with the device, but we’ll return to that topic in a moment. For now, let’s focus on the two huge barriers to putting most FPGAs in phones or other small battery-powered devices: power and size. And then there’s a third requirement, perhaps the most important of all: a motivating application.

Power dominates

Power is intimately connected to the application that Lattice is actively targeting: sensor fusion in the era of “always on.” We know that a good way to save battery is to let the phone sleep. But in a world where context is becoming important, well, the phone has to sleep with one eye open so that it can monitor for significant context changes. And if that eye is always open, then it can’t demand much energy.

How much is “much”? Well, QuickLogic says that the total budget for always-on support has to be under 1% of battery capacity, which puts it at around 5-10 mW. That caps the FPGA as well, bearing in mind that the FPGA may not be the only thing drawing on that budget; it can’t hog all of it.

Now… before we go forward, a word of caution: exactly how much power a given sensor fusion implementation might consume depends on a lot of things. Microcontrollers are commonly used, but their clock speed directly impacts power – and the amount of code to be run directly impacts clock speed. Again, a topic we’ll return to. My caution here is that simple numbers may hide real-world complexity, and we’re about to compare numbers.
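To make the clock-speed point concrete, here’s a minimal Python sketch under stated assumptions: the cycle count per fusion update, the update rate, and the µW-per-MHz figure are all hypothetical placeholders, and active power is modeled as simply proportional to clock frequency (dynamic CMOS power does scale with frequency, but this ignores leakage and voltage scaling). The helper names are illustrative, not from any vendor tool.

```python
"""Rough model of how code size drives microcontroller clock and power.

All numbers are hypothetical placeholders, not vendor data.
"""


def required_clock_mhz(cycles_per_update: int, update_rate_hz: float) -> float:
    """Minimum clock (MHz) that finishes each fusion update before the next one is due."""
    return cycles_per_update * update_rate_hz / 1e6


def active_power_mw(clock_mhz: float, uw_per_mhz: float) -> float:
    """Active power (mW), assuming consumption scales linearly with clock frequency."""
    return clock_mhz * uw_per_mhz / 1000.0


# Hypothetical: a fusion routine taking 50,000 cycles per update, run at 100 Hz,
# on a core that draws a hypothetical 150 uW per MHz.
clock = required_clock_mhz(50_000, 100)  # 5.0 MHz
power = active_power_mw(clock, 150)      # 0.75 mW
print(f"clock {clock:.1f} MHz, active power {power:.2f} mW")
```

Under this crude model, doubling the code run per update doubles both the required clock and the active power, which is the chain of dependence described above.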

Specifically, Lattice has announced a new iCE40 family member: the iCE40LM. (They also announced the iCE40LP for IR processing, which we’ve covered separately.) The iCE40LM is targeted as a sensor hub in phones. Sensor hubs are intended to offload low-level sensor management from the application processor (AP); that way the AP can sleep, to be woken only when the sensor hub sees something that it thinks the AP will want to know about.

Using the AP for all things sensor will consume enormous amounts of power because you can’t let the thing go to sleep. Lattice puts this number at around 100 mW – far beyond the realm of always-on acceptability. Microcontroller-based sensor hubs will operate around the 10-mW level, again using Lattice’s numbers. The iCE40LM, by contrast, is below 1 mW.

They also presented this data in terms of the cost in battery life of doing always-on context monitoring. Using the AP would cost 16.5 hours. In other words, your battery would go dead 16.5 hours earlier than it does now. For a battery that may not last 16.5 hours even without context awareness, this is clearly a non-starter. Literally. It’s dead before it starts.

They put the microcontroller cost at 2.4 hours. And the iCE40LM? 0.2 hours (around 12 minutes). Which seems a pretty modest price for what context awareness can buy you.
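The battery-life framing can be reproduced with a back-of-the-envelope model: runtime is stored energy divided by average load, so the cost of an always-on function is the difference between runtime at the baseline load and runtime with the extra load added. The sketch below assumes a hypothetical 7400 mWh battery (2000 mAh at 3.7 V) and a hypothetical 200 mW average baseline; Lattice didn’t state its assumptions, so these placeholders won’t reproduce its exact hours. Only the added loads come from the figures above, with 1 mW standing in as an upper bound for the iCE40LM.

```python
def battery_hours(energy_mwh: float, load_mw: float) -> float:
    """Runtime in hours for a constant average load."""
    return energy_mwh / load_mw


def always_on_cost_hours(energy_mwh: float, baseline_mw: float, added_mw: float) -> float:
    """Hours of battery life lost by adding an always-on load on top of the baseline."""
    return battery_hours(energy_mwh, baseline_mw) - battery_hours(energy_mwh, baseline_mw + added_mw)


# Hypothetical phone: 2000 mAh at 3.7 V is 7400 mWh, with a 200 mW average baseline.
ENERGY_MWH = 7400.0
BASELINE_MW = 200.0

# Added loads follow Lattice's rough figures: AP ~100 mW, MCU hub ~10 mW, iCE40LM <1 mW.
for name, added in [("AP", 100.0), ("MCU hub", 10.0), ("iCE40LM", 1.0)]:
    cost = always_on_cost_hours(ENERGY_MWH, BASELINE_MW, added)
    print(f"{name:8s} costs {cost:5.2f} h of battery life")
```

Because the cost scales roughly with the added load when that load is small relative to the baseline, shaving a hub from 10 mW to under 1 mW cuts its battery-life cost by roughly an order of magnitude.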

An additional benefit of this kind of hardware solution is reduced latency. Most sensor fusion is implemented as software, so it executes sequentially. (No, no one seems to have resorted to multicore for this application. Yet.) So, for example, when polling sensors, you get them one at a time, in round-robin fashion. Done sloppily, you might read the value of sensor 1 during time-slice a; if you wait too long to get to sensor 3, then it might be offering up data from time-slice b. (Another topic we’ll return to.)

An FPGA is all-hardware, and the great thing about hardware is that it can do lots of things in parallel. Like grabbing all sensor inputs at the same time (assuming they have independent inputs) or even grabbing them serially but processing them in parallel. Lattice claims near-zero latency for their sensor fusion.
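The skew problem is easy to see in a toy timing model. In this sketch, the 200 µs per-sensor read time and the sensor names are hypothetical; `round_robin` mimics software polling, where each read starts when the previous one finishes, and `simultaneous` mimics hardware latching every sensor on one strobe, which assumes each sensor has its own independent input.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    sensor: str
    t_us: float  # time at which this sensor's value was captured


def round_robin(sensors: list[str], start_us: float, read_us: float) -> list[Sample]:
    """Software-style polling: each sensor is read after the previous read completes."""
    return [Sample(name, start_us + i * read_us) for i, name in enumerate(sensors)]


def simultaneous(sensors: list[str], start_us: float) -> list[Sample]:
    """Hardware-style capture: every sensor with its own input is latched on one strobe."""
    return [Sample(name, start_us) for name in sensors]


def skew_us(samples: list[Sample]) -> float:
    """Spread between the earliest and latest capture times within one 'frame'."""
    times = [s.t_us for s in samples]
    return max(times) - min(times)


SENSORS = ["accel", "gyro", "mag"]
# Hypothetical 200 us per bus transaction.
print(skew_us(round_robin(SENSORS, 0.0, 200.0)))  # 400.0
print(skew_us(simultaneous(SENSORS, 0.0)))        # 0.0
```

The round-robin skew grows with both the number of sensors and the per-read time; once it approaches the sensor sample period, the last sensor read may be reporting a different time slice than the first.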

By the way, there’s not even universal agreement on whether trying to save power here matters. I talked with Hillcrest Labs, and they suggest that the power required by the gyroscope will swamp the power of a microcontroller. I noted that it might be possible to shut the gyro down unless it’s absolutely needed. They conceded the point, but said that there may be apps where the gyro can’t be shut down, which would render this whole power discussion moot. Kionix, by contrast, said more or less that any power that can be saved would be of interest.
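Whether a hub’s savings matter depends on what else is running. A minimal sketch, assuming only that total always-on power is sensor power plus hub power: the hub figures are Lattice’s rough numbers (10 mW for a microcontroller, 1 mW as an upper bound for the iCE40LM), while the sensor loads in the sweep are hypothetical, not measured gyro data.

```python
def savings_fraction(sensor_mw: float, mcu_hub_mw: float, fpga_hub_mw: float) -> float:
    """Fraction of total always-on power saved by replacing the MCU hub with the FPGA hub."""
    total_mcu = sensor_mw + mcu_hub_mw
    total_fpga = sensor_mw + fpga_hub_mw
    return (total_mcu - total_fpga) / total_mcu


MCU_HUB_MW = 10.0   # Lattice's rough microcontroller hub figure
FPGA_HUB_MW = 1.0   # upper bound for the iCE40LM, per Lattice

# Hypothetical always-on sensor loads, from "gyro off" to a power-hungry gyro left running.
for sensor_mw in (0.5, 5.0, 20.0, 50.0):
    pct = 100 * savings_fraction(sensor_mw, MCU_HUB_MW, FPGA_HUB_MW)
    print(f"sensors at {sensor_mw:5.1f} mW -> hub swap saves {pct:4.1f}% of the total")
```

The fixed 9 mW difference is a large share of the total when the gyro is off and a shrinking share as the sensor load grows, which is why both Hillcrest’s and Kionix’s positions can hold, depending on the app.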

Space critical

Even though it may be possible to save power with one of these, there is still the matter of space. Phones are notoriously crammed with stuff, and the barrier to adding yet another component is huge. So the only acceptable FPGA will be a tiny, nay, minuscule FPGA.

Which brings up the other horn that Lattice is tooting: their very small package size. They claim that it is, to their knowledge, the smallest sensor-fusion-capable component out there. It’s a 25-ball chip-scale package (CSP) that’s 1.71 mm square and – critically – only 0.45 mm thick.

They say that the entire thickness budget, including the air gap needed for thermal management, is 1 mm. We usually pay less attention to height than to area, but for this application, all the dimensions matter. At 0.45 mm, the package consumes less than half of that budget, which is a pretty big deal.

Their I/Os include RGB/LED drivers as well as SPI and I²C interfaces. They also generate polling strobes (a fast one and a low-power slow one).

All of this relates, of course, to the space required on a board. But what about the internal space? How much logic is in here? The largest device is a 3520-logic-element (LE) device, with smaller half- and quarter-sized brethren. (Those of you conversant with FPGAs know to be cautious when it comes to estimating actual gate density equivalents…) The SPI and I²C interfaces use hardened logic, so they don’t consume LEs. The question is, then, how much sensor fusion will that accommodate?

And the answer to that is: I don’t really know. Sensor fusion has largely been done in the software domain, so there’s not a body of evidence out there for estimating how many gates are required for a hardware implementation. Hillcrest noted that it’s not even that simple a question, since the answer will depend on the quality of fusion performed.

I asked QuickLogic how big their underlying FPGA was, and they said 1000 LEs. So that would make the 3520 LEs seem like plenty. However, QuickLogic has done some rather sophisticated partitioning to make this work (yet another thing we’ll return to). Users of Lattice’s devices could, of course, do the same kind of thing, but it’s not as simple as taking code and converting it to gates.

I think that, once applications are public, Lattice will be able to (or would do well to) publish relative gate count requirements for various sensor hub functions or configurations, including performance quality results (you might have two implementations of the same function for low- and high-end usage). Even then, how those combine as you mix and match functions might or might not be easy. This is hardware design, not software design.

Other considerations

So let’s assume that 3520 LEs is enough to do some useful fusion. Here we have a very low-power, extremely small device that would seem to be pretty well suited for sensor fusion in a phone setting. Is it really as simple as that?

Here’s where things get complicated. One of the major benefits of an FPGA is that it’s reprogrammable. Even after it’s out in the field, if you want to do an update, you can patch in new code, just like you can with software. But it’s not quite as easy as with software: the amount of software you can add is limited only by available processing headroom and memory footprint, whereas an FPGA update has to fit within a fixed pool of logic and routing.

In principle, updating an FPGA is easy for a user, but for the hardware designer, it won’t be obvious how the update will affect the FPGA until he or she actually tries it. Heck, simply changing variable types could suddenly blow an app out of the device.
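A crude way to see how a type change can blow an app out of the device: in this sketch, an adder is assumed to cost about one LE per bit, and a multiplier built from LEs (no hardened multiplier assumed) about width² LEs. The datapath of four multipliers and eight adders is hypothetical, and real synthesis results will differ; only the 3520-LE capacity comes from Lattice.

```python
DEVICE_LES = 3520  # largest iCE40LM, per Lattice


def adder_les(width: int) -> int:
    """Crude estimate: a ripple-carry adder takes about one LE per bit."""
    return width


def soft_multiplier_les(width: int) -> int:
    """Crude estimate: an array multiplier built from LEs grows with width squared."""
    return width * width


def datapath_les(width: int, multipliers: int = 4, adders: int = 8) -> int:
    """LE estimate for a hypothetical fusion datapath at a given data width."""
    return multipliers * soft_multiplier_les(width) + adders * adder_les(width)


for width in (16, 32):
    les = datapath_les(width)
    verdict = "fits" if les <= DEVICE_LES else "does NOT fit"
    print(f"{width}-bit datapath: ~{les} LEs, {verdict} in {DEVICE_LES} LEs")
```

Doubling the width roughly quadruples the multiplier cost, so a change that looks trivial in source code (a 16-bit variable becoming 32-bit) takes this hypothetical design from about a third of the device to well past its capacity.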

In fact, one common way to make an application more amenable to in-field upgrades is to avoid filling the device too full. Experienced FPGA designers know to keep logic utilization to around 70% even when no upgrade is planned, to make sure that place and route can succeed. For the iCE40LM 4K device, that would be about 2460 LEs (70% of 3520). Staying below even this would make it more likely that future changes would still fit in the device.
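The 70% rule of thumb is simple enough to turn into a fit check. In this sketch, the per-block LE estimates are hypothetical placeholders (no published figures exist yet, as noted earlier); only the 3520-LE capacity comes from Lattice, and the 70% target is the rule of thumb described above.

```python
DEVICE_LES = 3520   # largest iCE40LM, per Lattice
TARGET_PCT = 70     # utilization rule of thumb for reliable place and route


def check_fit(blocks: dict[str, int], device_les: int = DEVICE_LES,
              target_pct: int = TARGET_PCT) -> tuple[int, int]:
    """Return (total LEs used, headroom in LEs below the utilization target).

    Negative headroom means the design exceeds the target, even if it
    still fits physically in the device.
    """
    budget = device_les * target_pct // 100  # integer math avoids float rounding
    total = sum(blocks.values())
    return total, budget - total


# Hypothetical per-block estimates for a shipping design.
design = {"sensor interface glue": 300, "filtering": 900, "orientation fusion": 1000}
print(check_fit(design))   # (2200, 264)

# A hypothetical field update adds a 400-LE gesture detector.
updated = {**design, "gesture detection": 400}
print(check_fit(updated))  # (2600, -136)
```

The update still fits in the 3520 LEs on paper, but it pushes utilization to about 74%, past the target, which is exactly the situation where place and route may start to struggle.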

This also impacts ECOs: software ECOs are straightforward and expected. Hardware ECOs are expected but deprecated. FPGAs make them far easier than an after-silicon SoC ECO, but you still have to be careful that such last-minute changes don’t suddenly cause the design to overflow the available logic or routing. And culturally, FPGA designers work in different silos from the software guys, so an ECO means a push from one silo to another. Definitely doable, but also easy to botch if not done mindfully.

And then there are the other items we’ve hit on but deferred: design styles, the timing impact on microcontroller sensor hubs, keeping sensor timing aligned, and partitioning. All of these topics require us to pull back and take a more nuanced look at the growing number of sensor fusion options. This topic will require far more discussion than you’re willing to put up with in this article. Because we know you’d rather read two shorter articles than one doubly-long article. Especially if you get a break in between. (No binge-reading here.)

So we will return to discuss this shortly. It’s a more complicated scene than you might think. For now, we’ll let you gently digest the implications of sub-1-mW, teensy FPGAs doing your fusion.
