It’s damn sexy. You go to a picture online and you click anywhere and, “Boom,” that bit comes into focus. The first time I saw it, I was seduced, but I couldn’t escape that question: how do they do it?
The first embodiment of something like this came courtesy of Lytro; it’s what some refer to as 4D vision, although the formal name is “plenoptic.” The idea is that the camera captures all of the light information from all directions so that it can establish the focus at any point.
But there’s a challenger to this capability, and they use a completely different way of achieving a similar effect. In fact, you might consider it a completely brute-force approach. It has some benefits and perhaps a drawback. But to understand the differences, we need to compare the two approaches.
The Lytro approach relies on microlenses placed above the imaging chip. Each of those lenses captures the light from all directions impinging on that set of pixels, and the pixels record all of that information. The resolution, however, becomes coarser: it’s no longer the image sensor resolution, but rather the microlens resolution, which is typically 10-20% of the chip resolution.
So the good news is that you have a full complement of the light information, but you get a lower resolution image. Too low to be useful? Well, certainly not for internet-based pictures. But definitely lower than what you’d get for a single standard image.
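To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch. It assumes the 10-20% figure applies to pixel count, and the 8 MP sensor is a made-up example, not a figure from Lytro or anyone else.

```python
def effective_megapixels(sensor_mp, microlens_fraction):
    """Output resolution if each microlens contributes one output pixel."""
    return sensor_mp * microlens_fraction

# Hypothetical sensor size, chosen only for illustration.
sensor_mp = 8.0

# The 10-20% range quoted above, read here as a fraction of pixel count.
low = effective_megapixels(sensor_mp, 0.10)
high = effective_megapixels(sensor_mp, 0.20)

print(f"{sensor_mp:.0f} MP sensor -> roughly {low:.1f} to {high:.1f} MP output")
```

That is plenty for a web page, but well short of what the same sensor would deliver as an ordinary camera.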
So here comes DigitalOptics. Oh, um, I guess I should be more specific: this is the DigitalOptics that operates out of the US, a subsidiary of Tessera. Not to be confused with the one in New Zealand. There’s apparently also one in Korea. This appears to be a popular company name, so we’ll let them duke that out themselves.
You probably saw the news as they made their splash in Barcelona. Frankly, I saw more hits of their release than anything else emanating from the Iberian Peninsula during the madness that constitutes the Mobile World Congress. Why? Because, like I said, it’s damn sexy.
But they rely on a completely different way of making it happen. And, frankly, it’s relevant only now that the cameras going into phones are becoming more selective on focus. In the past, they relied on an aperture that made pretty much everything in focus at the same time. Cheap and easy, but harder to compose more professional-looking pictures. And we all know that, if there’s anything we want when we’re tanked off our arses, looking damn sexy ourselves if you ignore the drool and unfocused eyes, it’s to be able to compose a professional-level selfie with our besties while attempting to look all gangsta.
So as these cameras lose that everything-in-focus depth of field, the issue of focus becomes more relevant. So now the question becomes, which part of the scene is supposed to be in focus? And then how do you focus it?
Because we’re talking phones here, we don’t have the issue of moving around heavy, expensive optics, such as you find on single-lens reflex (SLR) cameras. You’ve got this tiny, miniaturized lens, so moving it around is much easier. Traditionally, it’s been done with a voice coil – a small, sensitive “motor” that can make the subtle adjustments required to change the focus.
The challenge is, according to DigitalOptics, that the voice coil is not only relatively slow to move, but it also suffers from hysteresis and has a required settling time due to the springs that need a chance to calm down. This limits how quickly pictures can be taken. Of course, we’ve all had the situation where we carefully compose a picture of a bird, for instance, and we press the button, and, by the time the picture actually clicks, the bird has landed in some other county.
This is where the MEMS comes in, it being the critical element in DigitalOptics’ new mems|cam: they use a MEMS actuator instead of the voice coil. It’s smaller, but, most importantly, it’s much faster. So much so that – and here’s the key – in the time that we might expect to take a single image, it can take multiple images with different focal points.
And that, fundamentally, is how you brute-force what acts like a plenoptic camera. Rather than gathering all of that light information in a lower-resolution meta-image, you just take a bunch of different pictures with different points of focus, and then the viewer can select which of those to view (although, presumably, if presented effectively, the viewer would not know that this is what’s happening… it would simply look magical, like the Lytro stuff). And it’s full resolution: no microlenses.
This, of course, raises a number of questions. How many pictures to take? How to stage the focal points? How do you store this stuff?
Let’s start with the issue of focus. If you’re going to capture a scene and make available a multiplicity of focus options, you could, for instance, divide up the distance from here to infinity into, oh, a hundred chunks and take a hundred images and hope that each point is in focus in at least one of them. That relies on some luck and lots of data storage. If your luck is bad, then you could go with finer gradations, but that’s more data. Lots more.
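Here is a minimal sketch of what that naive sweep might look like. It assumes the focus steps are spaced evenly in diopters (1/distance), which is a common way to carve up "here to infinity", and it uses a made-up nearest focus distance and file size. None of these numbers come from DigitalOptics.

```python
import math

def sweep_distances(nearest_m, steps):
    """Focus distances evenly spaced in diopters, from nearest_m out to infinity.

    Requires steps >= 2.
    """
    max_diopters = 1.0 / nearest_m
    distances = []
    for i in range(steps):
        diopters = max_diopters * (1 - i / (steps - 1))
        distances.append(math.inf if diopters == 0 else 1.0 / diopters)
    return distances

def stack_storage_mb(num_frames, mb_per_frame):
    """Total storage for one scene captured as a stack of frames."""
    return num_frames * mb_per_frame

NEAREST_M = 0.1      # hypothetical closest focus distance, in meters
MB_PER_FRAME = 3.0   # hypothetical size of one full-resolution JPEG

distances = sweep_distances(NEAREST_M, 100)
print(f"first stop: {distances[0]:.2f} m, last stop: {distances[-1]} m")

for steps in (100, 200, 400):
    total = stack_storage_mb(steps, MB_PER_FRAME)
    print(f"{steps} frames per scene -> {total:.0f} MB")
```

Each halving of the step size doubles the storage for a single scene, which is why the blind sweep gets expensive fast.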
That’s not the approach they take.
Instead, they use scene analysis to figure out where the likely focal points are. And this is where things get a bit… fuzzy. Because scene analysis is inherently heuristic. Is that a face in the frame or some trick of the pattern? (We’ve all seen the examples of Facebook asking you to tag something that isn’t a face at all… Of course, DigitalOptics’s algorithms might well be better than Facebook’s…) What about other kinds of scenes?
Not only is there complexity here, but it takes time. It would do no good if you sped up focus time but used all that time up trying to figure out where to focus. Which is why they provide IP to their partners for hardening onto silicon. For instance, they have a face tracking algorithm that’s useful not only for identifying that there’s a face in the scene, but also for keeping with the face if it is in motion and you’re taking multiple pictures.
This and other algorithms are carved into silicon so that they operate quickly. This means that, rather than the arbitrary divide-and-conquer strategy posited above, they analyze the scene to figure out where the useful bits are and then schedule around six images with different points of focus. All six of those images can be captured within about 400 ms (including scene analysis).
Note that this is the time for capturing the image in a RAM buffer, not for getting it onto FLASH. Power-winder capability isn’t really a thing on phones, at least not yet. But with digital cameras that have that feature, there’s always a limit to how many shots you can take in a row because the RAM buffer fills up and then you have to wait for it to empty to FLASH. If a feature like this is going to end up on a phone, this becomes even more of a challenge, since each “image” is actually around six images, ratcheting up the storage requirements.
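A rough sketch of that buffer arithmetic follows. The six frames and 400 ms come from the figures above; the buffer size and per-frame size are invented for illustration, and the sketch assumes nothing drains to FLASH while the burst is in progress.

```python
FRAMES_PER_SHOT = 6    # "around six" focus points per shot, as described above
CAPTURE_TIME_S = 0.4   # "about 400 ms" per multi-focus shot, as described above
MB_PER_FRAME = 3.0     # hypothetical size of one frame in the RAM buffer
BUFFER_MB = 128.0      # hypothetical RAM buffer size

def shots_until_full(buffer_mb, frames_per_shot, mb_per_frame):
    """How many whole shots fit in the buffer before it must drain to FLASH."""
    return int(buffer_mb // (frames_per_shot * mb_per_frame))

multi_focus = shots_until_full(BUFFER_MB, FRAMES_PER_SHOT, MB_PER_FRAME)
single_focus = shots_until_full(BUFFER_MB, 1, MB_PER_FRAME)
burst_seconds = multi_focus * CAPTURE_TIME_S

print(f"single-focus shots before the buffer fills: {single_focus}")
print(f"multi-focus shots before the buffer fills:  {multi_focus}")
print(f"multi-focus burst lasts about {burst_seconds:.1f} s")
```

With these made-up numbers, the multi-focus burst stalls after a handful of shots where a conventional burst would keep going several times longer.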
While the first camera due out with the mems|cam capabilities will do multi-focus, it’s unclear whether it will do power-winder kinds of multi-shot series, but, given sufficient storage, that should be feasible.
The images themselves are delivered in an MP file, which is effectively a container for multiple jpeg files. Selecting a point of focus to view involves a presentation app that takes the user interaction and extracts one of the jpegs from the MP file. Now… unlike the “true” plenoptic camera, it’s possible that a viewer might click a spot that’s not in focus in any of the images. In that case, this method loses out to a Lytro-like device.
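How the presentation app maps a click to one of the jpegs isn’t something DigitalOptics spelled out, so what follows is only a guess at one way to do it: score each frame by how much local contrast it has around the clicked point and show the winner. The function names are hypothetical, the frames are toy grayscale grids rather than decoded jpegs, and the sharpness measure (sum of squared neighbor differences) is a stand-in for whatever a real app would use.

```python
def local_sharpness(gray, x, y, radius=2):
    """Sum of squared neighbor differences in a window around (x, y)."""
    height, width = len(gray), len(gray[0])
    total = 0
    for r in range(max(0, y - radius), min(height - 1, y + radius + 1)):
        for c in range(max(0, x - radius), min(width - 1, x + radius + 1)):
            dx = gray[r][c + 1] - gray[r][c]
            dy = gray[r + 1][c] - gray[r][c]
            total += dx * dx + dy * dy
    return total

def pick_frame(frames, x, y):
    """Index of the frame that looks sharpest around the clicked point."""
    scores = [local_sharpness(frame, x, y) for frame in frames]
    return max(range(len(frames)), key=scores.__getitem__)

def toy_frame(width, height, sharp_left):
    """Toy image: one half is a crisp checkerboard, the other a flat gray blur."""
    frame = []
    for r in range(height):
        row = []
        for c in range(width):
            in_sharp_half = (c < width // 2) == sharp_left
            row.append(255 * ((r + c) % 2) if in_sharp_half else 128)
        frame.append(row)
    return frame

# Two frames of the "same scene": one focused on the left, one on the right.
frames = [toy_frame(16, 8, sharp_left=True), toy_frame(16, 8, sharp_left=False)]

print(pick_frame(frames, 2, 4))   # click on the left side
print(pick_frame(frames, 13, 4))  # click on the right side
```

Note that this always returns some frame, even when none of them is sharp at the clicked spot, which is exactly the case where the multi-image method falls short of a true plenoptic capture.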
So there you have it: a quasi-plenoptic capability on your phone. Note that it’s not a matter of market focus for this to be a phone thing: the MEMS approach works only on phones or any other equipment with micro-cameras. You’re not likely to see this coming to an SLR near you because of the bulkiness and weight of all that glass that has to move when you change focus.
Phones with mems|cams built in haven’t hit the streets yet; DigitalOptics is working with Fujitsu and others (with only Fujitsu allowing public naming). You should see them start arriving during the back half of this year.
3 thoughts on “Quasi-Plenoptics”
How do you view the mems|cam approach to 4D vision as compared to a full plenoptic approach?
This is a significantly superior approach.
In fact I wish my camera had that. It has bracketing for different exposure settings, but it would be a great feature to take multiple pictures at different focus settings.
This whole Lytro thing is IMHO just a gimmick because of the resulting low resolution.
One more comment: A narrow aperture (like f/22) gives you a large depth of field. A wide aperture (like f/2.0) gives you a short depth of focus, but better results in low light. The wording in your article appears to be confusing this.
Ooh! Guilty as charged. (And I’m coming to this kinda late…) I changed the wording to fix what you correctly pointed out… (actually eliminated the “wide” since it wasn’t that relevant).