Turning Flat Images Into 3D Scenes

If Elon says it, it must be true. Autonomous vehicles don’t need no steenkin’ lidar sensors. We can do it all with cameras. A nice idea, but converting camera images into useful data in real time is tricky. Very tricky.

Cameras are stupid, mostly because they operate independently of one another. They have no spatial awareness, no concept of depth, and no idea of what’s important or what’s trivial. Like other primates, we have two eyes pointed more or less in the same direction to give us stereoscopic vision. Our brains analyze parallax (the slight differences between those two images) to derive depth information. Many birds, dolphins, and other creatures with eyes on opposite sides of their heads can’t do that nearly as well. Neither can most camera systems.

To make up for their unfortunate evolutionary shortcomings, cameras often employ some unrelated sense to divine distance. Many use a time-of-flight (ToF) sensor, often an infrared LED paired with a receiver, to bounce an invisible signal off the subject(s) while measuring the time it takes for that light to come back. (The creepy “red eye” effect has a similar origin, but the culprit there is the camera’s visible flash bouncing off someone’s retina, not the ToF sensor.)
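
The arithmetic behind a ToF measurement is simple enough to sketch. This is a minimal illustration, not any vendor’s implementation; the function names are made up for the example, and it assumes the pulse travels through air at roughly the vacuum speed of light.

```python
# Speed of light in vacuum, in meters per second (exact by definition).
SPEED_OF_LIGHT_M_S = 299_792_458.0


def tof_distance_m(round_trip_s: float) -> float:
    """Distance to the target, given the round-trip time of a light pulse."""
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse travels out and back, so halve the total path length.
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


def tof_round_trip_s(distance_m: float) -> float:
    """Round-trip time a sensor would have to measure for a given distance."""
    if distance_m < 0:
        raise ValueError("distance cannot be negative")
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_S
```

A subject 1.5 meters away returns the pulse in about 10 nanoseconds, which is why ToF sensors need fast timing electronics.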

Apple characteristically took a different approach, using “structured light” to project a grid of invisible dots onto subjects. Either way, you’re augmenting the camera(s) with additional sensors to infer distance. The cameras themselves can’t figure that out.

Except when they can. Tesla and others have no problem using cameras in pairs, just like mammals do, and inferring distance from stereoscopic vision. But that takes a lot of computing power, something that nobody likes but which autonomous vehicles can afford.
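
To get a feel for “a lot,” here is a back-of-the-envelope count. It assumes the most naive approach imaginable: brute-force block matching, where every pixel is compared against every candidate offset using a square window. The resolution, search range, window size, and frame rate are illustrative assumptions, not figures from Tesla or anyone else, and real implementations reuse partial sums and do far better.

```python
def naive_sad_ops_per_second(width, height, disparities, window, fps):
    """Count absolute-difference operations for brute-force stereo matching.

    width, height: image size in pixels
    disparities:   number of candidate horizontal offsets tested per pixel
    window:        side length of the square comparison window, in pixels
    fps:           frames per second
    """
    # Each pixel tests every candidate offset over a window x window patch.
    per_pixel = disparities * window * window
    return width * height * per_pixel * fps
```

With a 1280x720 image, 64 candidate offsets, a 9x9 window, and 30 frames per second, that comes to roughly 143 billion operations per second before any cleanup or filtering.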

But what happens when you don’t have a 4000-pound enclosure and a kilowatt of electrical energy to expend on vision? What if you can’t afford a dozen nVidia GPUs and a small army of programmers to train them? What if you’re… a normal engineer with a budget, a deadline, and a problem?

This is where Eys3D thinks it has a solution. The small Taiwanese company recently got funding from ARM’s IoT Capital group and others to develop chips that extract 3D depth information from conventional 2D cameras and image sensors. Eys3D isn’t trying to replace exotic ADAS implementations, but is instead eyeing low-end industrial and robotic systems that need 3D vision on a budget.

Instead of feeding streaming video data to a high-end processor (or collection of processors) and programming them to extract the depth information, Eys3D does it all in hardware. Its hardwired chips accept two or more camera inputs and output a depth map and/or point cloud. It’s a preprocessor that “3D-ifies” video data, thus offloading the task from the host processor. The host can therefore be much simpler, cheaper, and easier to program. It can also bring 3D vision to existing systems that might have had only 2D vision before.
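
For a sense of what is being offloaded, here is a minimal software sketch of one textbook technique: sum-of-absolute-differences (SAD) block matching on a rectified image pair. This is not Eys3D’s method, which the company hasn’t detailed here. The function name and parameters are hypothetical, and the images are plain lists of grayscale rows to keep the example dependency-free.

```python
def sad_disparity(left, right, max_disparity=4, half_window=1):
    """Return a disparity map for the left image using SAD block matching.

    left, right: equal-sized 2D lists of grayscale values from a rectified
    stereo pair. Border pixels where the window does not fit are left at 0.
    """
    height, width = len(left), len(left[0])
    disparity = [[0] * width for _ in range(height)]
    for y in range(half_window, height - half_window):
        for x in range(half_window, width - half_window):
            best_cost, best_d = None, 0
            # A point at column x in the left image shows up at column
            # x - d in the right image; try each candidate shift d.
            for d in range(max_disparity + 1):
                if x - d - half_window < 0:
                    break  # window would fall off the right image
                cost = 0
                for dy in range(-half_window, half_window + 1):
                    for dx in range(-half_window, half_window + 1):
                        cost += abs(left[y + dy][x + dx]
                                    - right[y + dy][x + dx - d])
                # Keep the shift with the smallest total difference.
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y][x] = best_d
    return disparity
```

Four nested loops per pixel is exactly the kind of repetitive, regular work that maps well onto hardwired logic and poorly onto a small microcontroller.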

Naturally, the system requires at least two cameras, and they have to be separated by some distance in order to provide stereoscopic vision. How much distance? Not a lot. A medical endoscope provider places its two cameras just 2 mm apart. At the other extreme, a highway truck-inspection system has its cameras 2 meters apart. Greater distances provide a greater useful depth range, because the wider the spacing, the more the two views differ for faraway objects.
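
The relationship between camera spacing and depth can be sketched with the standard pinhole stereo formulas: depth Z = f * B / d, where f is the focal length in pixels, B is the spacing (baseline), and d is the disparity in pixels. The focal length used in the note below is an assumption for illustration, not a figure from either customer, and the error estimate is a first-order approximation.

```python
def depth_from_disparity_m(focal_px: float, baseline_m: float,
                           disparity_px: float) -> float:
    """Depth of a point from its disparity in a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error_m(focal_px: float, baseline_m: float, depth_m: float,
                  disparity_error_px: float = 1.0) -> float:
    """Approximate depth uncertainty caused by a small disparity error.

    Differentiating Z = f * B / d gives dZ ~ Z^2 / (f * B) * dd, so the
    error grows with the square of distance and shrinks with the baseline.
    """
    return depth_m ** 2 * disparity_error_px / (focal_px * baseline_m)
```

Assuming a 1000-pixel focal length, a one-pixel matching error costs about 1.25 mm of depth accuracy for cameras 2 mm apart looking at something 5 cm away, and about 5 cm for cameras 2 meters apart looking at something 10 meters away.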

Adding more cameras doesn’t hurt, either. With three in a row, two closely spaced cameras can focus on nearby objects, while the third provides a longer-range view. Adding off-axis cameras (that is, not in line with the others) builds a richer 3D point cloud, and so on.

A typical industrial application might involve assembly-line inspection. A pair of cameras placed over a moving stack of product can tell whether one stack is too low or too high, or if a container is under- or overfilled. Conical piles of loose matter (wood chips, coal, etc.) can be measured volumetrically simply by looking. “You can see a lot just by observing,” as Yogi Berra said.
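
Once a depth map exists, the inspection logic on the host can be short. The sketch below assumes a camera looking straight down, a depth map given as rows of distances in meters, and a fixed floor area per pixel. That last assumption ignores perspective, so treat the result as a rough estimate. All names and thresholds are hypothetical.

```python
def pile_volume_m3(depth_map_m, camera_height_m, pixel_area_m2):
    """Estimate the volume of material under an overhead depth camera.

    depth_map_m:     2D list of camera-to-surface distances, in meters
    camera_height_m: distance from the camera to the empty floor or belt
    pixel_area_m2:   floor area covered by one pixel (assumed constant)
    """
    volume = 0.0
    for row in depth_map_m:
        for depth in row:
            height = camera_height_m - depth
            if height > 0:  # ignore noise that reads below the floor
                volume += height * pixel_area_m2
    return volume


def fill_status(depth_map_m, camera_height_m, min_height_m, max_height_m):
    """Classify a stack or container by its mean height above the floor."""
    heights = [camera_height_m - d for row in depth_map_m for d in row]
    mean_height = sum(heights) / len(heights)
    if mean_height < min_height_m:
        return "underfilled"
    if mean_height > max_height_m:
        return "overfilled"
    return "ok"
```

A host that only has to run loops like these can be a far more modest processor than one that also has to build the depth map.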

Eys3D’s current designs are hardwired, but the next generation (due in about 18 months) will be programmable. The company says those will use a combination of DSP cores and memory arrays and be programmable in C. Given its recent investment from ARM, it’s easy to guess where the processing elements might come from, and Eys3D has historical ties to memory vendor Etron, so no surprises there, either.

Depth estimation is hard in software. It requires high-end silicon, lots of data, lots of energy, and talented programmers. It’s doable, but not particularly efficient. It’s also a “check box” feature that many industrial or robotics developers want to have but don’t want to develop themselves. Like a TCP/IP stack or a USB interface, vision-based depth analysis just needs to work so that you can get on with your real job. Eys3D isn’t trying to solve every problem — just to help developers who are out of their depth.
