feature article
Subscribe Now

Computer Vision 101

Making Machines See the World is Harder Than it Looks

“Vision is the art of seeing what is invisible to others.” — Jonathan Swift

What started out as an experiment has become an obsession. I wanted to see what happened to the dinosaurs.

I confess, I have (or had) two big concrete dinosaurs in my backyard. They’re awkward and massive and weigh about 200–300 pounds each. They look great hiding amongst the ferns, and they scare off rodents. And startle houseguests.

But one of them disappeared without a trace. Just like the real dinosaurs, he was apparently abducted by aliens who teleported him up to their spaceship for invasive examinations. What other explanation could there be? He’s far too heavy for one or two people to move unaided. There were no drag marks in the dirt and no broken pieces anywhere, so he wasn’t removed piecemeal. He just… vanished.

Time for some security cameras. If the other one disappears, I want to get this on video.

Buying and installing outdoor high-resolution cameras is easy. The hard part is training them to do something useful. Like a lot of projects, you think you’re mostly finished when the hardware works, but it’s really the software that takes up all the time.

The cameras themselves have rudimentary motion-detection features burned into their firmware, but it’s not very useful. So instead, they’re configured to squirt raw video streams at about 1 Mbps to a dedicated PC running specialized camera software. There’s an 8-core CPU with 16GB of RAM crunching away on the video, looking for alien dinosaur snatchers. This ought to make great YouTube fodder.

And it would, if you’re into birds, clouds, spiderwebs, and light summer breezes. The cameras triggered on everything. They were overly sensitive to changes in light, irrelevant background motion, random wildlife flying/hopping/crawling past its field of view – you name it, they recorded it.

Fortunately, the camera software has lots of parameters you can tweak. It’s a fantasyland of virtual knobs and dials and switches you can adjust to fine-tune your own surveillance system. The NSA has nothing like this, I’m sure.  

Trouble is, cameras and computers see the world as flat, so they have lousy depth perception. It’s all just pixels to them. A ten-pixel object up close is the same as a ten-pixel object at 50 paces; it can’t tell the difference between near and far, so it has no idea of relative sizes. A bug crawling on the camera lens looks yuuuge, not unlike a Sasquatch in the distance. How to tell them apart?

Similarly, they’re not good at shapes. The software can detect “blobs,” in the sense that it knows when contiguous pixels all move and change together. It isn’t fooled by dust or snow, for example, because those pixels are scattered all over the image. It can detect a person or a bird (or a spaceship?) because those pixels are clustered. So there’s that. But you can’t teach it good shapes from bad shapes. All it knows is the size of the blob (in pixels), which doesn’t correlate to anything useful in the real world. A cardboard box is treated the same as a face. This is not ideal.

It understands color, but not the same way we do. In fact, color is all it understands: the relative light value of each pixel, on a scale from 0 to some large number. Your job is to teach it what color(s) or light/dark combinations are worthy of notice. Again, harder than it sounds.

On the plus side, the cameras have remarkably good night vision. They may actually work better at night than during the daytime, because the cameras have their own infrared LEDs that emit invisible (to us) illumination, like a built-in spotlight. That lighting is more consistent and controlled than miserable ol’ sunlight, which changes position and varies in brightness. Night vision is so much more controlled.

It’s also eerily good at spotting nocturnal wildlife. Mammalian retinas reflect light, that spooky glow-in-the-dark effect familiar from TV nature documentaries. The critters think they’re being sneaky when they’re actually all lit up. Clearly, our ancient animal ancestors didn’t have flashlights, or this unfortunate characteristic would’ve evolved out years ago.

A few months into this, I’ve collected enough video for a third-rate nature documentary. Want to see how grass blows in the wind? I’ve got hours of footage. Bugs? We got bugs of all shapes and sizes. Annoyingly, spiders seem attracted to the cameras’ IR LEDs and build webs across the lens. It’s giant 1950s B-movie monsters all over again, but in color.

The state of the art hasn’t progressed much in the past few decades. We’re still trying to get Von Neumann machines to “see” things for what they are. I worked for a robotics company that offered machine vision as an option. This consisted of black-and-white cameras that could tell whether a gear, gizmo, or gadget on a conveyer belt was out of place or misaligned. Even then, it took a bit of training and tweaking to make it reliable.

One large baking company used our robots to assemble sandwich cookies, and wanted the machines trained to deliberately misalign the two halves by 3 degrees, so that the cookies would look more “homemade.” In that case, the machines were capable of being too accurate. A rare exception.

Today, we have a big tool bag of new approaches to machine vision. Cognex, National Instruments, Sensory, Optotune, and others are all tackling this, although most use conventional hardware, with some ASICs or FPGAs thrown in. Google’s TensorFlow also makes an appearance, taking the “big data” approach to teaching your machine what’s what. There’s a one-day class coming up in October if you’re in the San Jose area.

While you go do that, I’ll go back to looking for aliens. Somehow, they’ve eluded me all this time. I’ve got tons of spider footage, though. Maybe they somehow conspired to work together…

2 thoughts on “Computer Vision 101”

Leave a Reply

featured blogs
Jun 19, 2019
In this week's Whiteboard Wednesdays video, Craig Johnson explains the purpose of the Cloud Passport Partners Program and how customers can now connect with authorized and knowledgeable service... [[ Click on the title to access the full blog on the Cadence Community si...
Jun 17, 2019
eSilicon used the Tessent family of DFT solutions to solve their toughest challenges of testing a large 2.5D/3D deep learning device. Hear all about it in this 15 minute video. With over 300 tapeouts, there is no doubt that eSilicon has tons of experience getting designs don...
Jun 17, 2019
One topic that garnered much interest at the Samtec booth at the International Microwave Symposium was a dynamic stress test of Samtec’s microwave cable. This demonstration proves that our low-loss, microwave cable withstands high flex cycles and still maintains signal ...
Jan 25, 2019
Let'€™s face it: We'€™re addicted to SRAM. It'€™s big, it'€™s power-hungry, but it'€™s fast. And no matter how much we complain about it, we still use it. Because we don'€™t have anything better in the mainstream yet. We'€™ve looked at attempts to improve conven...