
Teaching Machines to See

Xilinx Launches reVISION

The IoT world is all about sensing, and no sense is more important or empowering than vision. We humans rely on our sight to understand the world around us more than any other source of information, and it’s likely that the same will be true for our intelligent machines. From automotive applications like ADAS to drones to factory automation, giving our systems the ability to “see” brings capabilities that are difficult or even impossible to achieve in any other way. 

But vision is one of the most challenging computational problems of our era. High-resolution cameras generate massive amounts of data, and processing that information in real time requires enormous computing power. Even the fastest conventional processors are not up to the task, and some kind of hardware acceleration is mandatory at the edge. Hardware acceleration options are limited, however. GPUs require too much power for most edge applications, and custom ASICs or dedicated ASSPs are horrifically expensive to create and don’t have the flexibility to keep up with changing requirements and algorithms. 
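To put "massive amounts of data" into numbers, the short Python sketch below computes the raw, uncompressed pixel rate of a camera stream. It assumes 8-bit RGB (3 bytes per pixel) and ignores blanking intervals and compression, so real sensor interfaces will differ; the function name is purely illustrative.

```python
def raw_video_rate(width: int, height: int, fps: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed bytes per second for a video stream.

    Assumption: every pixel is stored as bytes_per_pixel bytes (3 for 8-bit RGB),
    with no blanking overhead and no compression.
    """
    return width * height * bytes_per_pixel * fps


if __name__ == "__main__":
    streams = {
        "1080p @ 60 fps": (1920, 1080, 60),
        "4K UHD @ 30 fps": (3840, 2160, 30),
    }
    for name, (w, h, fps) in streams.items():
        rate = raw_video_rate(w, h, fps)
        print(f"{name}: {rate / 1e6:.1f} MB/s")
```

Under these assumptions a single 1080p60 stream is roughly 373 MB/s of pixel data, and a 4K30 stream roughly twice that, before any processing step has touched a pixel.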

That makes hardware acceleration via FPGA fabric just about the only viable option. And it makes SoC devices with embedded FPGA fabric – such as Xilinx Zynq and Altera SoC FPGAs – absolutely the solutions of choice. These devices bring the benefits of single-chip integration, ultra-low latency and high bandwidth between the conventional processors and the FPGA fabric, and low power consumption to the embedded vision space. 

Unfortunately, they also typically bring the requirement of an engineering team with FPGA design expertise. Developing the accelerators for vision algorithms is a non-trivial task, and the accelerator part is typically created using a hardware description language such as Verilog or VHDL, driving a design flow with RTL simulation, synthesis, place and route, and timing closure. In addition to requiring a qualified engineering team with specialized expertise, this can add months to the development cycle.

The problem is just getting worse. Now, AI technologies such as neural networks are being increasingly used for the complex and fuzzy pattern recognition part of vision systems. Neural networks have two distinct modes of operation. “Training” – which is done once on a large sample data set, typically in a data center environment – requires heaping helpings of floating-point computation. Your vision algorithm may be shown millions of pictures of cats, so that it can later automatically recognize cats in video streams. Training sets and tunes the coefficients that will be used in the later “Inference” phase. “Inference” is the in-the-field portion of the neural network. During inference, you want your autonomous mouse to be able to recognize cats as quickly and accurately as possible, engaging its “fight or flight” mode.
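The split between the two phases can be illustrated with a toy model. The Python sketch below is a deliberately tiny stand-in for a neural network, not how Caffe or any real framework works: "training" runs floating-point gradient descent to fit the weight and bias of a single linear unit to labeled examples, and "inference" simply applies those frozen coefficients to new input. The data, learning rate, and function names are hypothetical.

```python
import random


def train(samples, epochs=200, lr=0.1):
    """Training (toy): fit w and b of y = w*x + b by stochastic gradient descent.

    Runs in floating point and touches every sample many times, which is why
    training is the compute-heavy, done-once phase.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = (w * x + b) - y   # prediction error for this sample
            w -= lr * err * x       # squared-error gradient step for w (factor 2 folded into lr)
            b -= lr * err           # squared-error gradient step for b
    return w, b


def infer(w, b, x):
    """Inference (toy): apply the already-trained, frozen coefficients to new data."""
    return w * x + b


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical labeled data drawn from y = 2x + 1 plus a little noise
    data = [(i / 10, 2 * (i / 10) + 1 + random.uniform(-0.05, 0.05)) for i in range(10)]
    w, b = train(data)
    print(f"learned w={w:.2f}, b={b:.2f}; infer(0.5) = {infer(w, b, 0.5):.2f}")
```

The asymmetry is the point: `train` loops over the data set thousands of times, while `infer` is a single multiply-add per coefficient that must run on every incoming frame.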

Inference is done at the edge of the IoT, or as close to it as possible. You don’t have time for massive amounts of image data to be uploaded to the cloud and processed, delivering a “Hey, that thing you’re looking at is a CAT!” conclusion about 100ms after the limbs are torn from your robotic device. Inference, therefore, is typically done with fixed-point (8-bit or less) precision in the IoT edge device itself – minimizing latency and power consumption while maximizing performance. 
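What "fixed-point (8-bit or less) precision" means in practice can be sketched in a few lines. The Python example below assumes one common scheme, symmetric per-tensor int8 quantization with a single floating-point scale factor per tensor; real toolchains, including Xilinx's, may use different schemes, bit widths, or rounding rules. The weights and inputs are hypothetical.

```python
def quantize(values, num_bits=8):
    """Symmetric linear quantization: map floats to signed integers plus one scale factor."""
    qmax = 2 ** (num_bits - 1) - 1                  # 127 for 8-bit
    max_abs = max(abs(v) for v in values) or 1.0    # avoid a zero scale for all-zero input
    scale = max_abs / qmax
    # Round to the nearest integer and clamp into [-qmax, qmax]
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def int_dot(qa, qb):
    """Integer multiply-accumulate; hardware would typically use a wider accumulator."""
    acc = 0
    for a, b in zip(qa, qb):
        acc += a * b
    return acc


if __name__ == "__main__":
    weights = [0.42, -1.30, 0.07, 0.88]   # hypothetical trained coefficients
    pixels = [0.10, 0.55, 0.95, 0.30]     # hypothetical normalized inputs
    qw, sw = quantize(weights)
    qx, sx = quantize(pixels)
    approx = int_dot(qw, qx) * sw * sx    # rescale the integer result once at the end
    exact = sum(w * x for w, x in zip(weights, pixels))
    print(f"float: {exact:.4f}  int8: {approx:.4f}")
```

All of the per-element work happens in `int_dot` on small integers; the floating-point scales are applied once per result. That is the kind of narrow, repetitive arithmetic the article credits FPGAs with handling well.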

This “training” vs. “inference” split is very convenient for the companies that make FPGAs and hybrid SoC/FPGA devices, because FPGAs are really good at high-speed, low-precision computation. Interestingly, it is doubly convenient for Xilinx, whose FPGAs and SoCs differ from those of archrival Intel (Altera): Intel’s devices support hardware floating point (good for training), supposedly at the expense of some performance in the fixed-point domain (good for inference). Xilinx is apparently more than willing to let Intel duke it out with GPUs for the training sockets while it focuses its advantages on the much more lucrative inference sockets.

So, Xilinx is sitting pretty with their Zynq SoCs and MPSoCs, perfectly aligned with the needs of the embedded vision developer and well differentiated from Intel/Altera’s devices. What else could they possibly need?

Oh, yeah. There’s still that “almost impossible to program” issue.

Rewinding a few paragraphs – most of the very large systems companies have well-qualified teams of hardware engineers who can handle the FPGA portion of an embedded vision system. Xilinx has dozens of engagements in every important application segment involving embedded vision – from ADAS to drones to industrial automation. But many companies lack the required hardware expertise, and they wouldn’t want to dedicate the design time to it even if they had it. Plus, crossing the conceptual barrier from vision experts to neural network experts to FPGA design experts and back again is a very expensive, time-consuming, and lossy process. What we really need is a way for software developers to harness the power of Zynq devices without bringing in a huge team of hardware experts.

That’s the whole point of Xilinx’s reVISION.

reVISION, announced this week, is a stack – a set of tools, interfaces, and IP – designed to let embedded vision application developers start in their own familiar sandbox (OpenVX for vision acceleration and Caffe for machine learning), navigate smoothly down through algorithm development (OpenCV, plus neural networks such as AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN), and target Zynq devices without the need to bring in a team of FPGA experts. reVISION takes advantage of Xilinx’s previously announced SDSoC stack to facilitate the algorithm development part. Xilinx claims enormous productivity gains for embedded vision development, with customers predicting schedule cuts of as much as 12 months for new product and update development.

In many systems employing embedded vision, it’s not just the vision that counts. Increasingly, information from the vision system must be processed in concert with information from other types of sensors such as LiDAR, SONAR, and RADAR. FPGA-based SoCs are uniquely agile at handling this sensor fusion problem, with the flexibility to adapt to the particular configuration of sensor systems required by each application. This diversity in application requirements is a significant barrier to typical “cost optimization” strategies such as the creation of specialized ASIC and ASSP solutions.

The performance rewards for system developers who successfully harness the power of these devices are substantial. Xilinx is touting benchmarks showing its devices delivering 6x the images/sec/watt in machine learning inference with GoogLeNet @batch = 1, 42x the frames/sec/watt in computer vision with OpenCV, and 1/5 the latency on real-time applications with GoogLeNet @batch = 1 versus “NVidia Tegra and typical SoCs.” These kinds of advantages in latency, performance, and particularly energy efficiency can easily be make-or-break for many embedded vision applications. 
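The benchmark metrics quoted above are simple ratios, and the "@batch = 1" qualifier matters mainly for latency. The Python sketch below is a simplified model assumed for illustration, not Xilinx's benchmarking methodology: efficiency is throughput divided by average power, and batching N frames forces the first frame of each batch to wait for the other N - 1 frames to arrive before any result is produced. All numbers in it are hypothetical.

```python
def frames_per_sec_per_watt(frames_per_sec: float, watts: float) -> float:
    """Energy-efficiency metric: throughput divided by average power draw."""
    return frames_per_sec / watts


def worst_case_latency_ms(batch: int, camera_fps: float, compute_ms: float) -> float:
    """Simplified latency model (assumption): the first frame of a batch waits for the
    remaining batch - 1 frames to arrive, then the whole batch is processed in compute_ms."""
    wait_ms = (batch - 1) * 1000.0 / camera_fps
    return wait_ms + compute_ms


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the arithmetic
    print(frames_per_sec_per_watt(120.0, 10.0))   # 12.0
    print(worst_case_latency_ms(1, 30.0, 5.0))    # 5.0
    print(worst_case_latency_ms(8, 30.0, 20.0))
```

In this model the waiting term vanishes at batch = 1, which is why batch-1 figures are the relevant ones for a real-time edge device, even though larger batches can raise raw throughput on some hardware.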

Xilinx has also announced a range of embedded vision development kits that support various cameras and input configurations as well as the reVISION development flow, so you’ll be able to get your design working on actual hardware as quickly as possible.

At press time, Intel had just announced the acquisition of Mobileye – a company specializing in embedded vision and collision avoidance in the automotive market – for almost as much as they paid for Altera. It seems that the stakes in this emerging applications space are going up yet again. It will be interesting to watch the battle unfold.




2 thoughts on “Teaching Machines to See”

  1. Good article, although I have to point out that the reason FPGAs are not used for training has nothing to do with floating point. It has to do with the ability to quickly change the architecture and constants to achieve high accuracy during training. GPUs and CPUs can do this in milliseconds, while any FPGA takes hours of place and route. For inference, FPGAs are much better than GPUs or CPUs because these algorithms can be pipelined and run in real time on an FPGA, something at which FPGAs beat any other programmable solution. In many cases, as was shown at FPGA2017, fewer bits are needed. Floating point in an FPGA gives you the ability to replicate the results of GPUs and CPUs, but it is not needed in the field. After all, the incoming data is not floating point, so using floating point just takes area and is not really useful.

  2. Hi there,

    Great article. It’s honestly quite mind-blowing how high-tech and innovative vision technology has become, and how machines/robotics can pretty much see on their own now, isn’t it? There’s another particularly innovative product I recently learned about, from a company called OrCam, so I thought I’d share. Essentially, they have these smart glasses with a camera for the blind/vision impaired/etc. that have an optical character recognition system that helps people without the best vision to see their surroundings. Pretty astounding. Anyhow, thanks for this content, I just thought I’d pass this along.

