feature article
Subscribe Now

Teaching Machines to See

Xilinx Launches reVISION

The IoT world is all about sensing, and no sense is more important or empowering than vision. We humans rely on our sight to understand the world around us more than any other source of information, and it’s likely that the same will be true for our intelligent machines. From automotive applications like ADAS to drones to factory automation, giving our systems the ability to “see” brings capabilities that are difficult or even impossible to achieve in any other way. 

But vision is one of the most challenging computational problems of our era. High-resolution cameras generate massive amounts of data, and processing that information in real time requires enormous computing power. Even the fastest conventional processors are not up to the task, and some kind of hardware acceleration is mandatory at the edge. Hardware acceleration options are limited, however. GPUs require too much power for most edge applications, and custom ASICs or dedicated ASSPs are horrifically expensive to create and don’t have the flexibility to keep up with changing requirements and algorithms. 

That makes hardware acceleration via FPGA fabric just about the only viable option. And it makes SoC devices with embedded FPGA fabric – such as Xilinx Zynq and Altera SoC FPGAs – absolutely the solutions of choice. These devices bring the benefits of single-chip integration, ultra-low latency and high bandwidth between the conventional processors and the FPGA fabric, and low power consumption to the embedded vision space. 

Unfortunately, they also typically bring the requirement of an engineering team with FPGA design expertise. Developing the accelerators for vision algorithms is a non-trivial task, and the accelerator part is typically created using a hardware description language such as Verilog or VHDL, driving a design flow with RTL simulation, synthesis, place and route, and timing closure. In addition to requiring a qualified engineering team with specialized expertise, this can add months to the development cycle.

The problem is just getting worse. Now, AI technologies such as neural networks are being increasingly used for the complex and fuzzy pattern recognition part of vision systems. Neural networks have two distinct modes of operation. “Training” – which is done once on a large sample data set, typically in a data center environment – requires heaping helpings of floating-point computation. Your vision algorithm may be shown millions of pictures of cats, so that it can later automatically recognize cats in video streams. Training sets and tunes the coefficients that will be used in the later “Inference” phase. “Inference” is the in-the-field portion of the neural network. During inference, you want your autonomous mouse to be able to recognize cats as quickly and accurately as possible, engaging its “fight or flight” mode.

Inference is done at the edge of the IoT, or as close to it as possible. You don’t have time for massive amounts of image data to be uploaded to the cloud and processed, delivering a “Hey, that thing you’re looking at is a CAT!” conclusion about 100ms after the limbs are torn from your robotic device. Inference, therefore is typically done with fixed-point (8-bit or less) precision in the IoT edge device itself – minimizing latency and power consumption while maximizing performance. 

This “training” vs “inference” model is very convenient for the companies who make FPGAs and hybrid SoC/FPGA devices. FPGAs are really good at high-speed, low-precision computation. It’s, interestingly, doubly convenient for Xilinx, whose FPGAs and SoCs differ from archrival Intel (Altera) in that Intel’s devices support hardware floating-point (good for training) supposedly at the expense of some performance in the fixed-point domain (good for inference). Xilinx is apparently more than willing to let Intel duke it out with GPUs for the training sockets, while Xilinx focuses advantages on the much-more-lucrative inference sockets.

So, Xilinx is sitting pretty with their Zynq SoCs and MPSoCs, perfectly aligned with the needs of the embedded vision developer and well differentiated from Intel/Altera’s devices. What else could they possibly need?

Oh, yeah, There’s still that “almost impossible to program” issue.

Rewinding a few paragraphs – most of the very large systems companies have well-qualified teams of hardware engineers who can handle the FPGA portion of an embedded vision system. Xilinx has dozens of engagements in every important application segment involving embedded vision – from ADAS to drones to industrial automation. But many companies don’t have the required hardware expertise for it, and they wouldn’t want to dedicate the design time to it even if they did. Plus, crossing the conceptual barrier from vision experts to neural network experts to FPGA design experts and back again is a very expensive, time consuming, and lossy process. What we really need is a way for software developers to be able to harness the power of Zynq devices without bringing in a huge team of hardware experts.

That’s the whole point of Xilinx’s reVISION.

reVISION, announced this week, is a stack – a set of tools, interfaces, and IP – designed to let embedded vision application developers start in their own familiar sandbox (OpenVX for vision acceleration and Caffe for machine learning), smoothly navigate down through algorithm development (OpenCV and NN frameworks such as AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN), targeting Zynq devices without the need to bring in a team of FPGA experts. reVISION takes advantage of Xilinx’s previously-announced SDSoC stack to facilitate the algorithm development part. Xilinx claims enormous gains in productivity for embedded vision development – with customers predicting cuts of as much as 12 months from current schedules for new product and update development.

In many systems employing embedded vision, it’s not just the vision that counts. Increasingly, information from the vision system must be processed in concert with information from other types of sensors such as LiDAR, SONAR, RADAR, and others. FPGA-based SoCs are uniquely agile at handling this sensor fusion problem, with the flexibility to adapt to the particular configuration of sensor systems required by each application. This diversity in application requirements is a significant barrier for typical “cost optimization” strategies such as the creation of specialized ASIC and ASSP solutions.

The performance rewards for system developers who successfully harness the power of these devices are substantial. Xilinx is touting benchmarks showing their devices delivering an advantage of 6x images/sec/watt in machine learning inference with GoogLeNet @batch = 1, 42x frames/sec/watt in computer vision with OpenCV, and ? the latency on real-time applications with GoogLeNet @batch = 1 versus “NVidia Tegra and typical SoCs.” These kinds of advantages in latency, performance, and particularly in energy-efficiency can easily be make-or-break for many embedded vision applications. 

Xilinx has also announced a range of embedded vision development kits supporting various cameras and input configurations supporting the reVISION development flow, so you’ll be able to get your design working on actual hardware as quickly as possible.

At press time, Intel had just announced the acquisition of Mobileye – a company specializing in embedded vision and collision avoidance in the automotive market – for almost as much as they paid for Altera. It seems that the stakes in this emerging applications space are going up yet again. It will be interesting to watch the battle unfold.




2 thoughts on “Teaching Machines to See”

  1. Good article, although I have to point out that the reason FPGAs are not used for training has nothing to do with floating point. It has to do with the ability to quickly change the architecture and constants to achieve a high accuracy during training. GPUs and CPUs can do this in milliseconds while any FPGA takes hours of place and route. For inference FPGAs are much better than GPUs or CPUs because these algorithms can be pipelined and run in real time on a FPGA. Something FPGA are better than any other programmable solution. In many cases, as was shown at FPGA2017, fewer bits are needed. Floating point, in a FPGA, gives you the ability to replicate the results of GPUs and CPUs but are not needed in the field. After all the incoming data is not floating point so using floating point just takes area and is not really useful.

  2. Hi there,

    Great article. It’s honestly quite mind-blowing how high-tech and innovative vision technology has become, and how machines/robotics can pretty much see on their own now, isn’t it? There’s another particularly innovative product I recently learned about, from a company called OrCam, so I thought I’d share. Essentially, they have these smart glasses with a camera for the blind/vision impaired/etc. that have an optical character recognition system that helps people without the best vision to see their surroundings. Pretty astounding. Anyhow, thanks for this content, I just thought I’d pass this along.


Leave a Reply

featured blogs
Jan 15, 2021
It's Martin Luther King Day on Monday. Cadence is off. Breakfast Bytes will not appear. And, as is traditional, I go completely off-topic the day before a break. In the past, a lot of novelty in... [[ Click on the title to access the full blog on the Cadence Community s...
Jan 14, 2021
Learn how electronic design automation (EDA) tools & silicon-proven IP enable today's most influential smart tech, including ADAS, 5G, IoT, and Cloud services. The post 5 Key Innovations that Are Making Everything Smarter appeared first on From Silicon To Software....
Jan 13, 2021
Here are some genius solutions to everyday problems you probably didn'€™t even know existed, but after you'€™ve seen them you'€™ll say '€œWow!'€...
Jan 13, 2021
Testing is the final step of any manufacturing process, and arguably the most important, and yet it can often be overlooked.  Releasing a poorly tested product onto the market has destroyed more than one reputation for quality, and this is even more important in an age when ...

featured paper

Overcoming Signal Integrity Challenges of 112G Connections on PCB

Sponsored by Cadence Design Systems

One big challenge with 112G SerDes is handling signal integrity (SI) issues. By the time the signal winds its way from the transmitter on one chip to packages, across traces on PCBs, through connectors or cables, and arrives at the receiver, the signal is very distorted, making it a challenge to recover the clock and data-bits of the information being transferred. Learn how to handle SI issues and ensure that data is faithfully transmitted with a very low bit error rate (BER).

Click here to download the whitepaper

Featured Chalk Talk

Bulk Acoustic Wave (BAW) Technology

Sponsored by Mouser Electronics and Texas Instruments

In industrial applications, crystals are not ideal for generating clock signal timing. They take up valuable PCB real-estate, and aren’t stable in harsh thermal and vibration environments. In this episode of Chalk Talk, Amelia Dalton chats with Nick Smith from Texas Instruments about bulk acoustic wave (BAW) technology that offers an attractive alternative to crystals.

More information about Texas Instruments Bulk Acoustic Wave (BAW) Technology