feature article

Teaching Machines to See

Xilinx Launches reVISION

The IoT world is all about sensing, and no sense is more important or empowering than vision. We humans rely on our sight to understand the world around us more than any other source of information, and it’s likely that the same will be true for our intelligent machines. From automotive applications like ADAS to drones to factory automation, giving our systems the ability to “see” brings capabilities that are difficult or even impossible to achieve in any other way. 

But vision is one of the most challenging computational problems of our era. High-resolution cameras generate massive amounts of data, and processing that information in real time requires enormous computing power. Even the fastest conventional processors are not up to the task, and some kind of hardware acceleration is mandatory at the edge. Hardware acceleration options are limited, however. GPUs require too much power for most edge applications, and custom ASICs or dedicated ASSPs are horrifically expensive to create and don’t have the flexibility to keep up with changing requirements and algorithms. 

That makes hardware acceleration via FPGA fabric just about the only viable option. And it makes SoC devices with embedded FPGA fabric – such as Xilinx Zynq and Altera SoC FPGAs – absolutely the solutions of choice. These devices bring the benefits of single-chip integration, ultra-low latency and high bandwidth between the conventional processors and the FPGA fabric, and low power consumption to the embedded vision space. 

Unfortunately, they also typically bring the requirement of an engineering team with FPGA design expertise. Developing the accelerators for vision algorithms is a non-trivial task, and the accelerator part is typically created using a hardware description language such as Verilog or VHDL, driving a design flow with RTL simulation, synthesis, place and route, and timing closure. In addition to requiring a qualified engineering team with specialized expertise, this can add months to the development cycle.

The problem is just getting worse. Now, AI technologies such as neural networks are being increasingly used for the complex and fuzzy pattern recognition part of vision systems. Neural networks have two distinct modes of operation. “Training” – which is done once on a large sample data set, typically in a data center environment – requires heaping helpings of floating-point computation. Your vision algorithm may be shown millions of pictures of cats, so that it can later automatically recognize cats in video streams. Training sets and tunes the coefficients that will be used in the later “Inference” phase. “Inference” is the in-the-field portion of the neural network. During inference, you want your autonomous mouse to be able to recognize cats as quickly and accurately as possible, engaging its “fight or flight” mode.
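The two phases can be seen in even the smallest possible example. Here is a toy sketch of the split described above: "training" iterates in floating point to tune coefficients, while "inference" simply applies the frozen coefficients to new inputs. The one-feature "cat classifier," its data, and its feature are hypothetical stand-ins for illustration, not anything from Xilinx's tools.

```python
import math

def train(samples, epochs=500, lr=0.5):
    """Fit a one-feature logistic classifier (the floating-point-heavy phase)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, label in samples:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # sigmoid activation
            grad = p - label                          # gradient of the loss
            w -= lr * grad * x
            b -= lr * grad
    return w, b

def infer(w, b, x):
    """Deployed phase: the coefficients are fixed; we only apply them."""
    return 1.0 / (1.0 + math.exp(-(w * x + b))) > 0.5

# Pretend feature: "whisker-likeness" of an image patch; label 1 = cat.
data = [(0.9, 1), (0.8, 1), (0.2, 0), (0.1, 0)]
w, b = train(data)
print(infer(w, b, 0.85), infer(w, b, 0.15))  # expected: True False
```

Training runs this loop millions of times over huge data sets; inference is just the last two lines, which is why it can live in a small, low-power edge device.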

Inference is done at the edge of the IoT, or as close to it as possible. You don’t have time for massive amounts of image data to be uploaded to the cloud and processed, delivering a “Hey, that thing you’re looking at is a CAT!” conclusion about 100ms after the limbs are torn from your robotic device. Inference, therefore, is typically done with fixed-point (8-bit or less) precision in the IoT edge device itself – minimizing latency and power consumption while maximizing performance. 
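A minimal sketch of that 8-bit step: the trained floating-point coefficients are mapped onto int8 values plus a scale factor before deployment. The symmetric per-tensor scheme shown here is one common quantization choice, not anything specific to Xilinx's flow.

```python
def quantize_int8(weights):
    """Map float weights to int8 codes plus one shared scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values (used only to check the error)."""
    return [v * scale for v in q]

w_float = [0.42, -1.27, 0.05, 0.9]
q, s = quantize_int8(w_float)
w_back = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w_float, w_back))
print(q, max_err)  # int8 codes; worst-case error is at most scale/2
```

The payoff at the edge is that every multiply-accumulate now operates on 8-bit integers, which is exactly the arithmetic FPGA fabric handles cheaply and in massive parallel.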

This “training” vs. “inference” split is very convenient for the companies that make FPGAs and hybrid SoC/FPGA devices, because FPGAs are really good at high-speed, low-precision computation. Interestingly, it’s doubly convenient for Xilinx, whose FPGAs and SoCs differ from those of archrival Intel (Altera): Intel’s devices support hardened floating-point (good for training), supposedly at the expense of some fixed-point performance (good for inference). Xilinx is apparently more than willing to let Intel duke it out with GPUs for the training sockets while it focuses on the much-more-lucrative inference sockets.

So, Xilinx is sitting pretty with their Zynq SoCs and MPSoCs, perfectly aligned with the needs of the embedded vision developer and well differentiated from Intel/Altera’s devices. What else could they possibly need?

Oh, yeah. There’s still that “almost impossible to program” issue.

Rewinding a few paragraphs – most of the very large systems companies have well-qualified teams of hardware engineers who can handle the FPGA portion of an embedded vision system. Xilinx has dozens of engagements in every important application segment involving embedded vision – from ADAS to drones to industrial automation. But many companies don’t have the required hardware expertise, and they wouldn’t want to dedicate the design time even if they did. Plus, crossing the conceptual barrier from vision experts to neural network experts to FPGA design experts and back again is an expensive, time-consuming, and lossy process. What we really need is a way for software developers to harness the power of Zynq devices without bringing in a huge team of hardware experts.

That’s the whole point of Xilinx’s reVISION.

reVISION, announced this week, is a stack – a set of tools, interfaces, and IP – designed to let embedded vision application developers start in their own familiar sandbox (OpenVX for vision acceleration and Caffe for machine learning), move smoothly down through algorithm development (OpenCV and neural network models such as AlexNet, GoogLeNet, SqueezeNet, SSD, and FCN), and target Zynq devices without bringing in a team of FPGA experts. reVISION builds on Xilinx’s previously announced SDSoC stack for the algorithm development step. Xilinx claims enormous productivity gains for embedded vision development – with customers predicting cuts of as much as 12 months from current schedules for new products and updates.
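For a feel of the kind of workload involved, here is the sort of dense per-pixel stencil – a 3x3 Sobel edge filter, a staple of OpenCV pipelines – written out in plain Python. In a flow like reVISION, the equivalent OpenCV-level call is what gets compiled into FPGA fabric instead of running on the CPU; this pure-software version is just an illustration of the computation.

```python
def sobel_x(img):
    """Horizontal-gradient response of a 2-D list of grayscale pixels."""
    kx = [[-1, 0, 1],
          [-2, 0, 2],
          [-1, 0, 1]]  # classic Sobel x-kernel
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):          # skip the 1-pixel border
        for x in range(1, w - 1):
            out[y][x] = sum(kx[j][i] * img[y - 1 + j][x - 1 + i]
                            for j in range(3) for i in range(3))
    return out

# Tiny test image with a vertical edge (dark left half, bright right half).
img = [[0, 0, 10, 10]] * 4
edges = sobel_x(img)
print(edges[1])  # strongest response where the edge sits
```

Every output pixel depends only on a small, fixed window of inputs – exactly the structure that pipelines beautifully into hardware, which is why filters like this are the first things offloaded to the fabric.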

In many systems employing embedded vision, it’s not just the vision that counts. Increasingly, information from the vision system must be processed in concert with information from other types of sensors such as LiDAR, SONAR, RADAR, and others. FPGA-based SoCs are uniquely agile at handling this sensor fusion problem, with the flexibility to adapt to the particular configuration of sensor systems required by each application. This diversity in application requirements is a significant barrier for typical “cost optimization” strategies such as the creation of specialized ASIC and ASSP solutions.
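A toy sketch of the sensor-fusion idea: readings from dissimilar sensors are blended by inverse-variance weighting, so the more trustworthy sensor dominates the estimate. The sensor names and noise figures below are purely illustrative assumptions, not drawn from any particular system.

```python
def fuse(estimates):
    """Inverse-variance weighted average of (value, variance) pairs."""
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total  # fused value and its (smaller) variance

camera = (21.0, 4.0)   # range to obstacle in metres; vision depth is noisy
radar = (20.0, 0.25)   # radar measures range much more precisely
dist, var = fuse([camera, radar])
print(dist, var)  # fused estimate sits close to the radar reading
```

The fused variance is always lower than either input's – the mathematical payoff of fusion – and an FPGA-based SoC can rearrange this pipeline whenever the sensor mix changes, which is the flexibility argument made above.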

The performance rewards for system developers who successfully harness the power of these devices are substantial. Xilinx is touting benchmarks showing their devices delivering an advantage of 6x images/sec/watt in machine learning inference with GoogLeNet @ batch = 1, 42x frames/sec/watt in computer vision with OpenCV, and 1/5th the latency in real-time applications with GoogLeNet @ batch = 1, versus “Nvidia Tegra and typical SoCs.” These kinds of advantages in latency, performance, and particularly in energy-efficiency can easily be make-or-break for many embedded vision applications. 
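For readers new to the metric, a figure like "images/sec/watt" is simply throughput divided by board power. The numbers below are invented for illustration only – they are not Xilinx's or Nvidia's measurements – but they show how a "6x" efficiency claim is computed.

```python
def images_per_sec_per_watt(images_per_sec, watts):
    """Energy efficiency: how much inference throughput each watt buys."""
    return images_per_sec / watts

fpga_soc = images_per_sec_per_watt(360.0, 6.0)   # hypothetical edge SoC
gpu_module = images_per_sec_per_watt(150.0, 15.0)  # hypothetical GPU module
print(fpga_soc / gpu_module)  # relative efficiency advantage: 6.0
```

In a battery- or thermally-limited edge device, it is this ratio – not raw throughput – that usually decides which part wins the socket.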

Xilinx has also announced a range of embedded vision development kits, supporting various cameras and input configurations, for the reVISION development flow, so you’ll be able to get your design working on actual hardware as quickly as possible.

At press time, Intel had just announced the acquisition of Mobileye – a company specializing in embedded vision and collision avoidance in the automotive market – for almost as much as they paid for Altera. It seems that the stakes in this emerging applications space are going up yet again. It will be interesting to watch the battle unfold.




2 thoughts on “Teaching Machines to See”

  1. Good article, although I have to point out that the reason FPGAs are not used for training has nothing to do with floating point. It has to do with the ability to quickly change the architecture and constants to achieve high accuracy during training. GPUs and CPUs can do this in milliseconds, while any FPGA takes hours of place and route. For inference, FPGAs are much better than GPUs or CPUs, because these algorithms can be pipelined and run in real time on an FPGA – something FPGAs do better than any other programmable solution. In many cases, as was shown at FPGA 2017, fewer bits are needed. Floating point in an FPGA gives you the ability to replicate the results of GPUs and CPUs, but it is not needed in the field. After all, the incoming data is not floating point, so using floating point just takes area and is not really useful.

  2. Hi there,

    Great article. It’s honestly quite mind-blowing how high-tech and innovative vision technology has become, and how machines/robotics can pretty much see on their own now, isn’t it? There’s another particularly innovative product I recently learned about, from a company called OrCam, so I thought I’d share. Essentially, they have these smart glasses with a camera for the blind/vision impaired/etc. that have an optical character recognition system that helps people without the best vision to see their surroundings. Pretty astounding. Anyhow, thanks for this content, I just thought I’d pass this along.


