
Let There Be Vision

Cadence Tensilica Vision P5 Lets the Light In

The Internet of People has cameras – literally billions of them. They are in smartphones, laptops, tablets, WiFi devices – it sometimes seems they’re watching our every move. This incredible volume of information is then (somewhat) intelligently analyzed, edited, and moderated by the vast visual computing power of the enormous array of human brains behind these cameras. The amount of computation required to filter, process, and interpret this image data is staggering. The end result is, of course, an almost infinite wasteland of cat videos on Facebook and YouTube. But video processing has higher purposes as well.

The Internet of Things is missing that oversized array of human brains to process, filter, and interpret its stream of image data. But the IoT needs “eyes” nonetheless. As we give our intelligent machines the power of vision, we face a daunting challenge in delivering the processing capacity required without blowing our budgets for power, price, and form factor. The analysis of all of this image data is perhaps the most compelling use case for heterogeneous computing and specialized processors.

We have written often about heterogeneous architectures like FPGAs paired with conventional application processors for vision applications. But what happens if you’re designing an SoC to take it to the next level – higher volume, lower cost, and superior performance? You won’t be fitting an FPGA/processor vision combo into your next smartphone design. You need something smaller, more efficient, and cheaper that can still deliver the performance required for advanced image processing and analysis.

Cadence’s Tensilica processors have long been a favorite for those wanting large amounts of DSP power in their SoC designs. By allowing us to customize our processor for the exact type of task we’re doing, Tensilica gives us the ability to have a highly optimized architecture that delivers vastly better performance, energy efficiency, and cost. General-purpose processors carry a lot of overhead for the wide range of applications they may be called upon to support. But with the ability to optimize the architecture for a narrowly defined application, we can make significant gains.

Since well before it was acquired by Cadence, Tensilica has been working quietly in the background of the industry. While everyone is aware of ARM’s omnipresence, few people know that over two billion Tensilica processors are also out there at work, often side-by-side with more general-purpose ARM architectures, doing the heavy lifting in a wide range of challenging applications like IoT, mobile, storage, and networking.

Now Cadence has introduced a line of Tensilica processor IP optimized specifically for imaging and vision. The new Tensilica Vision P5 is designed to handle the entire camera-processing pipeline, and it involves a lot more than just an optimized processor architecture. The image-processing problem starts with common tasks like correcting for physical attributes of the camera – lens distortion, brightness falloff, color correction, sensor defects, and so forth. Once the image is cleaned up a bit, we need to add things like stabilization, high-dynamic range, 2D/3D noise reduction, and resolution enhancement. Finally, we get into the actual intelligent vision tasks – detecting people, faces, objects, gestures, and various types of motion and events.
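To make the first stage of that pipeline concrete, here is a minimal sketch of one camera-correction step, brightness falloff (vignetting). It is purely illustrative: the radial gain model `1 + k * r^2`, the constant `k`, and the function name `correct_falloff` are assumptions for this example, not anything taken from Cadence's libraries.

```python
# Hypothetical illustration of brightness-falloff (vignetting) correction.
# The gain model (1 + k * r^2) and the default k are assumptions for this sketch.

def correct_falloff(image, k=0.5):
    """Brighten each pixel by a gain that grows with distance from the center.

    `image` is a list of rows of 8-bit grayscale values (0..255).
    """
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    # Normalize so the farthest corner has squared radius 1.0
    # (fall back to 1.0 for a 1x1 image to avoid dividing by zero).
    max_r2 = (cx * cx + cy * cy) or 1.0
    corrected = []
    for y, row in enumerate(image):
        out_row = []
        for x, pixel in enumerate(row):
            r2 = ((x - cx) ** 2 + (y - cy) ** 2) / max_r2
            gain = 1.0 + k * r2
            # Clamp to the 8-bit range after applying the gain.
            out_row.append(min(255, round(pixel * gain)))
        corrected.append(out_row)
    return corrected
```

Every pixel gets the same multiply-and-clamp treatment, independent of its neighbors, which is exactly the sort of work that maps well onto wide SIMD hardware.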

Implemented in 16nm FinFET technology, the Vision P5 processor runs at 1.1GHz. It is deeply pipelined with advanced clock gating for energy efficiency. It boasts an ultra-wide 1024-bit memory interface with what the company calls “SuperGather” technology, which increases memory parallelism by 16x – reading and writing non-contiguous addresses in parallel. There are vector extensions that allow four vector operations per cycle, with 64-way SIMD – resulting in a possible 256 ALU operations per clock cycle. The vector extensions include 8-, 16-, and 32-bit data types with optional IEEE 32-bit vector floating point. These vector extensions can help significantly with GPU code porting.
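The arithmetic behind those headline figures is simple enough to sketch. The snippet below multiplies out the numbers quoted above and adds a toy `gather` function to show what reading non-contiguous addresses means in principle. The function names are made up for this example, the per-second figure is a theoretical peak that assumes every lane is busy on every cycle, and the `gather` function models only the semantics of the operation, not how SuperGather is implemented.

```python
# Figures quoted in the article.
VECTOR_OPS_PER_CYCLE = 4
SIMD_LANES = 64
CLOCK_HZ = 1_100_000_000  # 1.1GHz in 16nm FinFET


def peak_alu_ops_per_cycle(vector_ops=VECTOR_OPS_PER_CYCLE, lanes=SIMD_LANES):
    """Four vector operations, each 64 lanes wide."""
    return vector_ops * lanes


def peak_alu_ops_per_second(clock_hz=CLOCK_HZ):
    """Theoretical peak; assumes all lanes are busy on every cycle."""
    return peak_alu_ops_per_cycle() * clock_hz


def gather(memory, indices):
    """Conceptual gather: fetch elements from scattered addresses in one step.

    Hardware would issue these reads in parallel; this loop only shows
    which values end up in the resulting vector.
    """
    return [memory[i] for i in indices]


if __name__ == "__main__":
    print(peak_alu_ops_per_cycle())          # 256
    print(peak_alu_ops_per_second() / 1e9)   # 281.6 (giga-ops per second)
    print(gather([10, 20, 30, 40, 50], [4, 0, 2]))  # [50, 10, 30]
```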

The optional vector floating point unit supports 32-bit IEEE 754, with 16-way single precision. It has a 16-entry 32-bit floating point register file and delivers 32 GFLOPS with a single 1GHz core. The tool suite provides easy migration from GPU versions of your code, and the unit is highly power-optimized to improve performance-per-watt for floating point ops.
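One way the 32 GFLOPS figure reconciles with 16 lanes at 1GHz is if each lane performs a fused multiply-add every cycle and that is counted as two floating point operations. That counting convention is an assumption here (it is common in peak-FLOPS figures, but the article does not state it), and `peak_gflops` is a name invented for this sketch.

```python
def peak_gflops(lanes=16, flops_per_lane_per_cycle=2, clock_ghz=1.0):
    """Peak single-precision GFLOPS for one core.

    lanes: 16-way single precision (from the article).
    flops_per_lane_per_cycle: 2 assumes a fused multiply-add counted as
        two operations; this is an assumption, not a stated Cadence figure.
    clock_ghz: the 1GHz clock quoted alongside the GFLOPS number.
    """
    return lanes * flops_per_lane_per_cycle * clock_ghz


if __name__ == "__main__":
    print(peak_gflops())                            # 32.0
    print(peak_gflops(flops_per_lane_per_cycle=1))  # 16.0 if each lane counts one op
```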

On the software tool side, Vision P5 is supported by an auto-vectorizing compiler with OpenCV and OpenVX libraries that add over 800 optimized functions for vision and image processing. Cadence also partners with a number of companies to produce a robust ecosystem for Vision P5, including software and IP specifically targeting a number of application domains in areas like mobile, ADAS, security, and IoT/wearables. The bottom line is that a lot of the non-differentiating drudge work of intelligent vision is already done for you, so you can focus your energy on the interesting and exciting parts that will make your product special.

How does it perform? Cadence claims that Vision P5 can deliver 13x the performance of IVP-EP, the company’s previous-generation imaging and video processor, with one-fifth the energy. That’s an enormous win in the performance-per-watt race – which is ultimately the key metric in most embedded-vision applications.

When it comes time to scale, you can implement a multi-processor version of Vision P5 to achieve some staggering performance numbers – up to one Tera-op in less than 2mm squared of silicon area. That should be enough oomph for just about any vision application you’d be working on today. The scalability of the Vision P5 solution is particularly attractive considering the wide range of potential end applications. You can dial in just enough capability to satisfy your system needs without a lot of extra baggage.
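For a rough feel for what that scaling implies, the sketch below estimates how many cores it would take to reach one tera-op per second. It assumes that an "op" means one ALU lane operation, that each core sustains the 256-operations-per-cycle peak at 1.1GHz, and that throughput scales perfectly with core count. None of those assumptions comes from Cadence, and `cores_for_target` is a hypothetical helper.

```python
def cores_for_target(target_ops_per_second=10**12,
                     ops_per_cycle=256,
                     clock_hz=1_100_000_000):
    """Smallest core count whose combined peak meets the target.

    Assumes ideal linear scaling and peak utilization on every core.
    """
    per_core = ops_per_cycle * clock_hz
    # Integer ceiling division without floating point rounding.
    return -(-target_ops_per_second // per_core)


if __name__ == "__main__":
    print(cores_for_target())  # 4
```

Under these idealized assumptions, four cores would clear the one tera-op mark; real workloads would land below peak, so treat this as an upper bound on what each core contributes.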

Despite being optimized for high-performance vision applications, Vision P5 leaves the flexibility where you need it. With cameras, sensors, and optics constantly changing and with vision algorithms advancing every day, you need the flexibility of software to allow your system to evolve with the times and the technology. The rich set of tools, IP, reference designs, and partner offerings that accompany Vision P5 allow you to have it both ways – rapid deployment and ultimate configurability. In the domain of custom SoC vision technology, we haven’t seen an offering that competes directly with Vision P5. It brings unique capabilities to the table. 
