feature article

Let There Be Vision

Cadence Tensilica Vision P5 Lets the Light In

The Internet of People has cameras – literally billions of them. They are in smartphones, laptops, tablets, WiFi devices – it sometimes seems they’re watching our every move. This incredible volume of information is then (somewhat) intelligently analyzed, edited, and moderated by the vast visual computing power of the enormous array of human brains behind these cameras. The amount of computation required to filter, process, and interpret this image data is staggering. The end result is, of course, an almost infinite wasteland of cat videos on Facebook and YouTube. But video processing has higher purposes as well.

The Internet of Things is missing that oversized array of human brains to process, filter, and interpret its stream of image data. But the IoT needs “eyes” nonetheless. As we give our intelligent machines the power of vision, we face a daunting challenge in delivering the processing capacity required without blowing our budgets for power, price, and form factor. The analysis of all of this image data is perhaps the most compelling use case for heterogeneous computing and specialized processors.

We have written often about heterogeneous architectures like FPGAs paired with conventional application processors for vision applications. But what happens if you’re designing an SoC to take it to the next level – higher volume, lower cost, and superior performance? You won’t be fitting an FPGA/processor vision combo into your next smartphone design. You need something smaller, more efficient, and cheaper that can still deliver the performance required for advanced image processing and analysis.

Cadence’s Tensilica processors have long been a favorite for those wanting large amounts of DSP power in their SoC designs. By letting us customize the processor for the exact task at hand, Tensilica gives us a highly optimized architecture that delivers vastly better performance and energy efficiency at lower cost. General-purpose processors carry a lot of overhead for the wide range of applications they may be called upon to support. But with an architecture optimized for a narrowly defined application, we can make significant gains.

Even before its acquisition by Cadence, Tensilica had been working quietly in the background of the industry. While everyone is aware of ARM’s omnipresence, few people know that over two billion Tensilica processors are also out there at work, often side-by-side with more general-purpose ARM architectures, doing the heavy lifting in a wide range of challenging applications like IoT, mobile, storage, and networking.

Now Cadence has introduced a line of Tensilica processor IP optimized specifically for imaging and vision. The new Tensilica Vision P5 is designed to handle the entire camera-processing pipeline, and that involves a lot more than just an optimized processor architecture. The image-processing problem starts with common tasks like correcting for physical attributes of the camera – lens distortion, brightness falloff, color correction, sensor defects, and so forth. Once the image is cleaned up a bit, we need to add things like stabilization, high dynamic range, 2D/3D noise reduction, and resolution enhancement. Finally, we get into the actual intelligent vision tasks – detecting people, faces, objects, gestures, and various types of motion and events.
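
To make those stages concrete, here is a minimal sketch of such a pipeline using the desktop OpenCV Python bindings – purely an illustration of the processing order described above, not Tensilica code. The camera matrix, distortion coefficients, and the Haar-cascade face detector are placeholder assumptions standing in for whatever calibration data and detection algorithms a real product would use.

```python
import cv2
import numpy as np

# Placeholder calibration data -- in a real system this comes from a
# camera-calibration step (e.g., cv2.calibrateCamera on checkerboard images).
camera_matrix = np.array([[1000.0,    0.0, 640.0],
                          [   0.0, 1000.0, 360.0],
                          [   0.0,    0.0,   1.0]])
dist_coeffs = np.array([-0.1, 0.01, 0.0, 0.0, 0.0])  # simple lens-distortion model

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def process_frame(frame_bgr):
    # 1. Correct for physical attributes of the camera (lens distortion).
    undistorted = cv2.undistort(frame_bgr, camera_matrix, dist_coeffs)

    # 2. Clean the image up a bit: 2D noise reduction.
    denoised = cv2.fastNlMeansDenoisingColored(undistorted, None, 10, 10, 7, 21)

    # 3. Intelligent vision: detect faces in the cleaned-up frame.
    gray = cv2.cvtColor(denoised, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return denoised, faces
```

Every one of those calls is exactly the kind of per-pixel, data-parallel work that an embedded vision DSP is built to accelerate.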

Implemented in 16nm FinFET technology, the Vision P5 processor runs at 1.1GHz. It is deeply pipelined with advanced clock gating for energy efficiency. It boasts an ultra-wide 1024-bit memory interface with what the company calls “SuperGather” technology, which increases memory parallelism by 16x – reading and writing non-contiguous addresses in parallel. There are vector extensions that allow four vector operations per cycle, with 64-way SIMD – resulting in a possible 256 ALU operations per clock cycle. The vector extensions include 8, 16, and 32-bit data types with optional IEEE 32-bit vector floating point. These vector extensions can help significantly with GPU code porting. 
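
As a quick back-of-the-envelope on where those numbers lead (my arithmetic, not a Cadence figure):

```python
vector_ops_per_cycle = 4     # four vector operations issued per cycle
simd_lanes = 64              # 64-way SIMD per vector operation
clock_ghz = 1.1              # 16nm FinFET clock rate quoted above

alu_ops_per_cycle = vector_ops_per_cycle * simd_lanes  # 256 ALU ops per cycle
peak_gops = alu_ops_per_cycle * clock_ghz              # roughly 282 G ops/s peak
print(alu_ops_per_cycle, round(peak_gops, 1))          # 256 281.6
```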

The optional vector floating point unit supports 32-bit IEEE 754, with 16-way single precision. It has a 16-entry 32-bit floating point register file and delivers 32 GFLOPS with a single 1GHz core. The tool suite provides easy migration from GPU versions of your code, and the unit is highly power-optimized to improve performance-per-watt for floating point ops.
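
That GFLOPS number squares with the lane count if you assume each lane retires a fused multiply-add per cycle and count an FMA as two floating-point operations – the usual convention, though Cadence doesn’t spell it out:

```python
fp_lanes = 16        # 16-way single-precision vector floating point
flops_per_fma = 2    # a fused multiply-add counted as two FLOPs (assumption)
clock_ghz = 1.0      # the 32-GFLOPS figure is quoted for a 1GHz core

peak_gflops = fp_lanes * flops_per_fma * clock_ghz
print(peak_gflops)   # 32.0
```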

On the software tool side, Vision P5 is supported by an auto-vectorizing compiler along with OpenCV and OpenVX libraries that add over 800 optimized functions for vision and image processing. Cadence also partners with a number of companies to build a robust ecosystem for Vision P5, including software and IP targeting application domains such as mobile, ADAS, security, and IoT/wearables. The bottom line is that a lot of the non-differentiating drudge work of intelligent vision is already done for you, so you can focus your energy on the interesting and exciting parts that will make your product special.

How does it perform? Cadence claims that Vision P5 can deliver 13x the performance of the previous-generation IVP-EP imaging DSP with one-fifth the energy. That’s an enormous win in the performance-per-watt race – which is ultimately the key metric in most embedded-vision applications.

When it comes time to scale, you can implement a multi-processor version of Vision P5 to achieve some staggering performance numbers – up to one Tera-op in less than 2 mm² of silicon area. That should be enough oomph for just about any vision application you’d be working on today. The scalability of the Vision P5 solution is particularly attractive considering the wide range of potential end applications. You can dial in just enough capability to satisfy your system needs without a lot of extra baggage.
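
For a rough sense of what that takes, dividing the target by the single-core peak estimated earlier suggests a handful of cores – again my back-of-the-envelope, not Cadence’s sizing:

```python
import math

single_core_gops = 256 * 1.1    # ~282 G ops/s per core, from the figures above
target_gops = 1000.0            # one Tera-op per second
cores_needed = math.ceil(target_gops / single_core_gops)
print(cores_needed)             # 4
```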

Despite being optimized for high-performance vision applications, Vision P5 leaves the flexibility where you need it. With cameras, sensors, and optics constantly changing and with vision algorithms advancing every day, you need the flexibility of software to allow your system to evolve with the times and the technology. The rich set of tools, IP, reference designs, and partner offerings that accompany Vision P5 allow you to have it both ways – rapid deployment and ultimate configurability. In the domain of custom SoC vision technology, we haven’t seen an offering that competes directly with Vision P5. It brings unique capabilities to the table. 
