Let There Be Vision

Cadence Tensilica Vision P5 Lets the Light In

The Internet of People has cameras – literally billions of them. They are in smartphones, laptops, tablets, WiFi devices – it sometimes seems they’re watching our every move. This incredible volume of information is then (somewhat) intelligently analyzed, edited, and moderated by the vast visual computing power of the enormous array of human brains behind these cameras. The amount of computation required to filter, process, and interpret this image data is staggering. The end result is, of course, an almost infinite wasteland of cat videos on Facebook and YouTube. But video processing has higher purposes as well.

The Internet of Things is missing that oversized array of human brains to process, filter, and interpret its stream of image data. But the IoT needs “eyes” nonetheless. As we give our intelligent machines the power of vision, we face a daunting challenge in delivering the processing capacity required without blowing our budgets for power, price, and form factor. The analysis of all of this image data is perhaps the most compelling use case for heterogeneous computing and specialized processors.

We have written often about heterogeneous architectures like FPGAs paired with conventional application processors for vision applications. But what happens if you’re designing an SoC to take it to the next level – higher volume, lower cost, and superior performance? You won’t be fitting an FPGA/processor vision combo into your next smartphone design. You need something smaller, more efficient, and cheaper that can still deliver the performance required for advanced image processing and analysis.

Cadence’s Tensilica processors have long been a favorite for those wanting large amounts of DSP power in their SoC designs. By allowing us to customize our processor for the exact type of task we’re doing, Tensilica gives us a highly optimized architecture that delivers vastly better performance and energy efficiency at lower cost. General-purpose processors carry a lot of overhead for the wide range of applications they may be called upon to support. But with the ability to optimize the architecture for a narrowly defined application, we can make significant gains.

Even before it was acquired by Cadence, Tensilica had been working quietly in the background of the industry. While everyone is aware of ARM’s omnipresence, few people know that over two billion Tensilica processors are also out there at work, often side-by-side with more general-purpose ARM architectures, doing the heavy lifting in a wide range of challenging applications like IoT, mobile, storage, and networking.

Now Cadence has introduced a line of Tensilica processor IP optimized specifically for imaging and vision. The new Tensilica Vision P5 is designed to handle the entire camera-processing pipeline, and it involves a lot more than just an optimized processor architecture. The image-processing problem starts with common tasks like correcting for physical attributes of the camera – lens distortion, brightness falloff, color correction, sensor defects, and so forth. Once the image is cleaned up a bit, we need to add things like stabilization, high-dynamic range, 2D/3D noise reduction, and resolution enhancement. Finally, we get into the actual intelligent vision tasks – detecting people, faces, objects, gestures, and various types of motion and events.

Implemented in 16nm FinFET technology, the Vision P5 processor runs at 1.1GHz. It is deeply pipelined with advanced clock gating for energy efficiency. It boasts an ultra-wide 1024-bit memory interface with what the company calls “SuperGather” technology, which increases memory parallelism by 16x by reading and writing non-contiguous addresses in parallel. There are vector extensions that allow four vector operations per cycle, with 64-way SIMD – resulting in a possible 256 ALU operations per clock cycle. The vector extensions include 8-, 16-, and 32-bit data types with optional IEEE 32-bit vector floating point. These vector extensions can help significantly with GPU code porting.
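As a rough check on those figures, here is a minimal sketch of the arithmetic, along with a plain-Python illustration of what a gather operation does. The function names are hypothetical. The gather shows only generic semantics, not how SuperGather is actually implemented. The GOPS figure is a theoretical ceiling that assumes every issue slot and every SIMD lane is busy on every cycle at the quoted 1.1GHz clock.

```python
def peak_ops_per_cycle(vector_issue_slots: int, simd_lanes: int) -> int:
    """Upper bound on ALU operations per clock: issue slots times SIMD lanes."""
    return vector_issue_slots * simd_lanes


def peak_gops(ops_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak in giga-operations per second (assumes full utilization)."""
    return ops_per_cycle * clock_ghz


def gather(memory: list, indices: list) -> list:
    """Generic gather: collect elements from non-contiguous addresses into one vector.

    Hardware does this in parallel; this loop only illustrates the result.
    """
    return [memory[i] for i in indices]


# Figures quoted in the article: four vector operations per cycle, 64-way SIMD.
ops = peak_ops_per_cycle(vector_issue_slots=4, simd_lanes=64)
print(ops)                  # 256
print(peak_gops(ops, 1.1))  # theoretical ceiling at 1.1GHz

# Non-contiguous (and repeated) addresses fetched as a single vector.
print(gather([10, 11, 12, 13, 14, 15, 16, 17], [7, 0, 3, 3]))  # [17, 10, 13, 13]
```

Real workloads will land below that ceiling, since not every cycle can fill all four slots across all 64 lanes.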

The optional vector floating point unit supports 32-bit IEEE 754, with 16-way single precision. It has a 16-entry 32-bit floating point register file and delivers 32 GFLOPS with a single 1GHz core. The tool suite provides easy migration from GPU versions of your code, and the unit is highly power-optimized to improve performance-per-watt for floating point ops.
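The 32 GFLOPS figure is consistent with 16 lanes at 1GHz if each lane completes one multiply-add per cycle and that multiply-add is counted as two floating point operations. That counting convention is our assumption, not something stated in the figures above, and the function name below is hypothetical. A minimal sketch of the arithmetic:

```python
def peak_gflops(lanes: int, flops_per_lane_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak GFLOPS: lanes x flops per lane per cycle x clock in GHz."""
    return lanes * flops_per_lane_per_cycle * clock_ghz


# 16-way single precision at 1GHz.
# ASSUMPTION: one multiply-add per lane per cycle, counted as 2 flops.
print(peak_gflops(lanes=16, flops_per_lane_per_cycle=2, clock_ghz=1.0))  # 32.0
```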

On the software tool side, Vision P5 is supported by an auto-vectorizing compiler with OpenCV and OpenVX libraries that add over 800 optimized functions for vision and image processing. Cadence also partners with a number of companies to produce a robust ecosystem for Vision P5, including software and IP specifically targeting a number of application domains in areas like mobile, ADAS, security, and IoT/wearables. The bottom line is that a lot of the non-differentiating drudge work of intelligent vision is already done for you, so you can focus your energy on the interesting and exciting parts that will make your product special.

How does it perform? Cadence claims that Vision P5 can deliver 13x the performance of IVP-EP with one-fifth the energy. That’s an enormous win in the performance-per-watt race – which is ultimately the key metric in most embedded-vision applications.

When it comes time to scale, you can implement a multi-processor version of Vision P5 to achieve some staggering performance numbers – up to one Tera-op in less than 2mm squared of silicon area. That should be enough oomph for just about any vision application you’d be working on today. The scalability of the Vision P5 solution is particularly attractive considering the wide range of potential end applications. You can dial in just enough capability to satisfy your system needs without a lot of extra baggage.
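For a back-of-the-envelope sense of what that scaling implies, the sketch below estimates how many cores it would take to reach one tera-op per second. It assumes each core sustains the theoretical peak of 256 operations per cycle at 1.1GHz and that throughput scales perfectly with core count. Both are idealizations of ours, and the actual configuration behind the one tera-op claim is not given here. The function name is hypothetical.

```python
import math


def cores_for_target(target_gops: float, ops_per_cycle: int, clock_ghz: float) -> int:
    """Smallest whole number of cores whose combined peak meets the target.

    ASSUMPTION: every core runs at its theoretical peak and scaling is linear.
    """
    per_core_gops = ops_per_cycle * clock_ghz
    return math.ceil(target_gops / per_core_gops)


# One tera-op per second = 1000 giga-ops per second.
print(cores_for_target(target_gops=1000.0, ops_per_cycle=256, clock_ghz=1.1))  # 4
```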

Despite being optimized for high-performance vision applications, Vision P5 leaves the flexibility where you need it. With cameras, sensors, and optics constantly changing and with vision algorithms advancing every day, you need the flexibility of software to allow your system to evolve with the times and the technology. The rich set of tools, IP, reference designs, and partner offerings that accompany Vision P5 allow you to have it both ways – rapid deployment and ultimate configurability. In the domain of custom SoC vision technology, we haven’t seen an offering that competes directly with Vision P5. It brings unique capabilities to the table. 
