Let There Be Vision

Cadence Tensilica Vision P5 Lets the Light In

The Internet of People has cameras – literally billions of them. They are in smartphones, laptops, tablets, WiFi devices – it sometimes seems they’re watching our every move. This incredible volume of information is then (somewhat) intelligently analyzed, edited, and moderated by the vast visual computing power of the enormous array of human brains behind these cameras. The amount of computation required to filter, process, and interpret this image data is staggering. The end result is, of course, an almost infinite wasteland of cat videos on Facebook and YouTube. But video processing has higher purposes as well.

The Internet of Things is missing that oversized array of human brains to process, filter, and interpret its stream of image data. But the IoT needs “eyes” nonetheless. As we give our intelligent machines the power of vision, we face a daunting challenge in delivering the processing capacity required without blowing our budgets for power, price, and form factor. The analysis of all of this image data is perhaps the most compelling use case for heterogeneous computing and specialized processors.

We have written often about heterogeneous architectures like FPGAs paired with conventional application processors for vision applications. But what happens if you’re designing an SoC to take it to the next level – higher volume, lower cost, and superior performance? You won’t be fitting an FPGA/processor vision combo into your next smartphone design. You need something smaller, more efficient, and cheaper that can still deliver the performance required for advanced image processing and analysis.

Cadence’s Tensilica processors have long been a favorite for those wanting large amounts of DSP power in their SoC designs. By allowing us to customize our processor for the exact type of task we’re doing, Tensilica gives us a highly optimized architecture that delivers vastly better performance and energy efficiency at lower cost. General-purpose processors carry a lot of overhead for the wide range of applications they may be called upon to support. But with the ability to optimize the architecture for a narrowly defined application, we can make significant gains.

Even before being acquired by Cadence, Tensilica had been working quietly in the background of the industry. While everyone is aware of ARM’s omnipresence, few people know that over two billion Tensilica processors are also out there at work, often side-by-side with more general-purpose ARM architectures, doing the heavy lifting in a wide range of challenging applications like IoT, mobile, storage, and networking.

Now Cadence has introduced a line of Tensilica processor IP optimized specifically for imaging and vision. The new Tensilica Vision P5 is designed to handle the entire camera-processing pipeline, and it involves a lot more than just an optimized processor architecture. The image-processing problem starts with common tasks like correcting for physical attributes of the camera – lens distortion, brightness falloff, color correction, sensor defects, and so forth. Once the image is cleaned up a bit, we need to add things like stabilization, high-dynamic range, 2D/3D noise reduction, and resolution enhancement. Finally, we get into the actual intelligent vision tasks – detecting people, faces, objects, gestures, and various types of motion and events.

Implemented in 16nm FinFET technology, the Vision P5 processor runs at 1.1GHz. It is deeply pipelined with advanced clock gating for energy efficiency. It boasts an ultra-wide 1024-bit memory interface with what the company calls “SuperGather” technology, which increases memory parallelism by 16x – reading and writing non-contiguous addresses in parallel. There are vector extensions that allow four vector operations per cycle, with 64-way SIMD – resulting in a possible 256 ALU operations per clock cycle. The vector extensions include 8-, 16-, and 32-bit data types with optional IEEE 32-bit vector floating point. These vector extensions can help significantly with GPU code porting.
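To put the headline numbers in perspective, here is a back-of-the-envelope sketch of the peak-throughput arithmetic. The issue width, SIMD width, and clock rate come straight from the figures above; the function names are ours, and the assumption that the full 64 lanes apply to every operation is a simplification (wider data types presumably use fewer lanes, which the figures do not spell out).

```python
# Peak-throughput arithmetic from the quoted Vision P5 figures.
# Assumption: all 64 SIMD lanes are active for every vector operation.
VECTOR_OPS_PER_CYCLE = 4   # vector operations issued per cycle (quoted)
SIMD_LANES = 64            # 64-way SIMD (quoted)
CLOCK_HZ = 1.1e9           # 1.1GHz in 16nm FinFET (quoted)


def alu_ops_per_cycle(vector_ops=VECTOR_OPS_PER_CYCLE, lanes=SIMD_LANES):
    """ALU operations per clock: vector issue slots times SIMD lanes."""
    return vector_ops * lanes


def peak_ops_per_second(clock_hz=CLOCK_HZ):
    """Theoretical peak ALU operations per second for one core."""
    return alu_ops_per_cycle() * clock_hz


if __name__ == "__main__":
    print(alu_ops_per_cycle())                   # 4 * 64 = 256
    print(peak_ops_per_second() / 1e9, "GOPS")   # roughly 281.6 GOPS peak
```

These are ceilings, of course. Real vision kernels will land somewhere below them depending on how well the memory system keeps the lanes fed.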

The optional vector floating point unit supports 32-bit IEEE 754, with 16-way single precision. It has a 16-entry 32-bit floating point register file and delivers 32 GFLOPS with a single 1GHz core. The tool suite provides easy migration from GPU versions of your code, and the unit is highly power-optimized to improve performance-per-watt for floating point ops.
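The 32 GFLOPS figure is not broken down for us, but one plausible reading reproduces it exactly. The sketch below assumes each of the 16 lanes completes one fused multiply-add per cycle and that an FMA is counted as two floating point operations, a common convention. That counting rule is our assumption, not something Cadence states here.

```python
# One plausible derivation of the quoted 32 GFLOPS figure.
# Assumption (not stated in the article): each lane retires one fused
# multiply-add per cycle, counted as 2 floating point operations.
FP_LANES = 16          # 16-way single precision (quoted)
FLOPS_PER_LANE = 2     # assumed: one FMA = multiply + add
CLOCK_HZ = 1.0e9       # single 1GHz core (quoted)


def peak_gflops(lanes=FP_LANES, flops_per_lane=FLOPS_PER_LANE,
                clock_hz=CLOCK_HZ):
    """Theoretical peak single-precision GFLOPS for one core."""
    return lanes * flops_per_lane * clock_hz / 1e9


if __name__ == "__main__":
    print(peak_gflops())  # 16 * 2 * 1e9 / 1e9 = 32.0
```

If the unit instead counted one operation per lane per cycle, the same arithmetic would give 16 GFLOPS, so the quoted number suggests the FMA convention is in play.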

On the software tool side, Vision P5 is supported by an auto-vectorizing compiler with OpenCV and OpenVX libraries that add over 800 optimized functions for vision and image processing. Cadence also partners with a number of companies to produce a robust ecosystem for Vision P5, including software and IP specifically targeting a number of application domains in areas like mobile, ADAS, security, and IoT/wearables. The bottom line is that a lot of the non-differentiating drudge work of intelligent vision is already done for you, so you can focus your energy on the interesting and exciting parts that will make your product special.

How does it perform? Cadence claims that Vision P5 can deliver 13x the performance of its earlier IVP-EP processor with one-fifth the energy. That’s an enormous win in the performance-per-watt race – which is ultimately the key metric in most embedded-vision applications.

When it comes time to scale, you can implement a multi-processor version of Vision P5 to achieve some staggering performance numbers – up to one tera-op in less than 2 mm² of silicon area. That should be enough oomph for just about any vision application you’d be working on today. The scalability of the Vision P5 solution is particularly attractive considering the wide range of potential end applications. You can dial in just enough capability to satisfy your system needs without a lot of extra baggage.
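How many cores does a tera-op take? Cadence does not say, but we can make a rough guess from the single-core figures quoted earlier (256 ALU operations per cycle at 1.1GHz). The sketch below assumes the "tera-op" counts those same ALU operations, that every core runs at the same clock, and that scaling is perfectly linear. All three are our assumptions, so treat the answer as an estimate only.

```python
import math

# Rough estimate of the core count behind "one tera-op".
# Assumptions (ours, not Cadence's): the tera-op counts the same ALU
# operations as the single-core figure, all cores run at 1.1GHz, and
# throughput scales linearly with core count.
ALU_OPS_PER_CYCLE = 256   # quoted single-core figure
CLOCK_HZ = 1.1e9          # quoted clock rate
TARGET_OPS = 1e12         # one tera-op per second


def cores_needed(target_ops=TARGET_OPS,
                 ops_per_cycle=ALU_OPS_PER_CYCLE,
                 clock_hz=CLOCK_HZ):
    """Smallest whole number of cores whose combined peak meets the target."""
    per_core = ops_per_cycle * clock_hz
    return math.ceil(target_ops / per_core)


if __name__ == "__main__":
    print(cores_needed())  # 1e12 / 281.6e9 is about 3.55, so 4 cores
```

Under those assumptions, a four-core cluster would clear the bar with a little headroom.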

Despite being optimized for high-performance vision applications, Vision P5 leaves the flexibility where you need it. With cameras, sensors, and optics constantly changing and with vision algorithms advancing every day, you need the flexibility of software to allow your system to evolve with the times and the technology. The rich set of tools, IP, reference designs, and partner offerings that accompany Vision P5 allow you to have it both ways – rapid deployment and ultimate configurability. In the domain of custom SoC vision technology, we haven’t seen an offering that competes directly with Vision P5. It brings unique capabilities to the table. 
