
Toward Intelligent Vision

Cadence Tensilica Vision P6

I’m told that the motivation for the iconic 1979 Saturday Night Live skit was a loosening of US censorship restrictions on broadcast television. For the first time, the word “hell” could be uttered on American TV. The story is that the Saturday Night Live writers wanted to celebrate the event by including the word “hell” as many times as possible in one skit.

Steve Martin stands staring off into the distance, repeating, “What the hell is that thing?” as a crowd gradually gathers, all asking the same question.

For the rest of the skit, the characters continue staring at some unseen horizon, repeating variations of the phrase. “Hell” is uttered an almost uncountable number of times. The apparition is never identified. For some reason, it is hilarious.

Today, our embedded computing systems stare similarly off into the vast expanse of unknown visages. Clusters of pixels course through filters to be analyzed by neural networks. Billions of bits are flipped, and flipped again. Processors strain to extract some abstract meaning from the mounting mass of data, ever careful not to plunder their power budgets. 

The question burns: “What the hell IS that thing?”

If our vision systems are going to be able to answer the question with any degree of accuracy over a large range of applications – and do it within their ever-shrinking power budgets – we need processors tuned to the task. Intelligent vision is one of the most computationally challenging applications we’ve attempted with embedded systems, and plain-vanilla applications processors don’t have a prayer of hitting the computational power efficiency required to make vision a reality.

Tensilica’s processing architecture is designed specifically for customization. The idea is that an expert can use Tensilica tools to build the perfect processor for their application. For sophisticated teams designing special-purpose SoCs, Tensilica provides a means to raise their compute game significantly – custom-crafting purpose-built processors that squeeze every ounce of performance and power efficiency out of a given silicon area. 

At some point, the folks in Cadence’s Tensilica team realized that they themselves were the world’s biggest experts at customizing Tensilica processors, and that there were a few very common applications just crying out for a helping hand. Tensilica’s “Vision” processors are one result of that realization. If vision or neural networks are your thing, you don’t even have to bake your own architecture. Tensilica has already done it for you.

Last year, we looked over the then-impressive Vision P5 architecture, which brought an unparalleled level of computational performance and power efficiency to vision and related algorithms in SoCs. Well, move over, P5 – Tensilica’s P6 is here, and it has become the new benchmark for squeezing out the most neural computation per joule.

If your system is trying to recognize an object, chances are the first stage coming out of the camera is a set of filters that clean and prepare the image for analysis. Then, an algorithm looks over the scene and extracts candidate “regions” where there might be “things” of interest. Finally, a neural network is tasked with answering the question “What the hell IS that thing? What the… What the HELL? What the hell is THAT thing?”
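As a rough illustration of that three-stage flow – not Cadence’s code, just a minimal host-side sketch in C++ using OpenCV, with a hypothetical classifyRegion() standing in for whatever CNN actually runs on the vision processor:

```cpp
#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

// Hypothetical stand-in for the CNN inference stage; it answers
// "what the hell IS that thing?" for one candidate region.
std::string classifyRegion(const cv::Mat& /*region*/) {
    return "no idea -- what the hell IS that thing?";  // placeholder only
}

int main() {
    cv::Mat frame = cv::imread("frame.png");  // one frame straight off the camera

    // Stage 1: filters that clean and prepare the image for analysis.
    cv::Mat smoothed, gray, mask;
    cv::GaussianBlur(frame, smoothed, cv::Size(5, 5), 0);
    cv::cvtColor(smoothed, gray, cv::COLOR_BGR2GRAY);

    // Stage 2: extract candidate regions where there might be "things".
    cv::threshold(gray, mask, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    // Stage 3: hand each candidate region to the neural network.
    for (const auto& c : contours) {
        cv::Rect box = cv::boundingRect(c);
        std::string label = classifyRegion(frame(box));
        (void)label;  // in a real system, this drives the application logic
    }
    return 0;
}
```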

Assuming a neural network is involved, there is typically a “training” component that is performed on heavy-iron, server-based hardware. The goal of the training step is to build the model – the set of network weights – that will later be used to recognize “things.” Then, our embedded system has to analyze the images that are thrown at it and come up with an identification based on the model created during training. It is this second problem, the embedded inference task, where a hyper-efficient SoC processor is required.

Cadence says that the new P6 processor delivers up to 4x the performance of the previous (P5) generation. It is aimed directly at the neural-network-powered vision crowd. The “up to 4x” performance number comes primarily from the fact that the new machine has four times the number of hardware multiply-accumulate (MAC) units, giving, in theory, four times as many parallel multiply operations. Vision P6 boasts 256 MACs, processing 9728 bits per cycle – four vector operations per cycle, each with 64-way SIMD. The instruction set has been enhanced to take advantage of the additional capacity, and the processor does what Cadence calls “smart instruction slotting” to optimize performance.
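For a concrete picture of what one of those operations amounts to – a scalar reference loop, not the actual datapath, and with illustrative operand widths since the real machine supports multiple precisions – a single 64-way vector MAC looks like this, and the hardware can issue four of them per cycle (4 × 64 = 256 MACs per cycle):

```cpp
#include <cstdint>

// Scalar reference for ONE 64-way vector multiply-accumulate. The hardware
// does all 64 lanes in a single operation and can issue 4 such vector ops
// per cycle: 4 x 64 = 256 MACs/cycle, matching the quoted MAC count.
// (Operand widths here are illustrative, not the exact P6 datapath formats.)
void vector_mac_64(int32_t acc[64], const int16_t a[64], const int16_t b[64]) {
    for (int lane = 0; lane < 64; ++lane) {
        acc[lane] += static_cast<int32_t>(a[lane]) * b[lane];
    }
}
```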

P6 includes an optional 32-way SIMD vector FPU with 16-bit (FP16) precision. For a lot of vision-related tasks, 16-bit floating point is plenty, and toggling fewer bits is a key to burning less power during computation. It also makes for easy porting of code originally intended to run on GPUs. The company says that the processor actually delivers this 4x performance on a number of well-known imaging and vision benchmarks, while delivering considerably better power efficiency than the previous generation.

All of this architecture hums along at a brisk 1.1 GHz if implemented on the current 16nm FinFET CMOS process technology. The deeply pipelined design and low-power clock gating keep the energy appetite of that 1.1 GHz operation to a minimum.
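A quick back-of-the-envelope – my arithmetic from the figures above, a theoretical peak rather than anything Cadence benchmarks – puts the raw MAC rate in the hundreds of billions per second:

```cpp
#include <cstdio>

int main() {
    // Theoretical peak-rate arithmetic from the figures quoted above.
    const double macs_per_cycle = 4.0 * 64.0;  // 4 vector ops x 64-way SIMD
    const double clock_hz       = 1.1e9;       // 1.1 GHz on 16nm FinFET
    const double peak_macs      = macs_per_cycle * clock_hz;  // ~281.6e9 MAC/s
    const double peak_ops       = 2.0 * peak_macs;            // multiply + add
    std::printf("peak: %.1f GMAC/s (%.1f GOPS)\n",
                peak_macs / 1e9, peak_ops / 1e9);
    return 0;
}
```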

Most vision applications are extremely demanding when it comes to memory, and P6 has a whopping 1024-bit-wide memory interface for pumping in piles of data at a time. The memory interface takes advantage of what the company calls “SuperGather” technology that improves the efficiency of memory access.
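Scattered, index-driven reads – think lookup-table tone mapping or lens-distortion correction – are what normally starve a wide SIMD machine, which is where gather hardware earns its keep. As a purely illustrative scalar reference (not a description of SuperGather’s internals), a gather is just this, done for all lanes at once:

```cpp
#include <cstddef>
#include <cstdint>

// Scalar reference for a gather: load n elements from arbitrary,
// non-contiguous offsets in one conceptual operation. Dedicated gather
// hardware keeps wide SIMD lanes fed even when the access pattern is
// irregular (warps, lookup tables, histogram-style addressing).
void gather_u8(uint8_t dst[], const uint8_t* base,
               const uint32_t offsets[], size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] = base[offsets[i]];
    }
}
```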

On the “how do I program it” side, Cadence provides libraries with over a thousand optimized CNN-, OpenVX-, and OpenCV-based functions. They also offer kernels with high-performance Sobel, median, and Gaussian filters; convolution and ReLU; SIFT, SURF, and Harris corner detection; HOG and Haar object detection and classification; and the Lucas-Kanade (LK) optical flow algorithm.
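To give a flavor of what those kernel names actually do, here are everyday desktop OpenCV equivalents of a few of them – standard OpenCV calls shown only for illustration; the Tensilica-optimized library has its own APIs tuned for the P6 datapath:

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    cv::Mat img = cv::imread("frame.png", cv::IMREAD_GRAYSCALE);

    cv::Mat edges, smoothed, denoised, corners;
    cv::Sobel(img, edges, CV_16S, 1, 0);                  // Sobel edge filter
    cv::GaussianBlur(img, smoothed, cv::Size(5, 5), 0);   // Gaussian filter
    cv::medianBlur(img, denoised, 3);                     // median filter
    cv::cornerHarris(img, corners, 2, 3, 0.04);           // Harris corners

    // HOG-based people detection
    cv::HOGDescriptor hog;
    hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());
    std::vector<cv::Rect> people;
    hog.detectMultiScale(img, people);
    return 0;
}
```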

Targeted applications include high-dynamic-range and wide-dynamic-range (HDR and WDR) image stabilization, face/people detection, face recognition, vehicle detection and, of course, the more generic “What the hell is that THING?” 
