
Toward Intelligent Vision

Cadence Tensilica Vision P6

I’m told that the motivation for the iconic 1979 Saturday Night Live skit was a loosening of US censorship restrictions on broadcast television. For the first time, the word “hell” could be uttered on American TV. The story goes that the Saturday Night Live writers wanted to celebrate the event by working the word “hell” into one skit as many times as possible.

Steve Martin stood staring off into the distance, repeating, “What the hell is that thing?” while a crowd gradually gathered, all asking the same question.

For the rest of the skit, the characters continue staring at some unseen horizon, repeating variations of the phrase. “Hell” is uttered an almost uncountable number of times. The apparition is never identified. For some reason, it is hilarious.

Today, our embedded computing systems stare similarly off into the vast expanse of unknown visages. Clusters of pixels course through filters to be analyzed by neural networks. Billions of bits are flipped, and flipped again. Processors strain to extract some abstract meaning from the mounting mass of data, ever careful not to plunder their power budgets. 

The question burns: “What the hell IS that thing?”

If our vision systems are going to answer that question with any degree of accuracy across a large range of applications – and do it within their ever-shrinking power budgets – we need processors tuned to the task. Intelligent vision is one of the most computationally challenging applications we’ve attempted with embedded systems, and plain-vanilla applications processors don’t have a prayer of hitting the computational power efficiency required to make vision a reality.

Tensilica’s processing architecture is designed specifically for customization. The idea is that an expert can use Tensilica tools to build the perfect processor for their application. For sophisticated teams designing special-purpose SoCs, Tensilica provides a means to raise their compute game significantly – custom-crafting purpose-built processors that squeeze every ounce of performance and power efficiency out of a given silicon area. 

At some point, the folks in Cadence’s Tensilica team realized that they themselves were the world’s biggest experts at customizing Tensilica processors, and that there were a few very common applications just crying out for a helping hand. Tensilica’s “Vision” processors are one of the results of that realization. If vision or neural networks are your thing, you don’t even have to bake your own architecture. Tensilica has already done it for you.

Last year, we looked over the then-impressive Vision P5 architecture, which brought an unparalleled level of computational performance and power efficiency to vision and related algorithms in SoCs. Well, move over P5 – Tensilica’s P6 is here, and it’s become the new benchmark for squeezing out the most neural computation per coulomb.

If your system is trying to recognize an object, chances are the first stage coming out of the camera is a set of filters that clean and prepare the image for analysis. Then, an algorithm looks over the scene and extracts candidate “regions” where there might be “things” of interest. Finally, a neural network is tasked with answering the question “What the hell IS that thing? What the… What the HELL? What the hell is THAT thing?”
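The three stages described above – filter, region proposal, classification – can be sketched as a simple pipeline. The stage functions below are hypothetical stand-ins written for illustration, not Tensilica APIs:

```python
# A minimal sketch of the three-stage vision pipeline described above.
# The stage functions are hypothetical placeholders, not real Tensilica APIs.

def denoise(frame):
    """Stage 1: clean/prepare the raw image (here: a trivial 1D box blur)."""
    return [(a + b) // 2 for a, b in zip(frame, frame[1:] + frame[-1:])]

def propose_regions(frame, threshold=128):
    """Stage 2: extract candidate regions of interest (here: bright pixels)."""
    return [i for i, px in enumerate(frame) if px > threshold]

def classify(region_indices):
    """Stage 3: where a neural network would answer 'what IS that thing?'."""
    return "thing" if region_indices else "nothing"

frame = [0, 0, 200, 220, 210, 0, 0]   # a toy 1D "image"
label = classify(propose_regions(denoise(frame)))
print(label)                          # -> thing
```

In a real system, each stage maps to the corresponding hardware resources: filters to the SIMD datapath, region proposal to the feature-detection kernels, and classification to the MAC array.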

Assuming a neural network is involved, there is typically a “training” component performed on heavy-iron, server-based hardware. The goal of the training step is to build the data set that will later be used to recognize “things.” Our embedded system then has to analyze the images thrown at it and come up with an identification based on the data set created during training. It is this second problem, the embedded task, that demands a hyper-efficient SoC processor.
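The training/inference split can be illustrated with a toy linear classifier. The weights below stand in for the “data set” produced offline on server hardware; everything here is a hypothetical sketch, and the embedded side runs only the inference half:

```python
# Toy illustration of the training/inference split described above.
# In practice, the trained network's weights are produced offline on
# server hardware; the embedded processor only runs inference.

# Pretend these weights came out of the heavy-iron training step.
TRAINED_WEIGHTS = [0.6, -0.2, 0.8]
BIAS = -0.5

def infer(features):
    """The embedded task: score features against the trained weights."""
    score = sum(w * x for w, x in zip(TRAINED_WEIGHTS, features)) + BIAS
    return "thing" if score > 0 else "not a thing"

print(infer([1.0, 0.0, 1.0]))   # 0.6 + 0.8 - 0.5 = 0.9  -> thing
print(infer([0.0, 1.0, 0.0]))   # -0.2 - 0.5 = -0.7      -> not a thing
```

The asymmetry is the point: training is a one-time, power-unconstrained job, while inference runs continuously on every frame inside the embedded power budget.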

Cadence says that the new P6 processor delivers up to 4x the performance of the previous (P5) generation. It is aimed directly at the neural-network-powered vision crowd. The “up to 4x” figure comes primarily from the fact that the new machine has four times as many hardware multiply-accumulate (MAC) units, quadrupling the theoretical number of parallel multiply operations. Vision P6 boasts 256 MACs, processing 9,728 bits per cycle – four vector operations per cycle, each with 64-way SIMD. The instruction set has been enhanced to take advantage of the additional capacity, and the processor does what Cadence calls “smart instruction slotting” to optimize performance.

P6 includes an optional 32-way SIMD vector FPU with 16-bit (FP16) precision. For a lot of vision-related tasks, 16-bit floating point is plenty, and toggling fewer bits is a key to burning less power during computation. It also makes for easy porting of code originally intended to run on GPUs. The company says that the processor actually delivers this 4x performance on a number of well-known imaging and vision benchmarks, while delivering considerably better power efficiency than the previous generation.
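Why is FP16 “plenty” for vision? A half-precision float keeps an 11-bit significand – roughly three decimal digits – which comfortably covers 8- or 10-bit pixel data. The round-trip below uses Python’s built-in support for the IEEE 754 binary16 format to show the precision loss:

```python
import struct

def to_fp16(x):
    """Round-trip a float through IEEE 754 half precision (binary16)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

third = 1.0 / 3.0
print(f"float64: {third:.10f}")           # 0.3333333333
print(f"fp16:    {to_fp16(third):.10f}")  # 0.3332519531 -- ~3 decimal digits
# FP16's 11-bit significand is ample for 8- or 10-bit pixel data,
# and there are half as many bits to toggle compared with FP32.
```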

All of this architecture hums along at a brisk 1.1 GHz if implemented on the current 16nm FinFET CMOS process technology. The deeply pipelined design and low-power clock gating keep the energy appetite of that 1.1 GHz operation to a minimum.
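Taken together, the figures above imply a peak theoretical MAC rate. This is just back-of-envelope multiplication of the numbers Cadence quotes, not a measured benchmark:

```python
# Back-of-envelope peak throughput from the figures quoted above.
macs_per_cycle = 4 * 64    # 4 vector ops/cycle x 64-way SIMD = 256 MACs
clock_hz = 1.1e9           # 1.1 GHz on 16nm FinFET

peak_gmacs = macs_per_cycle * clock_hz / 1e9
print(f"{macs_per_cycle} MACs/cycle -> {peak_gmacs:.1f} GMAC/s peak")
# -> 256 MACs/cycle -> 281.6 GMAC/s peak
```

Real workloads will land below that ceiling, of course – it assumes every MAC is fed on every cycle, which is exactly what the wide memory interface and instruction slotting are trying to approach.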

Most vision applications are extremely demanding when it comes to memory, and P6 has a whopping 1024-bit-wide memory interface for pumping in piles of data at a time. The memory interface takes advantage of what the company calls “SuperGather” technology that improves the efficiency of memory access.
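Cadence doesn’t publish SuperGather’s internals, but the gather operation it accelerates is easy to describe: a vector load from non-contiguous addresses, the access pattern you get with lookup tables or warped-image sampling. A scalar sketch of the pattern (illustration only, not the hardware mechanism):

```python
# What a "gather" does, in scalar Python: load a vector of elements
# from arbitrary (non-contiguous) indices in one logical operation.
# This shows only the access pattern, not SuperGather's implementation.

memory = list(range(100, 200))   # pretend this is local data RAM
indices = [3, 97, 42, 8]         # scattered addresses, e.g. a lens-warp lookup

gathered = [memory[i] for i in indices]
print(gathered)   # -> [103, 197, 142, 108]
```

On a plain SIMD machine, each scattered element costs a separate scalar load; hardware gather support is what keeps a wide vector unit fed when the data isn’t laid out sequentially.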

On the “how do I program it” side, Cadence provides libraries with over a thousand optimized CNN-, OpenVX-, and OpenCV-based functions. They also offer kernels for high-performance Sobel, median, and Gaussian filters; convolution and ReLU; SIFT, SURF, and Harris corner detection; HOG and Haar object detection and classification; and Lucas-Kanade (LK) optical flow.
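To make one of those kernels concrete, here is what a Sobel horizontal-gradient filter computes, written as a plain-Python reference sketch (the library versions are, naturally, vectorized for the SIMD hardware):

```python
# Reference (unoptimized) Sobel x-gradient filter, to show what one of
# the library kernels computes. Not the Tensilica library code.

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

def sobel_x(img):
    """Apply the 3x3 Sobel x-kernel to a 2D grayscale image (valid region)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            acc = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                      for j in range(3) for i in range(3))
            row.append(acc)
        out.append(row)
    return out

# A vertical edge: dark left half, bright right half.
img = [[0, 0, 10, 10]] * 4
print(sobel_x(img))   # -> [[40, 40], [40, 40]]: strong edge response
```

Each output pixel is nine multiply-accumulates – exactly the operation the P6’s 256 MAC units exist to run hundreds at a time.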

Targeted applications include high-dynamic-range and wide-dynamic-range (HDR and WDR) image stabilization, face/people detection, face recognition, vehicle detection, and, of course, the more generic “What the hell is that THING?”
