Toward Intelligent Vision

Cadence Tensilica Vision P6

I’m told that the motivation for the iconic 1979 Saturday Night Live skit was a loosening of the US censor restrictions on broadcast television. For the first time, the word “hell” could be uttered on American TV. The story is that the Saturday Night Live writers wanted to celebrate the event by including the word “hell” as many times as possible in one skit.

Steve Martin stands staring off into the distance, repeating: “What the hell is that thing?” A crowd gradually gathers, all asking the same question.

For the rest of the skit, the characters continue staring at some unseen horizon, repeating variations of the phrase. “Hell” is uttered an almost uncountable number of times. The apparition is never identified. For some reason, it is hilarious.

Today, our embedded computing systems stare similarly off into the vast expanse of unknown visages. Clusters of pixels course through filters to be analyzed by neural networks. Billions of bits are flipped, and flipped again. Processors strain to extract some abstract meaning from the mounting mass of data, ever careful not to plunder their power budgets. 

The question burns: “What the hell IS that thing?”

If our vision systems are going to be able to answer the question with any degree of accuracy over a large range of applications – and do it within their ever-shrinking power budgets – we need processors tuned to the task. Intelligent vision is one of the most computationally challenging applications we’ve attempted with embedded systems, and plain, vanilla applications processors don’t have a prayer of hitting the computational power efficiency required to make vision a reality.

Tensilica’s processing architecture is designed specifically for customization. The idea is that an expert can use Tensilica tools to build the perfect processor for their application. For sophisticated teams designing special-purpose SoCs, Tensilica provides a means to raise their compute game significantly – custom-crafting purpose-built processors that squeeze every ounce of performance and power efficiency out of a given silicon area. 

At some point, the folks in Cadence’s Tensilica team realized that they themselves were the world’s biggest experts at customizing Tensilica processors, and that there were a few very common applications just crying out for a helping hand. Tensilica’s “Vision” processors are one of the results of that realization. If vision or neural networks are your thing, you don’t even have to bake your own architecture. Tensilica has already done it for you.

Last year, we looked over the then-impressive Vision P5 architecture, which brought an unparalleled level of computational performance and power efficiency to vision and related algorithms in SoCs. Well, move over P5 – Tensilica’s P6 is here, and it’s become the new benchmark for squeezing out the most neural computation per coulomb.

If your system is trying to recognize an object, chances are the first stage coming out of the camera is a set of filters that clean and prepare the image for analysis. Then, an algorithm looks over the scene and extracts candidate “regions” where there might be “things” of interest. Finally, a neural network is tasked with answering the question “What the hell IS that thing? What the… What the HELL? What the hell is THAT thing?”
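The three stages described above can be sketched in a few lines of Python. Everything here is a stand-in chosen for illustration: the function names, the brightness threshold, and the toy "classifier" are assumptions, and a real system would use proper image filters, a region-proposal algorithm, and a trained neural network.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]            # grayscale rows, values 0..255
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), inclusive


def prefilter(img: Image, floor: int = 32) -> Image:
    """Stage 1 stand-in: suppress low-level noise by zeroing dim pixels."""
    return [[p if p >= floor else 0 for p in row] for row in img]


def propose_region(img: Image) -> Optional[Box]:
    """Stage 2 stand-in: one bounding box around all non-zero pixels."""
    hits = [(r, c) for r, row in enumerate(img) for c, p in enumerate(row) if p]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows), max(cols)


def classify(box: Box) -> str:
    """Stage 3 stand-in: a real system would run a trained network here."""
    top, left, bottom, right = box
    height, width = bottom - top + 1, right - left + 1
    return "tall thing" if height > width else "wide thing"


def what_is_that_thing(img: Image) -> str:
    """Run the three stages in order on a single frame."""
    box = propose_region(prefilter(img))
    return classify(box) if box else "nothing there"


frame = [
    [0, 10, 0, 0],
    [0, 200, 0, 0],
    [5, 220, 0, 0],
    [0, 210, 12, 0],
]
print(what_is_that_thing(frame))
```

The point of the structure is that each stage shrinks the problem for the next one: the filter cleans the pixels, the region step discards most of the frame, and only the surviving candidates reach the expensive classifier.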

Assuming a neural network is involved, there is typically a “training” component that is performed on heavy-iron, server-based hardware. The goal of the training step is to build the database that will later be used to recognize “things.” Then, our embedded system has to analyze the images that are thrown at it and come up with an identification based on the data set created during training. It is this second problem, the embedded task, where a hyper-efficient SoC processor is required.

Cadence says that the new P6 processor delivers up to 4x the performance of the previous (P5) generation. It is aimed directly at the neural-network-powered vision crowd. The “up to 4x” performance number comes primarily from the fact that the new machine has four times the number of hardware multiply-accumulate (MAC) units, giving it, in theory, four times as many parallel multiply operations. Vision P6 boasts 256 MACs, processing 9728 bits per cycle. This yields 4 vector operations per cycle, each with 64-way SIMD. The instruction set has been enhanced to take advantage of the additional capacity, and the processor does what Cadence calls “smart instruction slotting” to optimize performance.
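A back-of-the-envelope sketch in Python makes the MAC arithmetic concrete. The 256 MACs, 4 vector operations, and 64 lanes come from the figures above. The P5 count of 64 is simply 256 divided by the stated 4x, and the 5x5-filter-on-a-1080p-frame workload and the `ideal_cycles` helper are hypothetical, chosen only to illustrate the ratio.

```python
import math

# Figures quoted in the article for Vision P6.
VECTOR_OPS_PER_CYCLE = 4
SIMD_LANES = 64
P6_MACS = VECTOR_OPS_PER_CYCLE * SIMD_LANES   # 4 x 64 = 256

# Derived from "four times the number of MAC units", not quoted directly.
P5_MACS = P6_MACS // 4


def ideal_cycles(total_macs: int, macs_per_cycle: int) -> int:
    """Lower bound on cycles, assuming every MAC unit is busy every cycle."""
    return math.ceil(total_macs / macs_per_cycle)


# Hypothetical workload: one 5x5 filter applied at every pixel of a 1920x1080 frame.
workload = 1920 * 1080 * 5 * 5

p5_cycles = ideal_cycles(workload, P5_MACS)
p6_cycles = ideal_cycles(workload, P6_MACS)
print(f"P5: {p5_cycles} cycles, P6: {p6_cycles} cycles, "
      f"ratio {p5_cycles / p6_cycles:.1f}x")
```

This is a best case. Real kernels lose cycles to memory stalls, loop overhead, and instruction scheduling, which is presumably why the claim is “up to” 4x.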

P6 includes an optional 32-way SIMD vector FPU with 16-bit (FP16) precision. For a lot of vision-related tasks, 16-bit floating point is plenty, and toggling fewer bits is a key to burning less power during computation. FP16 support also makes for easy porting of code originally intended to run on GPUs. The company says that the processor actually delivers this 4x performance on a number of well-known imaging and vision benchmarks, while delivering considerably better power efficiency than the previous generation.
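To get a feel for why FP16 is often “plenty” for pixels, here is a small Python sketch that uses the standard library's half-precision format code (`"e"` in the `struct` module) to round values to 16 bits. It says nothing about how the P6 FPU is implemented, and the helper name `to_fp16` is made up for this example.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Half the storage, and so half the bits to move, compared with 32-bit floats.
print(struct.calcsize("<e"), "bytes vs", struct.calcsize("<f"), "bytes")

# Every 8-bit pixel value survives the trip to FP16 unchanged.
print(all(to_fp16(float(v)) == v for v in range(256)))

# Fractions are rounded to roughly three significant decimal digits.
approx = to_fp16(0.1)
print(approx, abs(approx - 0.1) / 0.1)
```

The trade-off is range and resolution: FP16 carries only 11 significant bits, so long accumulations and very large or very small intermediate values need more care than they would in 32-bit floating point.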

All of this architecture hums along at a brisk 1.1 GHz if implemented on the current 16nm FinFET CMOS process technology. The deeply pipelined design and low-power clock gating keep the energy appetite of that 1.1 GHz operation to a minimum.

Most vision applications are extremely demanding when it comes to memory, and P6 has a whopping 1024-bit-wide memory interface for pumping in piles of data at a time. The memory interface takes advantage of what the company calls “SuperGather” technology, which improves the efficiency of memory access.
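A little arithmetic shows why the width of that interface matters. The Python sketch below assumes, purely for illustration, that the interface moves one full 1024-bit word on every cycle of the 1.1 GHz clock quoted earlier. The article does not state the actual transfer rate, so treat the results as upper bounds under that assumption.

```python
MEM_INTERFACE_BITS = 1024      # from the article
MACS_PER_CYCLE = 256           # from the article
CLOCK_HZ = 1_100_000_000       # 1.1 GHz, the 16nm figure quoted earlier

# Assumption: one full-width transfer per core clock cycle.
bytes_per_cycle = MEM_INTERFACE_BITS // 8
peak_bytes_per_second = bytes_per_cycle * CLOCK_HZ
peak_macs_per_second = MACS_PER_CYCLE * CLOCK_HZ

# MACs a kernel must perform per byte fetched to keep every MAC unit busy.
macs_per_byte_to_saturate = MACS_PER_CYCLE / bytes_per_cycle

print(f"{bytes_per_cycle} bytes per cycle")
print(f"{peak_bytes_per_second / 1e9:.1f} GB/s peak, under the stated assumption")
print(f"{peak_macs_per_second / 1e9:.1f} GMAC/s peak")
print(f"{macs_per_byte_to_saturate:.1f} MACs per byte needed to stay compute-bound")
```

Under these assumptions, a kernel that reuses each fetched byte fewer than twice would be waiting on memory rather than on the multipliers, which is one reason efficient data access gets its own marketing name.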

On the “how do I program it” side, Cadence provides libraries with over a thousand optimized CNN-, OpenVX- and OpenCV-based functions. They also offer kernels with high-performance Sobel, median, and Gaussian filters; convolution and ReLU; SIFT, SURF, and Harris corner detection algorithms; HOG and Haar object detection and classification; and the Lucas-Kanade (LK) optical flow algorithm.
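For readers who have not met these kernels, here is a plain-Python sketch of two of them: a Sobel filter applied as a sliding-window sum of products, followed by ReLU. It is a reference illustration only, not Cadence's library code or API, and the function names are invented for the example. The inner loop is exactly the multiply-accumulate operation that the MAC units exist to speed up.

```python
from typing import List

Matrix = List[List[int]]

# Horizontal-gradient Sobel kernel.
SOBEL_X: Matrix = [[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]]


def correlate_valid(img: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image with no padding or kernel flip,
    which is how CNN frameworks usually define "convolution"."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += img[r + i][c + j] * kernel[i][j]  # one MAC
            out[r][c] = acc
    return out


def relu(m: Matrix) -> Matrix:
    """Clamp negative responses to zero."""
    return [[max(0, v) for v in row] for row in m]


# A bright vertical stripe: a rising edge on its left, a falling edge on its right.
image = [
    [0, 0, 9, 9, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 9, 9, 0],
]

edges = correlate_valid(image, SOBEL_X)
print(edges)        # [[36, 36, -36]]
print(relu(edges))  # [[36, 36, 0]]
```

Even this 3x3 kernel costs nine MACs per output pixel, so a single filter pass over a full frame runs to millions of operations, all independent of one another and therefore a natural fit for wide SIMD hardware.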

Targeted applications include high-dynamic-range and wide-dynamic-range (HDR and WDR) image stabilization, face/people detection, face recognition, vehicle detection and, of course, the more generic “What the hell is that THING?” 
