
Toward Intelligent Vision

Cadence Tensilica Vision P6

I’m told that the motivation for the iconic 1979 Saturday Night Live skit was a loosening of the US censor restrictions on broadcast television. For the first time, the word “hell” could be uttered on American TV. The story is that the Saturday Night Live writers wanted to celebrate the event by including the word “hell” as many times as possible in one skit.

Steve Martin stands staring off into the distance, repeating: “What the hell is that thing?” A crowd gradually gathers, all asking the same question.

For the rest of the skit, the characters continue staring at some unseen horizon, repeating variations of the phrase. “Hell” is uttered an almost uncountable number of times. The apparition is never identified. For some reason, it is hilarious.

Today, our embedded computing systems stare similarly off into the vast expanse of unknown visages. Clusters of pixels course through filters to be analyzed by neural networks. Billions of bits are flipped, and flipped again. Processors strain to extract some abstract meaning from the mounting mass of data, ever careful not to plunder their power budgets. 

The question burns: “What the hell IS that thing?”

If our vision systems are going to be able to answer the question with any degree of accuracy over a large range of applications, and do it within their ever-shrinking power budgets, we need processors tuned to the task. Intelligent vision is one of the most computationally challenging applications we’ve attempted with embedded systems, and plain, vanilla applications processors don’t have a prayer of hitting the computational power efficiency required to make vision a reality.

Tensilica’s processing architecture is designed specifically for customization. The idea is that an expert can use Tensilica tools to build the perfect processor for their application. For sophisticated teams designing special-purpose SoCs, Tensilica provides a means to raise their compute game significantly – custom-crafting purpose-built processors that squeeze every ounce of performance and power efficiency out of a given silicon area. 

At some point, the folks in Cadence’s Tensilica team realized that they themselves were the world’s biggest experts at customizing Tensilica processors, and that there were a few very common applications just crying out for a helping hand. Tensilica’s “Vision” processors are one of the results of that realization. If vision or neural networks are your thing, you don’t even have to bake your own architecture. Tensilica has already done it for you.

Last year, we looked over the then-impressive Vision P5 architecture, which brought an unparalleled level of computational performance and power efficiency to vision and related algorithms in SoCs. Well, move over P5 – Tensilica’s P6 is here, and it’s become the new benchmark for squeezing out the most neural computation per coulomb.

If your system is trying to recognize an object, chances are the first stage coming out of the camera is a set of filters that clean and prepare the image for analysis. Then, an algorithm looks over the scene and extracts candidate “regions” where there might be “things” of interest. Finally, a neural network is tasked with answering the question “What the hell IS that thing? What the… What the HELL? What the hell is THAT thing?”
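To make the three stages concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: the function names, the 3x3 mean filter, the brightness threshold, and especially the "classifier," which is only a stand-in for a real neural network. It shows the shape of the pipeline, not how any Tensilica library implements it.

```python
# Hypothetical three-stage vision pipeline; names and thresholds are illustrative only.

def box_blur(img):
    """Stage 1: 3x3 mean filter (border pixels left unchanged) to suppress noise."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    return out

def propose_region(img, threshold):
    """Stage 2: bounding box (top, left, bottom, right) of pixels above threshold."""
    hits = [(y, x) for y, row in enumerate(img)
            for x, v in enumerate(row) if v > threshold]
    if not hits:
        return None
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return (min(ys), min(xs), max(ys), max(xs))

def classify(box):
    """Stage 3: stand-in for the neural network; labels a region by its aspect ratio."""
    top, left, bottom, right = box
    height, width = bottom - top + 1, right - left + 1
    return "wide thing" if width > height else "tall thing"

def what_is_that_thing(img, threshold=0.3):
    """Run filter -> region proposal -> classification; None if nothing stands out."""
    smoothed = box_blur(img)
    box = propose_region(smoothed, threshold)
    return None if box is None else classify(box)
```

Feeding it an 8x8 frame containing a bright horizontal bar produces a region that is wider than it is tall, while an all-black frame produces no candidate region at all. In a real system, the first two stages are where filter kernels earn their keep, and the third is where the MACs get burned.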

Assuming a neural network is involved, there is typically a “training” component that is performed on heavy-iron, server-based hardware. The goal of the training step is to build the data set (in practice, the network’s trained weights) that will later be used to recognize “things.” Then, our embedded system has to analyze the images that are thrown at it and come up with an identification based on the data set created during training. It is this second problem, the embedded task, that requires a hyper-efficient SoC processor.

Cadence says that the new P6 processor delivers up to 4x the performance of the previous (P5) generation. It is aimed directly at the neural-network-powered vision crowd. The “up to 4x” performance number comes primarily from the fact that the new machine has four times as many hardware multiply-accumulate (MAC) units, giving a theoretical 4x increase in parallel multiplication operations. Vision P6 boasts 256 MACs, processing 9728 bits per cycle. This yields 4 vector operations per cycle, each with 64-way SIMD. The instruction set has been enhanced to take advantage of the additional capacity, and the processor does what Cadence calls “smart instruction slotting” to optimize performance.
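Why do more MACs matter so much? The inner loop of a convolution or a neural-network layer is a long dot product, and a dot product is nothing but multiply-accumulates. The sketch below is an idealized model, not Cadence’s instruction set: it reads the 256 MACs as 4 vector operations of 64 lanes each (my interpretation of the figures above), and it ignores load latency, pipeline fill, and the final reduction. The name `simd_dot` and the cycle-counting scheme are assumptions made for illustration.

```python
LANES = 64           # SIMD width of one vector operation (figure from the article)
OPS_PER_CYCLE = 4    # vector operations per cycle (figure from the article)
MACS_PER_CYCLE = LANES * OPS_PER_CYCLE  # 256, assuming every lane performs one MAC

def simd_dot(a, b, macs_per_cycle=MACS_PER_CYCLE):
    """Dot product modelled as batches of parallel MACs.

    Returns (result, cycles). Each loop iteration stands for one clock cycle
    in which up to `macs_per_cycle` multiply-accumulates happen at once.
    """
    assert len(a) == len(b)
    acc = [0] * macs_per_cycle            # one accumulator per MAC unit
    cycles = 0
    for start in range(0, len(a), macs_per_cycle):
        chunk_a = a[start:start + macs_per_cycle]
        chunk_b = b[start:start + macs_per_cycle]
        for lane, (x, y) in enumerate(zip(chunk_a, chunk_b)):
            acc[lane] += x * y            # in hardware, all lanes update together
        cycles += 1
    return sum(acc), cycles
```

In this toy model, a 1024-element dot product takes 4 cycles with 256 MACs and 16 cycles with 64, which is the "theoretical 4x" in miniature. Real workloads only approach that ratio when the data can be fed fast enough.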

P6 includes an optional 32-way SIMD vector FPU with 16-bit (FP16) precision. For a lot of vision-related tasks, 16-bit floating point is plenty, and toggling fewer bits is a key to burning less power during computation. It also makes for easy porting of code originally intended to run on GPUs. The company says that the processor actually delivers this 4x performance on a number of well-known imaging and vision benchmarks, while delivering considerably better power efficiency than the previous generation.
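For a feel of what 16 bits of floating point buys you, Python’s standard `struct` module can round a value to IEEE 754 half precision using the `"e"` format code. The sketch below assumes the P6’s FP16 follows that IEEE format, which the article does not state; the helper name `to_fp16` is made up for this example.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Half the bits of a 32-bit float: fewer bits to move and toggle per value.
FP16_BYTES = struct.calcsize("<e")   # 2
FP32_BYTES = struct.calcsize("<f")   # 4

# Every 8-bit pixel value survives the trip through FP16 unchanged.
pixels_exact = all(to_fp16(float(v)) == float(v) for v in range(256))

# Fractions are rounded, with a relative error of at most 2**-11 for
# values in FP16's normal range.
rel_err = abs(to_fp16(0.1) - 0.1) / 0.1
```

Eight-bit pixel values are represented exactly, and a weight such as 0.1 is off by a few parts in ten thousand. For many vision workloads that is an acceptable trade for moving half as many bits, though integers above 2048 are no longer all representable, so large accumulations need care.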

All of this architecture hums along at a brisk 1.1 GHz if implemented on the current 16nm FinFET CMOS process technology. The deeply pipelined design and low-power clock gating keep the energy appetite of that 1.1 GHz operation to a minimum.

Most vision applications are extremely demanding when it comes to memory, and P6 has a whopping 1024-bit-wide memory interface for pumping in piles of data at a time. The memory interface takes advantage of what the company calls “SuperGather” technology, which improves the efficiency of memory access.
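A back-of-the-envelope calculation shows why the wide memory interface matters. The sketch below uses only the figures quoted in this article (256 MACs, 1.1 GHz, 1024 bits) plus one loud assumption of my own: that the interface can deliver one full 1024-bit word on every core clock cycle. These are theoretical peaks, not measured or vendor-published throughput numbers.

```python
MACS_PER_CYCLE = 256      # from the article
CLOCK_HZ = 1.1e9          # from the article (16nm FinFET implementation)
MEM_WIDTH_BITS = 1024     # from the article

def peak_figures(macs=MACS_PER_CYCLE, clock_hz=CLOCK_HZ, width_bits=MEM_WIDTH_BITS):
    """Return theoretical peak compute, bandwidth, and their ratio.

    Assumes one memory word per core clock cycle (an assumption, not a spec).
    """
    bytes_per_cycle = width_bits // 8
    return {
        "macs_per_second": macs * clock_hz,
        "bytes_per_second": bytes_per_cycle * clock_hz,
        "macs_per_byte": macs / bytes_per_cycle,
    }

figures = peak_figures()
```

Under those assumptions the peaks come to 281.6 billion MACs per second and 140.8 GB per second, or 2 MACs for every byte fetched. Put another way, an algorithm has to reuse each byte it loads at least twice, or the MAC array sits idle waiting on memory, which is why access efficiency gets its own marketing name.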

On the “how do I program it” side, Cadence provides libraries with over a thousand optimized CNN-, OpenVX- and OpenCV-based functions. They also offer kernels with high-performance Sobel, median, and Gaussian filters; convolution and ReLU; SIFT, SURF, and Harris corner detection algorithms; HOG and Haar object detection and classification; and the Lucas-Kanade (LK) optical flow algorithm.
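If those kernel names are unfamiliar, two of them fit in a few lines. The sketch below is a plain-Python reference for a horizontal Sobel filter and ReLU, written to show what the operations compute. It is not the Cadence library API, and the function names are my own; an optimized kernel would compute the same arithmetic across many SIMD lanes at once.

```python
# Horizontal-gradient Sobel kernel: responds to dark-to-bright vertical edges.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

def convolve3x3(img, kernel):
    """'Valid' 3x3 sliding-window sum of products (no kernel flip, as in CNNs).

    The output is 2 rows and 2 columns smaller than the input.
    Each output pixel costs 9 multiply-accumulates.
    """
    h, w = len(img), len(img[0])
    return [[sum(kernel[ky][kx] * img[y + ky][x + kx]
                 for ky in range(3) for kx in range(3))
             for x in range(w - 2)]
            for y in range(h - 2)]

def relu(feature_map):
    """Rectified linear unit: clamp negative responses to zero."""
    return [[max(0, v) for v in row] for row in feature_map]
```

On an image whose rows read `[0, 0, 10, 10, 10]`, the Sobel response is `[40, 40, 0]` on each output row: strong where the window straddles the edge, zero in the flat region. Flip the image left to right and the responses go negative, at which point ReLU zeroes them out.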

Targeted applications include high-dynamic-range and wide-dynamic-range (HDR and WDR) imaging, image stabilization, face/people detection, face recognition, vehicle detection and, of course, the more generic “What the hell is that THING?”
