The Internet of People has cameras – literally billions of them. They are in smartphones, laptops, tablets, WiFi devices – it sometimes seems they’re watching our every move. This incredible volume of information is then (somewhat) intelligently analyzed, edited, and moderated by the vast visual computing power of the enormous array of human brains behind these cameras. The amount of computation required to filter, process, and interpret this image data is staggering. The end result is, of course, an almost infinite wasteland of cat videos on Facebook and YouTube. But video processing has higher purposes as well.
The Internet of Things is missing that oversized array of human brains to process, filter, and interpret its stream of image data. But the IoT needs “eyes” nonetheless. As we give our intelligent machines the power of vision, we face a daunting challenge in delivering the processing capacity required without blowing our budgets for power, price, and form factor. The analysis of all of this image data is perhaps the most compelling use case for heterogeneous computing and specialized processors.
We have written often about heterogeneous architectures like FPGAs paired with conventional application processors for vision applications. But what happens if you’re designing an SoC to take it to the next level – higher volume, lower cost, and superior performance? You won’t be fitting an FPGA/processor vision combo into your next smartphone design. You need something smaller, more efficient, and cheaper that can still deliver the performance required for advanced image processing and analysis.
Cadence’s Tensilica processors have long been a favorite for those wanting large amounts of DSP power in their SoC designs. By allowing us to customize our processor for the exact type of task we’re doing, Tensilica gives us the ability to have a highly optimized architecture that delivers vastly better performance, energy efficiency, and cost. General-purpose processors carry a lot of overhead for the wide range of applications they may be called upon to support. But with the ability to optimize the architecture for a narrowly defined application, we can make significant gains.
Since well before its acquisition by Cadence, Tensilica has been working quietly in the background of the industry. While everyone is aware of ARM’s omnipresence, few people know that over two billion Tensilica processors are also out there at work, often side-by-side with more general-purpose ARM architectures, doing the heavy lifting in a wide range of challenging applications like IoT, mobile, storage, and networking.
Now Cadence has introduced a line of Tensilica processor IP optimized specifically for imaging and vision. The new Tensilica Vision P5 is designed to handle the entire camera-processing pipeline, and it involves a lot more than just an optimized processor architecture. The image-processing problem starts with common tasks like correcting for physical attributes of the camera – lens distortion, brightness falloff, color correction, sensor defects, and so forth. Once the image is cleaned up a bit, we need to add things like stabilization, high dynamic range, 2D/3D noise reduction, and resolution enhancement. Finally, we get into the actual intelligent vision tasks – detecting people, faces, objects, gestures, and various types of motion and events.
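To make the first stage of that pipeline concrete, here is a minimal sketch of brightness-falloff (vignetting) correction on a tiny grayscale image. The quadratic gain model, the `strength` parameter, and the function name `correct_falloff` are illustrative assumptions, not anything taken from Cadence's libraries; a real pipeline would use gains calibrated for the specific lens and sensor.

```python
def correct_falloff(image, strength=0.5):
    """Brighten pixels toward the edges of a grayscale image (list of rows).

    Assumed model: gain = 1 + strength * r^2, where r is the distance from
    the image center, normalized so the corners sit at r = 1.
    """
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2, (width - 1) / 2
    # Squared center-to-corner distance; fall back to 1.0 for a 1x1 image.
    max_r2 = (cx * cx + cy * cy) or 1.0
    corrected = []
    for y, row in enumerate(image):
        out_row = []
        for x, value in enumerate(row):
            r2 = ((x - cx) ** 2 + (y - cy) ** 2) / max_r2
            gain = 1.0 + strength * r2
            out_row.append(min(255, round(value * gain)))  # clamp to 8 bits
        corrected.append(out_row)
    return corrected


flat_gray = [[100] * 3 for _ in range(3)]
print(correct_falloff(flat_gray, strength=0.5))
```

Every pixel gets the same kind of multiply-and-clamp, independent of its neighbors, which is exactly the sort of work that maps well onto wide SIMD hardware.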
Implemented in 16nm FinFET technology, the Vision P5 processor runs at 1.1GHz. It is deeply pipelined with advanced clock gating for energy efficiency. It boasts an ultra-wide 1024-bit memory interface with what the company calls “SuperGather” technology, which increases memory parallelism by 16x – reading and writing non-contiguous addresses in parallel. There are vector extensions that allow four vector operations per cycle, with 64-way SIMD – for a peak of 256 ALU operations per clock cycle. The vector extensions include 8-, 16-, and 32-bit data types with optional IEEE 32-bit vector floating point. These vector extensions can help significantly with GPU code porting.
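The sketch below shows what a gather and its counterpart, a scatter, mean at the software level, along with the arithmetic behind the 256-operation figure. It models only the semantics (one logical vector access touching non-contiguous addresses); how SuperGather performs this in hardware is not described here, and the function names are hypothetical.

```python
SIMD_LANES = 64            # 64-way SIMD, as quoted above
VECTOR_OPS_PER_CYCLE = 4   # four vector operations per cycle, as quoted above


def peak_alu_ops_per_cycle(lanes=SIMD_LANES, vector_ops=VECTOR_OPS_PER_CYCLE):
    """Peak ALU operations per clock: every lane busy in every vector op."""
    return lanes * vector_ops


def gather(memory, indices):
    """Read memory at arbitrary (non-contiguous) indices into one vector."""
    return [memory[i] for i in indices]


def scatter(memory, indices, values):
    """Write a vector's elements back to arbitrary indices in memory."""
    for i, value in zip(indices, values):
        memory[i] = value


# A 4x4 image stored row-major: reading column 2 touches addresses 2, 6, 10, 14.
image = list(range(16))
column_addresses = [row * 4 + 2 for row in range(4)]
column = gather(image, column_addresses)
scatter(image, column_addresses, [0, 0, 0, 0])  # clear that column in place
print(peak_alu_ops_per_cycle(), column)
```

Column accesses, lookup tables, and image warps all produce address patterns like this, which is why hardware support for gathers matters in vision code.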
The optional vector floating point unit supports 32-bit IEEE 754, with 16-way single precision. It has a 16-entry 32-bit floating point register file and delivers 32 GFLOPS with a single 1GHz core. The tool suite provides easy migration from GPU versions of your code, and the unit is highly power-optimized to improve performance-per-watt for floating point ops.
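One way to arrive at the 32 GFLOPS figure is sketched below. It assumes each of the 16 lanes completes a fused multiply-add every cycle and that the multiply-add is counted as two floating-point operations. That counting convention is common in peak-throughput figures, but it is an assumption here; the source does not say how the number was derived.

```python
def peak_gflops(lanes, clock_ghz, flops_per_lane_per_cycle):
    """Peak floating-point throughput in GFLOPS for a SIMD unit."""
    return lanes * clock_ghz * flops_per_lane_per_cycle


LANES = 16        # 16-way single precision, as quoted above
CLOCK_GHZ = 1.0   # the 1GHz core quoted above
FMA_FLOPS = 2     # ASSUMPTION: one multiply-add per lane per cycle, counted as 2

with_fma = peak_gflops(LANES, CLOCK_GHZ, FMA_FLOPS)   # matches the quoted 32
without_fma = peak_gflops(LANES, CLOCK_GHZ, 1)        # half that, for comparison
print(with_fma, without_fma)
```

As with any peak number, this is an upper bound that real code approaches only when every lane has useful work on every cycle.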
On the software tool side, Vision P5 is supported by an auto-vectorizing compiler with OpenCV and OpenVX libraries that add over 800 optimized functions for vision and image processing. Cadence also partners with a number of companies to produce a robust ecosystem for Vision P5, including software and IP specifically targeting a number of application domains in areas like mobile, ADAS, security, and IoT/wearables. The bottom line is that a lot of the non-differentiating drudge work of intelligent vision is already done for you, so you can focus your energy on the interesting and exciting parts that will make your product special.
How does it perform? Cadence claims that Vision P5 can deliver 13x the performance of IVP-EP, its previous-generation imaging and video processor, with one-fifth the energy. That’s an enormous win in the performance-per-watt race – which is ultimately the key metric in most embedded-vision applications.
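Those two figures combine differently depending on what "one-fifth the energy" measures, and the claim as quoted does not say. The sketch below works through both readings; the function and key names are hypothetical, and neither reading is presented as Cadence's own calculation.

```python
def efficiency_readings(speedup=13, energy_reduction=5):
    """Combine a speedup and an energy reduction under two interpretations.

    All results are relative to the baseline processor (baseline = 1).
    """
    # Reading A: each task costs one-fifth the energy.
    # Work per joule improves 5x; running flat out at 13x the task rate,
    # power draw is (13 tasks/s) * (1/5 energy per task) = 2.6x baseline.
    energy_per_task = {
        "work_per_joule_gain": energy_reduction,
        "relative_power": speedup / energy_reduction,
    }
    # Reading B: the processor draws one-fifth the power.
    # Work per joule improves by speedup * reduction = 65x.
    power = {
        "work_per_joule_gain": speedup * energy_reduction,
        "relative_power": 1 / energy_reduction,
    }
    return {"energy_per_task": energy_per_task, "power": power}


for reading, numbers in efficiency_readings().items():
    print(reading, numbers)
```

Either way the gain is large, but a 5x and a 65x improvement in work per joule are very different claims, so it is worth checking which one a vendor's benchmark actually measured.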
When it comes time to scale, you can implement a multi-processor version of Vision P5 to achieve some staggering performance numbers – up to one tera-op in less than two square millimeters of silicon area. That should be enough oomph for just about any vision application you’d be working on today. The scalability of the Vision P5 solution is particularly attractive considering the wide range of potential end applications. You can dial in just enough capability to satisfy your system needs without a lot of extra baggage.
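For a rough feel of what "dialing in" capability looks like, the sketch below estimates how many cores a given throughput target would need. It reuses the peak figures quoted earlier (256 ALU operations per cycle at 1.1GHz) and assumes that "tera-op" means 1,000 giga-operations per second of that same kind of operation; the `scaling_efficiency` parameter is a hypothetical knob for multi-core overhead. The source does not state how many cores the one-tera-op configuration uses, so treat the result as back-of-envelope only.

```python
import math


def cores_for_target(target_gops, ops_per_cycle=256, clock_ghz=1.1,
                     scaling_efficiency=1.0):
    """Estimate the core count needed to reach a peak throughput target.

    scaling_efficiency is the ASSUMED fraction of per-core peak that
    survives multi-core overhead (1.0 means perfectly linear scaling).
    """
    per_core_gops = ops_per_cycle * clock_ghz  # about 281.6 GOPS peak per core
    return math.ceil(target_gops / (per_core_gops * scaling_efficiency))


ideal = cores_for_target(1000)                               # linear scaling
derated = cores_for_target(1000, scaling_efficiency=0.8)     # 20% overhead
print(ideal, derated)
```

The same function works in the other direction for smaller designs: plug in the throughput your algorithm actually needs and see whether a single core already covers it.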
Despite being optimized for high-performance vision applications, Vision P5 leaves the flexibility where you need it. With cameras, sensors, and optics constantly changing and with vision algorithms advancing every day, you need the flexibility of software to allow your system to evolve with the times and the technology. The rich set of tools, IP, reference designs, and partner offerings that accompanies Vision P5 allows you to have it both ways – rapid deployment and ultimate configurability. In the domain of custom SoC vision technology, we haven’t seen an offering that competes directly with Vision P5. It brings unique capabilities to the table.