PowerVR AX2185 Accelerates Neural Nets

New IP Cores Aimed at Smartphones, Cameras, Consumer Goods

“The greatest danger of AI is that people conclude too early that they understand it.” — Eliezer Yudkowsky

Sometimes accidental discoveries are the best ones. Teflon was supposed to be a refrigerant. The first implantable heart pacemaker was designed as a measuring device, but inventor Wilson Greatbatch installed a resistor of the wrong value. And Play-Doh was created to clean wallpaper.

Turns out, the graphics card in your PC is surprisingly good – almost accidentally talented – at neural-net processing, cryptocurrency mining, machine learning, and artificial intelligence. Who knew? With a few tweaks, your GPU can make a darned good robot brain, even outsmarting the “real” microprocessor in your system.

This bit of serendipity hasn’t gone unnoticed by the world’s GPU designers, of course. Never ones to let the grass grow under their feet, companies like nVidia, AMD/ATI, and Imagination Technologies have rapidly pivoted their GPU architectures to capitalize on these new and interesting markets. Now you can buy GPUs to use as, well, GPUs, or you can buy them for completely different purposes.

This week’s announcement comes from Imagination, the keepers of the PowerVR flame. They’ve split their popular GPU family into two completely different architectures, one for traditional graphics tasks and another for accelerating neural networks. They both share the PowerVR brand name, but that’s about all they have in common.  

The first broad outlines of this came last year, when Imagination announced its PowerVR 2NX architecture. So, we knew that Imagination had an accelerator in the works. Now we know what the products are called and what they look like.

Say hello to AX2145 and AX2185. They’re the first two instantiations of the new 2NX architecture, and they’re pretty similar. One’s designed for maximum performance (the ’85), while the other is a milder, more “balanced” design, according to the company.

Both IP cores are available immediately, and both are, in fact, already being used by a pair of lead customers. Expect to see the first AX21x5-based products on the street in about a year, probably in the form of high-end Chinese smartphones, security cameras, or drones.

Broadly speaking, what separates the AX21x5 twins from other AI-focused designs from the major semiconductor vendors is power efficiency. Your nVidia GPU is never going to last long inside a cellphone, so designing for ultimate performance isn’t the goal. Instead, Imagination has to bear in mind that its customers are running on batteries, in confined spaces, and with no good way to cool the hardware. They’re focused on the Internet of Surveillance™, not Call of Duty 2.

The AX2185 is the faster of the two designs, and evidently the fastest implementation on the PowerVR roadmap. Imagination says it can perform at 4.1 TOPS (trillions of operations per second), which neatly matches the top end of the family’s performance range when it was announced last year. If that kind of acceleration somehow isn’t good enough for you, you need multiple AX2185s.

The AX2145 is the little sister, with 1.0 TOPS performance, smaller die area, and less power consumption. Imagination sees this as a good fit for midrange smartphones (midrange in a few years, perhaps), digital TVs, and set-top boxes.

Weirdly, Imagination claims that the ’45 actually outperforms the ’85 in certain circumstances. Specifically, when memory bandwidth is tight, you’ll want to use the ’45, not its bigger sibling. That’s because both designs, like all neural-network accelerators (NNAs), need a boatload of bandwidth to operate efficiently. Like GPUs and DSPs, NNAs are memory hogs, and throttling their memory can have a big effect on the engine’s efficiency. Imagination spent a lot of time benchmarking this effect, and later explaining why it happens.

That appetite for bandwidth also explains why the whole 2NX architecture supports funny bit widths. As our own Bryon Moyer explained back in October, the new PowerVR family is fixed-point only, and supports 8-bit and 4-bit integers, as well as some nontraditional sizes, like a 5-bit format. Add to that 12-bit, 7-bit, 6-bit, and other integers and you begin to see how hard Imagination worked to conserve memory size and bandwidth.
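To get a feel for what squeezing a weight into 5 or 4 bits costs, here's a minimal sketch of generic symmetric, per-tensor quantization. It's a textbook scheme chosen for illustration, not Imagination's documented method, and the function names and sample weights are made up. Fewer bits means fewer bytes to fetch, at the price of coarser steps.

```python
def quantize_symmetric(values, bits):
    """Quantize floats to signed `bits`-wide integers sharing one scale factor.

    Generic symmetric, per-tensor scheme for illustration only; not
    Imagination's documented method.
    """
    qmax = (1 << (bits - 1)) - 1                  # 127 for 8-bit, 15 for 5-bit, 7 for 4-bit
    max_abs = max(abs(v) for v in values) or 1.0  # avoid divide-by-zero on all-zero input
    scale = max_abs / qmax                        # real-valued size of one integer step
    # Round to nearest (Python rounds exact ties to even) and clip to the signed range.
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integers back to approximate real values."""
    return [x * scale for x in q]


def max_error(values, bits):
    """Worst-case reconstruction error for a given bit width."""
    q, scale = quantize_symmetric(values, bits)
    return max(abs(v - r) for v, r in zip(values, dequantize(q, scale)))


if __name__ == "__main__":
    weights = [0.9, -0.45, 0.1, -1.2]  # made-up example weights
    for bits in (8, 6, 5, 4):
        q, scale = quantize_symmetric(weights, bits)
        print(f"{bits}-bit: {q}  max error {max_error(weights, bits):.4f}")
```

Each bit you drop halves the number of available levels, so the rounding error roughly doubles while the storage and bandwidth cost shrinks proportionally.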

You can even tweak the bit depth on a layer-by-layer basis, adding extra precision where it’s needed and discarding it where it isn’t. You can also use different formats for weights and for data. It’s all flexible.
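Here's a back-of-the-envelope sketch of why per-layer, per-operand bit widths matter for memory traffic. The network, its layer sizes, and the chosen widths are all hypothetical, and it assumes every value is moved once per inference with tight bit packing, which real hardware may or may not do. Picking widths that preserve accuracy is a separate (and harder) problem.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    weights: int      # number of weight values in the layer
    activations: int  # number of output activation values it produces
    weight_bits: int  # bit width chosen for this layer's weights
    act_bits: int     # bit width chosen for this layer's activations


def layer_bytes(layer: Layer) -> int:
    """Bytes needed to move one layer's weights and outputs, assuming tight bit packing."""
    bits = layer.weights * layer.weight_bits + layer.activations * layer.act_bits
    return (bits + 7) // 8  # round up to whole bytes


def network_bytes(layers) -> int:
    return sum(layer_bytes(layer) for layer in layers)


# Hypothetical three-layer network; sizes and bit widths are illustrative only.
uniform_8bit = [
    Layer("conv1", 10_000, 200_000, 8, 8),
    Layer("conv2", 50_000, 100_000, 8, 8),
    Layer("fc", 200_000, 1_000, 8, 8),
]
mixed = [
    Layer("conv1", 10_000, 200_000, 8, 6),  # keep full 8-bit weights early on
    Layer("conv2", 50_000, 100_000, 5, 6),
    Layer("fc", 200_000, 1_000, 4, 8),      # big weight count, trimmed to 4 bits
]

if __name__ == "__main__":
    a, b = network_bytes(uniform_8bit), network_bytes(mixed)
    print(f"uniform 8-bit: {a} bytes, mixed: {b} bytes ({b / a:.0%} of uniform)")
```

With these made-up numbers the mixed-precision version moves about two-thirds as many bytes per inference, which is exactly the kind of saving that matters on a bandwidth-starved SoC.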

In bandwidth-constrained applications, the high-end AX2185 chokes if it’s not fed fast enough, wasting most of that potential performance. This is where the AX2145 outpaces it by as much as 50%, according to the company’s benchmarks.

How bad does your bandwidth have to be for this performance inversion to take place? YMMV, but Imagination hints that a system with bandwidth in the “low single digits” of Gbytes/sec would favor the smaller ’45 over the larger ’85 variant. Conversely, if you can provide “dozens of gigabytes per second” of bandwidth, you’ll be happier with the AX2185.
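A simple roofline model helps show why the bigger core's advantage evaporates when memory is slow: throughput is the lesser of the compute peak and what the memory system can feed. The peak TOPS figures come from Imagination; the arithmetic intensity (operations per byte fetched) is a hypothetical number, and real networks vary widely.

```python
def attainable_tops(peak_tops, bandwidth_gb_s, ops_per_byte):
    """Roofline estimate: throughput is capped by compute or by memory traffic."""
    # GB/s * ops/byte = Gops/s; divide by 1000 to get TOPS.
    memory_bound_tops = bandwidth_gb_s * ops_per_byte / 1000.0
    return min(peak_tops, memory_bound_tops)


def saturating_bandwidth_gb_s(peak_tops, ops_per_byte):
    """Bandwidth at which a core stops being memory-bound."""
    return peak_tops * 1000.0 / ops_per_byte


CORES = {"AX2145": 1.0, "AX2185": 4.1}  # peak TOPS quoted by Imagination
OPS_PER_BYTE = 100                      # hypothetical workload arithmetic intensity

if __name__ == "__main__":
    for bw in (2, 10, 40, 60):
        row = {name: attainable_tops(p, bw, OPS_PER_BYTE) for name, p in CORES.items()}
        print(f"{bw:>3} GB/s: " + ", ".join(f"{n} {t:.2f} TOPS" for n, t in row.items()))
    for name, peak in CORES.items():
        print(f"{name} saturates at ~{saturating_bandwidth_gb_s(peak, OPS_PER_BYTE):.0f} GB/s")
```

Under these assumptions the AX2185 needs about 41 GB/s to run flat out, while at 2 GB/s both cores deliver the same 0.2 TOPS. Note that a plain roofline only shows the gap closing; it can't reproduce the ’45 actually beating the ’85 by up to 50%, which reflects effects Imagination measured and this model leaves out.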

NNAs like these allow designers to stick more intelligence into end nodes, like security cameras that do their own object recognition. That’s great if you’re a camera designer, because you can charge more for your “smart” camera compared to the dumb ones you sold last year. It’s also good news for the overall system, because now you’re not piping full-resolution, full-rate video down an Ethernet cable to a waiting computer, which then has to analyze all those pixels in real time on its Intel, nVidia, or AMD processor (a task for which they are ill-suited, I can tell you). The whole system gets smarter, network bandwidth is reduced drastically, power consumption probably goes down, and the chance of someone intercepting your raw video stream pretty much disappears. Everybody wins.
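How drastic is that bandwidth reduction? A quick calculation, assuming uncompressed 1080p, 8-bit RGB video at 30 frames per second versus a hypothetical 16-byte record (class, confidence, bounding box) for each of 10 detected objects per frame. Real cameras usually compress their video, so the practical saving is smaller, but still enormous.

```python
def raw_video_bytes_per_s(width, height, bytes_per_pixel, fps):
    """Uncompressed video bandwidth."""
    return width * height * bytes_per_pixel * fps


def detection_bytes_per_s(objects_per_frame, bytes_per_object, fps):
    """Bandwidth for sending only recognition results instead of pixels."""
    return objects_per_frame * bytes_per_object * fps


if __name__ == "__main__":
    raw = raw_video_bytes_per_s(1920, 1080, 3, 30)  # 1080p, 8-bit RGB, 30 fps
    meta = detection_bytes_per_s(10, 16, 30)        # hypothetical 16-byte record per object
    print(f"raw video: {raw * 8 / 1e9:.2f} Gbit/s")
    print(f"detections: {meta} bytes/s, about {raw // meta:,}x less")
```

That's roughly 1.5 Gbit/s of raw pixels versus a few kilobytes per second of results.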

NNAs are like the DSPs of the 1990s: everybody needs one but nobody’s sure how to program them. Every few years, a new DSP application would materialize that promised to catapult DSP chips and software into the mainstream. Modems! Voice recognition! Graphics! Machine vision! Each time, widespread adoption proved elusive and DSPs remained a niche product, ideally suited to frustratingly narrow application areas.

At first, GPUs looked set to follow the same path. Way too many GPU startups tried and failed to break into the mainstream. Only a few, including PowerVR, nVidia, and ATI (now AMD), survived the initial wave of optimism. Like restaurants, GPU companies tend to fail within the first 18–24 months.

Machine learning, artificial intelligence, and convolutional neural networks fell on the GPU industry like manna from heaven. Suddenly, a whole new application area dropped in their laps, ready-made, and nicely suited to existing chips. It’s like discovering that you can sell your floor wax as a dessert topping, too. But that doesn’t mean you can’t improve on the flavor, and so now the second generation of NNAs is emerging from its GPU progenitors. Existing GPUs were handy, plentiful, and affordable, but not quite perfect. Neural nets may have come as a surprise to the industry, but today’s NNA designers are wasting no time in capitalizing on the opportunity.
