
Arm Unwraps Three New CPU/GPU Designs

The Spirit of ’76 is Alive and Well in Cambridge, England

“Diversity: the art of thinking independently together.” – Malcolm Forbes

Japanese-owned Arm is celebrating the start of summer with three new ’76-themed IP cores: Cortex-A76, Mali-G76, and Mali-V76. It’s almost like they’re declaring independence from the competition.

You know the drill. New Arm cores are faster, more power-efficient, and occasionally even smaller than their predecessors. That’s mostly true in this case as well. The new Cortex-A76 processor is better in every way compared to the -A75; Mali-G76 is faster and smarter than the -G72; and Mali-V76 is ridiculously more proficient than the -V61 it replaces.

What hasn’t changed is Arm’s swagger. (The company now prefers to style its name in mixed case, rather than the previous all-caps acronym.) Every major Arm presentation is prefaced with a McDonald’s-like “billions and billions served” announcement. At last count, almost 130 billion Arm-based chips have seen the light of day, with over 21 billion of those made in just the previous calendar year. So far in 2018, Arm reckons its licensees have cranked out another 10 billion, or about half of last year’s total in a five-month span. At this rate, the headcount will top 200 billion sometime in 2021. The boardroom graphs are all pointing up and to the right.
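For readers who want to check that projection, here is a back-of-the-envelope version in Python. It is only a sketch. It assumes the roughly 130 billion figure is the running total as of the end of May 2018 and that shipments continue at a constant rate, and Arm has promised neither. It tries both the 2018 pace (10 billion in five months) and the slower 2017 pace (21 billion in a year).

```python
# Back-of-the-envelope check of the "200 billion sometime in 2021" projection.
# Assumptions (not Arm's): ~130 billion is the cumulative total at the end of
# May 2018, and shipments continue at a constant monthly rate.

def year_target_reached(total_b, target_b, rate_b_per_month, start_year):
    """Return (months needed, fractional calendar year when the target is passed)."""
    months = (target_b - total_b) / rate_b_per_month
    return months, start_year + months / 12

START = 2018 + 5 / 12      # end of May 2018, as a fractional year
RATE_2018 = 10 / 5         # 10 billion chips in five months -> 2 billion per month
RATE_2017 = 21 / 12        # 21 billion chips in calendar 2017 -> 1.75 billion per month

for label, rate in [("2018 pace", RATE_2018), ("2017 pace", RATE_2017)]:
    months, year = year_target_reached(130, 200, rate, START)
    print(f"{label}: {months:.0f} months -> around {int(year)}")
```

Either pace lands in 2021, so the projection holds up even if the 2018 surge doesn't last.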

That’s a lot of IP, and you don’t get there by putting just one CPU or one GPU in every cellphone. Nosirree, you’ve got to make IP cores for every purse and purpose, as Alfred P. Sloan used to say. Phones and other mobile devices now have multiple CPUs clustered together, and increasingly they’re clustering multiple GPUs as well. You’re just not au courant unless you’ve got five or six Arm-designed engines chugging away inside your pocket.

Cortex-A76 is the company’s new flagship microprocessor, and Arm says it’s got double the performance of “currently shipping smartphones” that are based on the Cortex-A73 (not the faster -A75). Compared to its closer sibling, the -A75, the -A76 is expected to post benchmark scores about 35% higher. Much of that improvement is due to architectural tweaks that the company declined to describe. The rest is down to process technology. Arm bases its claims on an -A76 running at 3.0 GHz in 7nm technology versus an -A75 in 10nm at 2.8 GHz, so about 7% of that uplift is due to clock speed, not architecture. Still, any design that improves performance by twenty-some percent at the same clock is a significant upgrade. Arm calls it “laptop performance in a cellphone power envelope.”
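The split between clock speed and everything else is simple division. The sketch below assumes that performance scales as work per clock times clock rate, which is the usual simplification; real workloads that stall on memory won't scale that neatly. It also takes Arm's 35% and both clock rates at face value.

```python
# Split Arm's headline Cortex-A76 vs. -A75 uplift into a clock-speed part and a
# per-clock part (architecture plus anything else that helps at equal frequency).
# Assumption: performance = (work per clock) x (clock rate).

A75_GHZ = 2.8
A76_GHZ = 3.0
TOTAL_SPEEDUP = 1.35   # "about 35% more"

clock_gain = A76_GHZ / A75_GHZ                 # ratio of clock rates
per_clock_gain = TOTAL_SPEEDUP / clock_gain    # what remains once the clock is factored out

print(f"clock: {clock_gain - 1:.1%}, per clock: {per_clock_gain - 1:.1%}")
```

That works out to roughly 7% from the clock bump and roughly 26% per clock, which is where the "twenty-some percent" comes from.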

The new -A76 still adheres to the current-generation Armv8-A architecture specification, so there are no software-visible changes to the CPU. It’s also compatible with the company’s newish DynamIQ multicore technology, so you’re allowed to cluster multiple CPU cores together. In fact, it’s expected.

Helping you to up the core count in your next SoC is the Mali-G76 graphics processor. It’s an upgrade from the -G72, obviously, with about the same bump in performance as the new Cortex. The new -G76 ought to be 30% faster than a -G72, and/or 30% more power-efficient, assuming both cores are running at the same speed in the same process technology. Here again, the changes are microarchitectural, with no outward alterations to the programmer’s model.

Inside, the -G76 has anywhere from four to 20 shader units (your choice), with three execution engines in each. Those engines are themselves upgraded from those found in the -G72, with texture mapping especially improved. Although the respectable 30% bump in performance is nice, it’s the core’s machine-learning (ML) prowess that really takes a leap. Arm says the -G76 is 2.7 times faster than the -G72 on ML workloads, due largely to new hardware support for 8-bit integer dot products.
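What does an 8-bit integer dot product actually do? The Python model below multiplies pairs of signed 8-bit values and sums the products into a 32-bit accumulator, four pairs at a time. It is a sketch of the arithmetic only, not Mali's instruction set. The four-wide grouping is an assumption borrowed from common int8 dot-product instructions, and the function names are hypothetical. The hardware payoff is density: in principle, four 8-bit operands fit in the width of one 32-bit value, and 8-bit quantized weights are often accurate enough for neural-network inference.

```python
# Model of an int8 dot product with 32-bit accumulation. Hypothetical helper
# names; this illustrates the arithmetic, not Mali's actual ISA.

def int8_dot4_accumulate(acc, a, b):
    """Add the dot product of two 4-element int8 vectors to a signed 32-bit accumulator."""
    assert len(a) == len(b) == 4
    for x, y in zip(a, b):
        assert -128 <= x <= 127 and -128 <= y <= 127, "inputs must fit in int8"
        acc += x * y
    # Wrap to a signed 32-bit result, as fixed-width hardware would.
    acc &= 0xFFFFFFFF
    return acc - (1 << 32) if acc & 0x80000000 else acc

def int8_dot(a, b):
    """Dot product of equal-length int8 vectors, processed four elements at a time."""
    assert len(a) == len(b) and len(a) % 4 == 0, "length must be a multiple of 4"
    acc = 0
    for i in range(0, len(a), 4):
        acc = int8_dot4_accumulate(acc, a[i:i + 4], b[i:i + 4])
    return acc

print(int8_dot([1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 1, 1, 2, 2, 2, 2]))
```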

The third point in Arm’s celebratory three-cornered hat is Mali-V76. It’s your way to help Arm achieve its 200-billion-unit goal. Designed for video, as opposed to graphics, the -V76 is a screamer streamer. The target applications here are AR/VR goggles, high-end TVs, and video walls displaying multiple independent streams. The -V76 is Arm’s first core able to decode 8K UHD content, but it can also be configured to show a 2×2 array of 2160p (4K) streams at 60 fps, or a 4×4 array of 1080p video at 60 fps. The latter presentation appeals to makers of video kiosks and high-density information displays – or to television addicts with short attention spans.
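Those configurations aren't arbitrary. As the quick calculation below shows, four 2160p streams or sixteen 1080p streams add up to exactly the pixel count of one 8K frame. The sketch assumes standard 16:9 resolutions (7680×4320 for 8K) and uses 60 fps for all three cases; the article doesn't state the -V76's 8K frame rate.

```python
# Compare pixel throughput for one 8K stream, a 2x2 grid of 2160p streams, and a
# 4x4 grid of 1080p streams. Assumes standard 16:9 sizes and 60 fps throughout.

RES = {"1080p": (1920, 1080), "2160p": (3840, 2160), "8K": (7680, 4320)}

def pixel_rate(res, streams=1, fps=60):
    """Pixels per second for `streams` independent streams of resolution `res` at `fps`."""
    w, h = RES[res]
    return w * h * streams * fps

single_8k = pixel_rate("8K")
quad_2160p = pixel_rate("2160p", streams=4)
grid_1080p = pixel_rate("1080p", streams=16)
print(single_8k, quad_2160p, grid_1080p)
```

All three come out to the same pixel rate, so a decoder sized for 8K can be carved up into a video wall without running short of throughput.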

Because the -V76 can stream both ways and encode as well as decode (though the capabilities aren’t quite symmetrical), it’s also applicable to AR goggles that must source, as well as sink, video data.

All three new 76-themed cores are available now, in the sense that Arm will happily take your money and license the IP to you. Indeed, a few unnamed licensees are already so equipped, and have even produced silicon, which means they’ve got a one-year head start. End-user products containing one or more of these new IP cores are expected “sometime in 2019,” according to Arm.

Arm is the undisputed master of the universe when it comes to licensed CPU cores; somewhat less so in the GPU arena, where it competes with PowerVR, Vivante, and other GPU vendors. Part of Arm’s attraction is its broad portfolio, including low-end IP to round out your SoC design. And part of it is Arm’s size and financial stability, while other IP vendors seem to be struggling to stay afloat. And, now that Arm has CPUs, GPUs, and video processors spread all over the performance spectrum, it can preassemble complex subsystems for you and license those, too. Instant product: just add software. The Anglo-Japanese company is in a good position with a bright future. What better reason to light off a few fireworks?

