The Steady March of Progress

ARM’s New Cortex-A75 is More of the Same, and That’s a Good Thing

“Those who cannot change their minds cannot change anything.” – George Bernard Shaw

To no one’s great surprise, ARM has released a new set of microprocessor cores.

You could almost set your watch by ARM’s upgrade announcements, so regular and predictable have they become. What’s this – about the umpty-fifth new processor to come out of the British-based, Japanese-owned company in about the last ten years? Do these guys ever take a day off?

ARM has more flavors of CPU than Crest has of toothpaste. Between the Cortex A-series, the low-end M-series, and the little-known R-series, ARM has, like General Motors, a processor “for every purse and purpose.” Alfred P. Sloan would be proud.

New this month are the Cortex-A75 and Cortex-A55. The A75 is the more interesting of the two because it’s the bigger, faster, better-looking sibling. The A75 more or less replaces the Cortex-A72 and/or -A73 as ARM’s high-end mobile processor. It is not, however, the much-rumored server processor that’s expected later this year. That CPU, code-named Ares (Wonder Woman’s nemesis), will be even faster but won’t be very mobile-friendly.

The new A55 and A75 are the first two cores in the DynamIQ generation (see March 29, 2017), and the first to implement the ARMv8.2-A architecture specification. Well, most of it, anyway. Even these cores don’t execute quite all the new instructions that appear in the spec, although they are an upgrade from current ARM ISAs.

The A75 looks a whole lot like its predecessor, the A73. They have the same 11-stage pipeline, the same seven execution units, and the same three levels of caching. There’s only so much you can change in one year. But it’s the little tweaks that count.

Although the A73 and A75 share the same mix of execution resources, the A75 now has seven instruction queues, one for each unit, up from four on the A73. That should result in less stalling. The A75 also has an additional instruction decoder – three, instead of two – some tweaked branch-prediction logic, and it fetches four instructions per cycle (up from three). Overall, the A75 is less congested than its predecessor, even though they both run similar instructions on similar hardware. It’s not so much that the A75 is faster than the A73. It just slows down less often.

ARM says the A75 is about 20% faster on integer code, and 30% faster on FP, compared to the A73, all things being equal. That’s a nice speed bump for what is essentially a refreshed, rather than a wholly redesigned, CPU core. The A75’s clock rates should be the same as the A73’s, since the pipeline didn’t get any longer or appreciably more (or less) complex. The A75 obviously contains more logic than the A73, yet ARM says the power consumption is the same between the two. Credit more tweaking. On the other hand, if the A75 delivers better performance at the same clock speed and power consumption, it should be able to finish a given task sooner (a 20% to 30% speedup works out to roughly 17% to 23% less run time), permitting an earlier shutdown. Thus, battery-powered applications may actually see a decrease in energy used. Or better performance for the same power – your choice.
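
For readers who want to check the race-to-idle arithmetic, here is a minimal Python sketch. It assumes (and these are simplifications, not ARM figures) that active power is identical on both cores and that the core draws nothing once it shuts down; the function names are made up for illustration. Under those assumptions, energy for a fixed task scales directly with run time.

```python
def time_fraction(speedup_pct):
    """Fraction of the original run time needed after a given percent speedup."""
    return 1.0 / (1.0 + speedup_pct / 100.0)


def energy_saving_pct(speedup_pct):
    """Percent of energy saved on a fixed task, assuming equal active power
    and zero idle power (energy = power x time)."""
    return (1.0 - time_fraction(speedup_pct)) * 100.0


# ARM's quoted speedups for the A75 over the A73
for label, pct in (("integer", 20), ("floating point", 30)):
    saving = energy_saving_pct(pct)
    print(f"{label}: {pct}% faster -> about {saving:.1f}% less time and energy")
```

Note that a 20% speedup does not cut run time by 20%. The same work finishes in 1/1.2 of the time, so the saving is closer to 17%.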

Because the A75 and A55 are compatible with DynamIQ, instead of (or in addition to) big.LITTLE, they can theoretically be clustered with any other DynamIQ-compatible ARM processors, not just themselves. Right now, however, that set includes exactly no processors other than the A75 and A55. All new ARM cores from here onwards will presumably be DynamIQ-aware, but in the meantime, these two are it.

DynamIQ’s flexibility comes with a cost. On the plus side, it enables heterogeneous mixes of processors – up to 256 of them, in fact. Those CPUs can run at different clock speeds and have very different processing capabilities. Once the selection of DynamIQ-aware CPUs expands beyond just these two, it should be possible to mix and match ARM cores in almost infinite varieties. Furthermore, DynamIQ-compatible CPUs like the A75 and A55 have private, rather than shared, L2 caches, which improve on-core performance a bit.

The downside is that performance across clusters may suffer by a small amount, as the L2 caches are now private. And, since DynamIQ permits mixing CPU clusters running at different speeds, there are necessarily asynchronous interfaces between those clusters. That allows breaking up the clock tree, and it permits the faster cores to run at full speed, but it also requires time-consuming resynchronization any time data travels between clusters. You don’t get something for nothing.
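
To put a rough number on that resynchronization penalty, here is a back-of-the-envelope Python sketch. It assumes a generic two-stage synchronizer on the receiving side, which is a common textbook approach and not a confirmed detail of ARM's design. The clock speeds and the function name are hypothetical, chosen only to show the shape of the cost.

```python
def crossing_latency_ns(dest_clock_mhz, sync_stages=2):
    """Approximate delay to resynchronize a signal into the destination
    clock domain: one destination clock period per synchronizer stage."""
    period_ns = 1000.0 / dest_clock_mhz
    return sync_stages * period_ns


# Hypothetical clusters: a fast one at 2000 MHz and a slow one at 1000 MHz
fast_mhz, slow_mhz = 2000, 1000

into_slow = crossing_latency_ns(slow_mhz)   # data moving fast -> slow
into_fast = crossing_latency_ns(fast_mhz)   # data moving slow -> fast

# Express the fast-to-slow penalty in cycles of the fast cluster's clock
fast_cycles_lost = into_slow * fast_mhz / 1000.0

print(f"into slow cluster: {into_slow} ns")
print(f"into fast cluster: {into_fast} ns")
print(f"fast-cluster cycles spent waiting on the slow side: {fast_cycles_lost}")
```

The point of the toy model is that the penalty is paid in the receiver's clock periods, so handing data to a slower cluster costs the faster core more of its own cycles.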

The A75 has acquired some high-end features that its predecessor didn’t have, ones likely borrowed from the still-in-design Ares project. It now supports ECC for its caches, more hypervisor hooks, finer-grained performance monitoring, and an interesting feature known as data poisoning. Normally, when a CPU fetches bad data (i.e., with a parity or ECC error), it throws a hardware fault and everything grinds to a stop while the system figures out what to do with the bad data. But relatively high-performance processors like the A75 frequently fetch data they don’t actually use. They might fetch instructions on the far side of a branch that won’t be executed, or they’ll fetch a long cache line but use only one byte of it. Why pull the fire alarm when the bad data isn’t causing a problem?

With data poisoning, the CPU marks the newly fetched data as bad (“poisoned”), but takes no further action until or unless that data is about to be used. Only then does it throw a fault, at which point the system can go through its usual panic phase. When implemented correctly, data poisoning can avoid unnecessary alarms.
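
The deferred-fault idea is easy to model in software. The Python sketch below is a toy, not a description of the A75's actual hardware: the class names, the addresses, and the list of "bad" locations are all invented for illustration. It shows only the key behavior, which is that fetching a bad line is silent and consuming it is not.

```python
class PoisonFault(Exception):
    """Raised only when poisoned data is actually consumed."""


class CacheLine:
    def __init__(self, data, poisoned=False):
        self.data = data
        self.poisoned = poisoned


class ToyCache:
    def __init__(self, memory, bad_addresses):
        self.memory = memory           # address -> bytes
        self.bad = set(bad_addresses)  # hypothetical: lines with an ECC error
        self.lines = {}

    def fetch(self, addr):
        """Bring a line in. A bad line is marked poisoned; no fault yet."""
        self.lines[addr] = CacheLine(self.memory[addr], poisoned=addr in self.bad)

    def use(self, addr):
        """Consume a line. Only now does poisoned data raise a fault."""
        line = self.lines[addr]
        if line.poisoned:
            raise PoisonFault(f"poisoned data consumed at {addr:#x}")
        return line.data


memory = {0x100: b"good", 0x140: b"bad!"}
cache = ToyCache(memory, bad_addresses=[0x140])

cache.fetch(0x100)
cache.fetch(0x140)           # speculative fetch of a bad line: no alarm
result = cache.use(0x100)    # the good line is consumed normally

try:
    cache.use(0x140)         # the alarm goes off only here
    faulted = False
except PoisonFault:
    faulted = True
```

If the program never calls `use(0x140)`, for instance because the branch that needed it was never taken, the bad line is eventually evicted and nobody is the wiser.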

For chip designers on the ARM upgrade treadmill, it’s hard not to like the Cortex-A75. All of the same, but more of it. More, better, faster. For those not using ARM’s processors, it’s getting harder to avoid them. And if you’re planning to buy a new phone in 2018, it’ll be pretty much impossible.
