
The Steady March of Progress

ARM’s New Cortex-A75 is More of the Same, and That’s a Good Thing

“Those who cannot change their minds cannot change anything.” – George Bernard Shaw

To no one’s great surprise, ARM has released a new set of microprocessor cores.

You could almost set your watch by ARM’s upgrade announcements, so regular and predictable have they become. What’s this – about the umpty-fifth new processor to come out of the British-based, Japanese-owned company in about the last ten years? Do these guys ever take a day off?

ARM has more flavors of CPU than Crest has of toothpaste. Between the Cortex A-series, the low-end M-series, and the little-known R-series, ARM has, like General Motors, a processor “for every purse and purpose.” Alfred P. Sloan would be proud.

New up this month are the Cortex-A75 and Cortex-A55. The A75 is the more interesting of the two because it’s the bigger, faster, better-looking sibling. The A75 more or less replaces the Cortex-A72 and/or -A73 as ARM’s high-end mobile processor. It is not, however, the much-rumored server processor that’s expected later this year. That CPU, code-named Ares (Wonder Woman’s nemesis), will be even faster but won’t be very mobile-friendly.

The new A55 and A75 are the first two new cores in the DynamIQ generation (see March 29, 2017), and the first to implement the ARMv8.2-A architecture specification. Well, most of it, anyway. Even these cores don’t execute quite all the new instructions that appear in the spec, although they are an upgrade from current ARM ISAs.

The A75 looks a whole lot like its predecessor, the A73. They have the same 11-stage pipeline, the same seven execution units, and the same three levels of caching. There’s only so much you can change in one year. But it’s the little tweaks that count.

Although the A73 and A75 share the same mix of execution resources, the A75 now has seven instruction queues, one for each unit, up from four on the A73. That should result in less stalling. The A75 also has an additional instruction decoder – three, instead of two – some tweaked branch-prediction logic, and it fetches four instructions per cycle (up from three). Overall, the A75 is less congested than its predecessor, even though they both run similar instructions on similar hardware. It’s not so much that the A75 is faster than the A73. It just slows down less often.

ARM says the A75 is about 20% faster on integer code, and 30% faster on floating-point (FP) code, compared to the A73, all things being equal. That’s a nice speed bump for what is essentially a refreshed, rather than a wholly redesigned, CPU core. The A75’s clock rates should be the same as the A73’s, since the pipeline didn’t get any longer or appreciably more (or less) complex. The A75 obviously contains more logic than the A73, yet ARM says the power consumption is the same between the two. Credit more tweaking. On the other hand, if the A75 delivers better performance at the same clock speed and power consumption, it should be able to finish a given task 20% to 30% quicker, permitting an earlier shutdown. Thus, battery-powered applications may actually see a decrease in the energy consumed per task. Or better performance for the same power – your choice.
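
For the arithmetically inclined, here is a back-of-the-envelope sketch of that trade-off. It assumes energy per task is simply power multiplied by run time, and it reads ARM’s 20% and 30% figures as throughput gains at unchanged power. The function names are made up for illustration, and real workloads will not behave this neatly.

```python
def relative_time(speedup: float) -> float:
    """Run time relative to the old core, for a core that is `speedup` faster (0.20 = 20%)."""
    return 1.0 / (1.0 + speedup)


def relative_energy(speedup: float, relative_power: float = 1.0) -> float:
    """Energy per task relative to the old core: power ratio times run-time ratio."""
    return relative_power * relative_time(speedup)


# ARM's quoted gains, treated here as throughput gains at equal power (an assumption).
for label, speedup in (("integer", 0.20), ("floating point", 0.30)):
    saved = 1.0 - relative_energy(speedup)
    print(f"{label}: {saved:.1%} less time and energy per task")
# prints:
# integer: 16.7% less time and energy per task
# floating point: 23.1% less time and energy per task
```

Under those assumptions, a core that is 20% faster finishes about 17% sooner, and one that is 30% faster finishes about 23% sooner. The energy saving tracks the time saving only as long as the power draw really is identical.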

Because the A75 and A55 are compatible with DynamIQ, instead of (or in addition to) big.LITTLE, they can theoretically be clustered with any other DynamIQ-compatible ARM processors, not just themselves. Right now, however, that set includes exactly no other processors except the A75 and A55. All new ARM cores from here onwards will presumably be DynamIQ-aware, but in the meantime, these two are it.

DynamIQ’s flexibility comes with a cost. On the plus side, it enables heterogeneous mixes of processors – up to 256 of them, in fact. Those CPUs can run at different clock speeds and have very different processing capabilities. Once the selection of DynamIQ-aware CPUs expands beyond just these two, it should be possible to mix and match ARM cores in almost infinite varieties. Furthermore, DynamIQ-compatible CPUs like the A75 and A55 have private, rather than shared, L2 caches, which improve on-core performance a bit.

The downside is that performance across clusters may suffer by a small amount, as the L2 caches are now private. And, since DynamIQ permits mixing CPU clusters running at different speeds, there are necessarily asynchronous interfaces between those clusters. That allows breaking up the clock tree, and it permits the faster cores to run at full speed, but it also requires time-consuming resynchronization any time data travels between clusters. You don’t get something for nothing.

The A75 has acquired some high-end features that its predecessor didn’t have, ones likely borrowed from the still-in-design Ares project. It now supports ECC for its caches, more hypervisor hooks, finer-grained performance monitoring, and an interesting feature known as data poisoning. Normally, when a CPU fetches bad data (i.e., with a parity or ECC error), it throws a hardware fault and everything grinds to a stop while the system figures out what to do with the bad data. But relatively high-performance processors like the A75 frequently fetch data they don’t actually use. They might fetch instructions on the far side of a branch that won’t be executed, or they’ll fetch a long cache line but use only one byte of it. Why pull the fire alarm when the bad data isn’t causing a problem?

With data poisoning, the CPU marks the newly fetched data as bad (“poisoned”), but takes no further action until or unless that data is about to be used. Only then does it throw a fault, at which point the system can go through its usual panic phase. When implemented correctly, data poisoning can avoid unnecessary alarms.
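
The idea is easy to mimic in software. The toy model below is only a sketch of the deferred-fault behavior described above, not ARM’s implementation: the `CacheLine` class, the `fetch` and `consume` functions, and the little memory table are all hypothetical, and real hardware tracks poison at its own granularity.

```python
from dataclasses import dataclass


class PoisonedDataError(Exception):
    """Raised only when poisoned data is actually consumed."""


@dataclass
class CacheLine:
    data: bytes
    poisoned: bool = False


def fetch(memory: dict, address: int) -> CacheLine:
    """Fetch a line; on an ECC error, mark it poisoned instead of faulting."""
    data, ecc_ok = memory[address]
    return CacheLine(data=data, poisoned=not ecc_ok)


def consume(line: CacheLine, offset: int) -> int:
    """Use one byte of the line; the deferred fault fires here, if at all."""
    if line.poisoned:
        raise PoisonedDataError("poisoned data consumed")
    return line.data[offset]


# Hypothetical memory: address -> (line contents, ECC check passed?)
memory = {
    0x100: (b"\x01\x02\x03\x04", True),
    0x200: (b"\xff\xff\xff\xff", False),
}

good = fetch(memory, 0x100)
speculative = fetch(memory, 0x200)  # bad line fetched, but no fault yet
print(consume(good, 0))             # prints 1; the bad line is never used
```

Calling `consume(speculative, 0)` would raise `PoisonedDataError`, which stands in for the hardware fault. As long as the program never touches that line, nobody pulls the fire alarm.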

For chip designers on the ARM upgrade treadmill, it’s hard not to like the Cortex-A75. All of the same, but more of it. More, better, faster. For those not using ARM’s processors, it’s getting harder to avoid them. And if you’re planning to buy a new phone in 2018, it’ll be pretty much impossible.
