
The Steady March of Progress

ARM’s New Cortex-A75 is More of the Same, and That’s a Good Thing

“Those who cannot change their minds cannot change anything.” – George Bernard Shaw

To no one’s great surprise, ARM has released a new set of microprocessor cores.

You could almost set your watch by ARM’s upgrade announcements, so regular and predictable have they become. What’s this – about the umpty-fifth new processor to come out of the British-based, Japanese-owned company in about the last ten years? Do these guys ever take a day off?

ARM has more flavors of CPU than Crest has of toothpaste. Between the Cortex A-series, the low-end M-series, and the little-known R-series, ARM has, like General Motors, a processor “for every purse and purpose.” Alfred P. Sloan would be proud.

New up this month are the Cortex-A75 and Cortex-A55. The A75 is the more interesting of the two because it’s the bigger, faster, better-looking sibling. The A75 more or less replaces the Cortex-A72 and/or -A73 as ARM’s high-end mobile processor. It is not, however, the much-rumored server processor that’s expected later this year. That CPU, code-named Ares (Wonder Woman’s nemesis), will be even faster but won’t be very mobile-friendly.

The new A55 and A75 are the first two cores in the DynamIQ generation (see March 29, 2017), and the first to implement the ARMv8.2-A architecture specification. Well, most of it, anyway. Even these cores don’t execute quite all the new instructions that appear in the spec, although they are an upgrade from current ARM ISAs.

The A75 looks a whole lot like its predecessor, the A73. They have the same 11-stage pipeline, the same seven execution units, and the same three levels of caching. There’s only so much you can change in one year. But it’s the little tweaks that count.

Although the A73 and A75 share the same mix of execution resources, the A75 now has seven instruction queues, one for each unit, up from four on the A73. That should result in less stalling. The A75 also has an additional instruction decoder – three, instead of two – some tweaked branch-prediction logic, and it fetches four instructions per cycle (up from three). Overall, the A75 is less congested than its predecessor, even though they both run similar instructions on similar hardware. It’s not so much that the A75 is faster than the A73. It just slows down less often.

ARM says the A75 is about 20% faster on integer code, and 30% faster on FP, compared to the A73, all things being equal. That’s a nice speed bump for what is essentially a refreshed, rather than a wholly redesigned, CPU core. The A75’s clock rates should be the same as the A73’s, since the pipeline didn’t get any longer or appreciably more (or less) complex. The A75 obviously contains more logic than the A73, yet ARM says the power consumption is the same between the two. Credit more tweaking. On the other hand, if the A75 delivers 20% to 30% better performance at the same clock speed and power consumption, it should be able to finish a given task in roughly 17% to 23% less time, permitting an earlier shutdown. Thus, battery-powered applications may actually see a decrease in energy consumed. Or better performance for the same power – your choice.
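The arithmetic behind that trade-off is easy to sketch. The Python below is a back-of-the-envelope model, not ARM data: it assumes the quoted 20% and 30% figures are throughput gains, that active power is identical on both cores, and that the core draws nothing once it shuts down. The function names are invented for illustration.

```python
def relative_runtime(speedup_pct):
    """Runtime on the faster core as a fraction of the original runtime.

    A core that is N% faster does the same work in 1 / (1 + N/100) of the time.
    """
    return 1.0 / (1.0 + speedup_pct / 100.0)


def relative_energy(speedup_pct, power_ratio=1.0):
    """Energy for a fixed task, relative to the older core.

    Energy = power x time. power_ratio is new power / old power;
    1.0 models the "same power" claim. Idle and leakage power are
    ignored (an assumption of this sketch).
    """
    return power_ratio * relative_runtime(speedup_pct)


for label, pct in (("integer", 20.0), ("floating point", 30.0)):
    runtime = relative_runtime(pct)
    saved = 100.0 * (1.0 - runtime)
    energy = relative_energy(pct)
    print(f"{label}: {saved:.1f}% less time, {energy:.2f}x the energy per task")
```

Note that a 20% faster core saves about 17% of the runtime, not 20%, because the saving is measured against the original, longer runtime. In a real phone the saving would also depend on what the rest of the system does while the core is asleep.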

Because the A75 and A55 are compatible with DynamIQ, instead of (or in addition to) big.LITTLE, they can theoretically be clustered with any other DynamIQ-compatible ARM processors, not just themselves. Right now, however, that set includes exactly no processors other than the A75 and A55. All new ARM cores from here onwards will presumably be DynamIQ-aware, but in the meantime, these two are it.

DynamIQ’s flexibility comes with a cost. On the plus side, it enables heterogeneous mixes of processors – up to 256 of them, in fact. Those CPUs can run at different clock speeds and have very different processing capabilities. Once the selection of DynamIQ-aware CPUs expands beyond just these two, it should be possible to mix and match ARM cores in almost infinite varieties. Furthermore, DynamIQ-compatible CPUs like the A75 and A55 have private, rather than shared, L2 caches, which improve on-core performance a bit.

The downside is that performance across clusters may suffer by a small amount, as the L2 caches are now private. And, since DynamIQ permits mixing CPU clusters running at different speeds, there are necessarily asynchronous interfaces between those clusters. That allows breaking up the clock tree, and it permits the faster cores to run at full speed, but it also requires time-consuming resynchronization any time data travels between clusters. You don’t get something for nothing.

The A75 has acquired some high-end features that its predecessor didn’t have, ones likely borrowed from the still-in-design Ares project. It now supports ECC for its caches, more hypervisor hooks, finer-grained performance monitoring, and an interesting feature known as data poisoning. Normally, when a CPU fetches bad data (i.e., with a parity or ECC error), it throws a hardware fault and everything grinds to a stop while the system figures out what to do with the bad data. But relatively high-performance processors like the A75 frequently fetch data they don’t actually use. They might fetch instructions on the far side of a branch that won’t be executed, or they’ll fetch a long cache line but use only one byte of it. Why pull the fire alarm when the bad data isn’t causing a problem?

With data poisoning, the CPU marks the newly fetched data as bad (“poisoned”), but takes no further action until or unless that data is about to be used. Only then does it throw a fault, at which point the system can go through its usual panic phase. When implemented correctly, data poisoning can avoid unnecessary alarms.
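The deferred-fault idea can be modeled in a few lines of software. This is a toy illustration only, not how the A75 implements it in hardware: the memory model (a dict mapping an address to a data/ECC-status pair), the per-line poison flag, and every name below are assumptions made up for the sketch.

```python
class PoisonedDataError(Exception):
    """Raised only when poisoned data is actually consumed."""


class CacheLine:
    def __init__(self, data, poisoned=False):
        self.data = data          # bytes held in the line
        self.poisoned = poisoned  # True if the fetch saw a parity/ECC error


def fetch(memory, address):
    """Bring a line in from memory without faulting on bad data.

    memory is a hypothetical dict: address -> (data, ecc_ok).
    A failed check just sets the poison flag on the returned line.
    """
    data, ecc_ok = memory[address]
    return CacheLine(data, poisoned=not ecc_ok)


def consume(line, offset):
    """Use one byte of a line; this is where the deferred fault fires."""
    if line.poisoned:
        raise PoisonedDataError("attempted to use poisoned data")
    return line.data[offset]


memory = {
    0x100: (b"good", True),   # clean line
    0x200: (b"bad!", False),  # line with an ECC error
}

# Both lines are fetched, e.g. speculatively, and no fault is raised yet.
taken_path = fetch(memory, 0x100)
not_taken_path = fetch(memory, 0x200)

# Only the clean line is ever used, so the bad line never raises an alarm.
print(consume(taken_path, 0))
```

Calling `consume(not_taken_path, 0)` would raise `PoisonedDataError`, which stands in for the hardware fault the system would then have to handle.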

For chip designers on the ARM upgrade treadmill, it’s hard not to like the Cortex-A75. All of the same, but more of it. More, better, faster. For those not using ARM’s processors, it’s getting harder to avoid them. And if you’re planning to buy a new phone in 2018, it’ll be pretty much impossible.
