How the Mighty Have Grown

Synopsys Goes Superscalar with ARC HS40 Processor Core Designs

“We hope that, when the insects take over the world, they will remember with gratitude how we took them along on all our picnics.” – Bill Vaughan

All life on Earth is insects. Statistically speaking, there are so many different species of insects on this planet that all other forms of life – mammals, birds, viruses, plants, algae, you name it – are collectively all just a rounding error.

Similarly, all microprocessors are embedded. So few CPUs go into “computers” like PCs and servers that they might as well not exist. (Says the guy typing on a laptop PC whose files are stored on a server somewhere. But stick with me here.)

In the same week that processor behemoth Intel quietly pulled the sheet over Itanium, the little ARC processor from Synopsys graduated to middle school. ARC is almost all growed up, and it’s taking on big-boy responsibilities. Most significantly, it’s now superscalar. Once the province of high-end server processors only, superscalar-ness is now pretty common among embedded CPUs, but this is the first time a member of the ARC family has been handed the gold-embossed certificate.

Not just superscalar, but multicore, too! This really is a red-letter day for ARC. Synopsys is sending the new HS4x family out into the world in five different configurations, with single-, dual-, and quad-core implementations and either with or without DSP features packed into its lunchbox. As usual with ARC designs, there are lots of configurable options above and beyond those big ones. Want instruction and data caches? They can do that. Need an MMU? That’s your choice. Feel like floating-point? Check the box on your order form and pull forward to the first window.

And, if you’re looking for something that isn’t on the menu, there’s always the roll-your-own approach. The new HS4x designs retain ARC’s traditional user-configuration ability that lets you add in your own instructions, registers, accelerators, or just about anything else you want, assuming that you’re reasonably handy with a hardware compiler. In the past, ARC customers have built themselves custom crypto engines, unique compression accelerators, or weird “obfuscation units” that just confused onlookers and thwarted reverse-engineering by competitors. (Full disclosure: I used to be employed by ARC before it was acquired by Synopsys.)

Synopsys says its new HS4x cores are 25% faster on integer code than their HS3x predecessors, but as much as 200% faster on DSP code (assuming the DSP option is enabled, of course). Why the dichotomy? Did the DSP design improve that much, or was it just lousy before? And why no 2x improvement in integer code?

The HS4’s integer pipeline is essentially just a longer version of the HS3’s, stretched to 10 stages. That allows the newer cores to hit 1.6–2.2 GHz in 28nm silicon, or 1.9–2.5 GHz in a 16nm FinFET process, according to the company. The longer pipeline enables the higher clock speeds but doesn’t really add any new capabilities to the instruction set over the HS3 generation.

The exception is the dual-issue (superscalar) ability, which bumps up integer performance a little bit, but makes a huge difference to DSP algorithms. With the old single-issue pipeline, the DSP had to rely on the integer pipeline to load and store coefficients, fetch instructions, and execute its control code – an ideal recipe for a bottleneck. The newer core can dedicate half of its pipeline to feeding the DSP unit, loading and storing coefficients all day long, while simultaneously executing integer code on the other half. You’ve now got a DSP unit that’s far more functional and usable, even though the DSP engine itself hasn’t changed much.
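To see why dual issue can help a DSP loop far more than ordinary integer code, here is a rough sketch in Python. It is a toy model, not the HS4x's actual issue rules: the `cycles` function, the `"mem"`/`"alu"` labels, and both instruction mixes are assumptions made up for illustration. The model lets two adjacent instructions share a cycle only when one is a memory operation and the other is a compute operation, and it ignores data dependencies, cache misses, and branches entirely.

```python
def cycles(stream, dual_issue):
    """Count issue cycles for an in-order stream of 'mem' and 'alu' ops.

    Toy rule (an assumption, not the real HS4x pairing logic): in dual-issue
    mode, two adjacent ops issue together only if they are of different kinds.
    """
    count = 0
    i = 0
    while i < len(stream):
        if dual_issue and i + 1 < len(stream) and stream[i] != stream[i + 1]:
            i += 2  # one memory op and one compute op share this cycle
        else:
            i += 1  # nothing suitable to pair with, so issue alone
        count += 1
    return count


# Hypothetical DSP inner loop: load a coefficient, multiply-accumulate, repeat.
dsp_loop = ["mem", "alu"] * 64

# Hypothetical integer code: mostly ALU work with an occasional load or store.
integer_code = ["alu", "alu", "alu", "mem"] * 32

for name, stream in (("dsp", dsp_loop), ("integer", integer_code)):
    single = cycles(stream, dual_issue=False)
    dual = cycles(stream, dual_issue=True)
    print(f"{name}: {single} -> {dual} cycles, speedup {single / dual:.2f}x")
```

In this sketch the load-heavy DSP loop pairs up on every cycle, while the integer code finds a partner only once every four instructions. The specific ratios come from the invented instruction mixes and should not be read as a reproduction of Synopsys's figures.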

Because the HS4x is instruction-set compatible with the HS3x family, you could just move your binaries over to the new CPU, and they’d run. But you probably don’t want to. Unlike, say, an Intel Core i7, ARC processors don’t have hardware assists for aggressively cracking open the execution stream looking for parallelism. That’s the job of the compiler, so, to get the best out of your older ARC code, you’ll want to recompile for the newer pipeline.
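A small Python sketch can illustrate why instruction order matters on a machine that doesn't reorder code in hardware. Everything here is hypothetical: the `Instr` type, the `issue_cycles` function, the register names, and the pairing rule (two adjacent instructions issue together only if the second neither reads nor overwrites the first one's destination) are simplifying assumptions, not a description of the ARC pipeline or of what the ARC compiler actually does.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str     # register written
    srcs: tuple   # registers read


def issue_cycles(program):
    """Cycles on a toy in-order, dual-issue machine with no hardware reordering."""
    cycles = 0
    i = 0
    while i < len(program):
        first = program[i]
        independent = False
        if i + 1 < len(program):
            second = program[i + 1]
            # Pair only if the second op does not depend on the first.
            independent = (first.dest not in second.srcs
                           and first.dest != second.dest)
        i += 2 if independent else 1
        cycles += 1
    return cycles


# Two independent dependency chains: x -> a -> b and y -> c -> d.
# "Old" ordering: each chain emitted back to back, as a single-issue
# compiler might reasonably do.
old_order = [
    Instr("a", ("x",)), Instr("b", ("a",)),
    Instr("c", ("y",)), Instr("d", ("c",)),
]

# Same four instructions, interleaved so each adjacent pair is independent.
new_order = [
    Instr("a", ("x",)), Instr("c", ("y",)),
    Instr("b", ("a",)), Instr("d", ("c",)),
]

print(issue_cycles(old_order), issue_cycles(new_order))
```

Both orderings compute the same results, but in this model only the interleaved version fills both issue slots on every cycle. That kind of reshuffling is what a recompile for the newer pipeline would be expected to provide.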

Shoppers browsing the aisles looking for a CPU core to license will generally stop at the big ARM display first, before moving on to more budget-friendly options like ARC. Performance, power, and price comparisons are inevitable. Synopsys says its new HS4x delivers twice the performance of ARM’s Cortex-A7, 45% better performance than Cortex-A9, and “higher” performance than Cortex-A17, all while delivering lower power consumption than the brand-name British alternatives. It’s not at all clear what configuration options the ARC processors had enabled (nor the ARM processors, for that matter), or what benchmarks the company was using, but at least the slideware numbers give some indication of where the new HS4x cores fit in the overall scheme of things.

In the processor world, there are embedded processors and there are deeply embedded processors – the kind you never see. ARC falls squarely into the latter category. Whereas ARM and MIPS power highly visible consumer items, run “real” operating systems, and have a healthy library of third-party software, ARC (and about a hundred other CPU architectures) toil away in obscurity. That’s not to say they aren’t popular – they’re just concealed. One of ARC’s biggest design wins is in solid-state disks (SSDs), especially the high-end SSDs used in servers and the like. It’s a high-volume design win that’s been good for business but that doesn’t generate many sexy headlines. Synopsys expects the new HS4x family will extend that success, while also gaining sockets in wireless interfaces, automotive systems, and speech-activated interfaces.

Newcomers to the electronics business sometimes comment on how the little black chips look like bugs, with their rows of shiny metal legs. They may be more accurate than they realize.
