How the Mighty Have Grown

Synopsys Goes Superscalar with ARC HS40 Processor Core Designs

“We hope that, when the insects take over the world, they will remember with gratitude how we took them along on all our picnics.” – Bill Vaughan

All life on Earth is insects. Statistically speaking, there are so many different species of insects on this planet that all other forms of life – mammals, birds, viruses, plants, algae, you name it – are collectively all just a rounding error.

Similarly, all microprocessors are embedded. So few CPUs go into “computers” like PCs and servers that they might as well not exist (says the guy typing on a laptop PC, whose files are stored on a server somewhere. But stick with me here.)

In the same week that processor behemoth Intel quietly pulled the sheet over Itanium, the little ARC processor from Synopsys graduated to middle school. ARC is almost all growed up, and it’s taking on big-boy responsibilities. Most significantly, it’s now superscalar. Once the province of high-end server processors only, superscalar-ness is now pretty common among embedded CPUs, but this is the first time a member of the ARC family has been handed the gold-embossed certificate.

Not just superscalar, but multicore, too! This really is a red-letter day for ARC. Synopsys is sending the new HS4x family out into the world in five different configurations, with single-, dual-, and quad-core implementations and either with or without DSP features packed into its lunchbox. As usual with ARC designs, there are lots of configurable options above and beyond those big ones. Want instruction and data caches? They can do that. Need an MMU? That’s your choice. Feel like floating-point? Check the box on your order form and pull forward to the first window.

And, if you’re looking for something that isn’t on the menu, there’s always the roll-your-own approach. The new HS4x designs retain ARC’s traditional user-configuration ability that lets you add in your own instructions, registers, accelerators, or just about anything else you want, assuming that you’re reasonably handy with a hardware compiler. In the past, ARC customers have built themselves custom crypto engines, unique compression accelerators, or weird “obfuscation units” that just confused onlookers and thwarted reverse-engineering by competitors. (Full disclosure: I used to be employed by ARC before it was acquired by Synopsys.)

Synopsys says its new HS4x cores are 25% faster on integer code than their HS3x predecessors, but as much as 200% faster on DSP code (assuming the DSP option is enabled, of course). Why the dichotomy? Did the DSP design improve that much, or was it just lousy before? And why no 2x improvement in integer code?

The HS4’s integer pipeline is essentially just a longer version of the HS3’s, stretched to 10 stages. That allows the newer cores to hit 1.6–2.2 GHz in 28nm silicon, or 1.9–2.5 GHz in a 16nm FinFET process, according to the company. The longer pipeline enables the higher clock speeds but doesn’t really add any new capabilities to the instruction set over the HS3 generation.

The exception is the dual-issue (superscalar) ability, which bumps up integer performance a little bit, but makes a huge difference to DSP algorithms. With the old single-issue pipeline, the DSP had to rely on the integer pipeline to load and store coefficients, fetch instructions, and execute its control code – an ideal recipe for a bottleneck. The newer core can dedicate half of its pipeline to feeding the DSP unit, loading and storing coefficients all day long, while simultaneously executing integer code on the other half. You’ve now got a DSP unit that’s far more functional and usable, even though the DSP engine itself hasn’t changed much.
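To put rough numbers on that intuition, here is a back-of-the-envelope sketch in Python. It is a toy model, not anything from Synopsys: the function name and the "pairable fraction" inputs are assumptions made up for illustration, and it ignores clock speed, cache misses, and everything else that matters in real silicon. It simply assumes that every pair of instructions that can issue together saves one cycle.

```python
def dual_issue_speedup(pairable_fraction: float) -> float:
    """Ideal speedup of a dual-issue pipeline over a single-issue one.

    pairable_fraction is the (hypothetical) share of instructions that
    can issue alongside a partner in the same cycle.
    """
    if not 0.0 <= pairable_fraction <= 1.0:
        raise ValueError("pairable_fraction must be between 0 and 1")
    # Single issue costs one cycle per instruction. Every instruction
    # that shares a cycle with a partner costs only half a cycle.
    cycles_per_instruction = 1.0 - pairable_fraction / 2.0
    return 1.0 / cycles_per_instruction


# Illustrative guesses only, not measured data:
# a DSP inner loop where loads/stores pair with math almost every cycle,
# versus branchy integer code where pairing is the exception.
for label, fraction in [("DSP inner loop", 1.0), ("integer control code", 0.3)]:
    print(f"{label}: {dual_issue_speedup(fraction):.2f}x")
```

The point of the sketch is the shape of the curve, not the specific values: a loop that can keep both halves of the pipeline busy approaches 2x, while code that rarely pairs barely moves.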

Because the HS4x is instruction-set compatible with the HS3x family, you could just move your binaries over to the new CPU, and they’d run. But you probably don’t want to. Unlike, say, an Intel Core i7, ARC processors don’t have hardware assists for aggressively cracking open the execution stream looking for parallelism. That’s the job of the compiler, so, to get the best out of your older ARC code, you’ll want to recompile for the newer pipeline.
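Why does recompiling matter? A simple in-order, dual-issue pipeline can pair two neighboring instructions only if the second doesn't depend on the first. The Python sketch below is a deliberately simplified illustration of that idea, not a model of the real HS4x pipeline or the ARC compiler: the `Instr` type, the pairing rule, and the four-instruction "program" are all assumptions invented for the example, and load latency is ignored.

```python
from typing import NamedTuple, Sequence, Tuple


class Instr(NamedTuple):
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers or addresses read


def can_pair(first: Instr, second: Instr) -> bool:
    # Hypothetical pairing rule: the second instruction may not read
    # or overwrite the result the first one is still producing.
    return first.dest not in second.srcs and first.dest != second.dest


def count_cycles(program: Sequence[Instr]) -> int:
    """Cycles for an in-order machine that issues up to two instructions."""
    cycles = 0
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            i += 2  # both neighbors issue together
        else:
            i += 1  # dependency (or end of program): issue alone
        cycles += 1
    return cycles


# Two independent load-then-add chains, as an older compiler might emit them.
original = [
    Instr("a", ("x",)),  # a = load x
    Instr("b", ("a",)),  # b = a + 1  (depends on a)
    Instr("c", ("y",)),  # c = load y
    Instr("d", ("c",)),  # d = c + 1  (depends on c)
]

# Same work, reordered so that independent instructions sit side by side.
rescheduled = [original[0], original[2], original[1], original[3]]

print("original order:", count_cycles(original), "cycles")
print("rescheduled:   ", count_cycles(rescheduled), "cycles")
```

In this toy case the original ordering takes three cycles and the rescheduled one takes two, versus four on a single-issue machine. The old binary still runs correctly; it just leaves issue slots empty that a scheduler aware of the new pipeline could fill.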

Shoppers browsing the aisles looking for a CPU core to license will generally stop at the big ARM display first, before moving on to more budget-friendly options like ARC. Performance, power, and price comparisons are inevitable. Synopsys says its new HS4x delivers twice the performance of ARM’s Cortex-A7, 45% better performance than Cortex-A9, and “higher” performance than Cortex-A17, all while consuming less power than the brand-name British alternatives. It’s not at all clear what configuration options the ARC processors had enabled (nor the ARM processors, for that matter), or what benchmarks the company was using, but at least the slideware numbers give some indication of where the new HS4x cores fit in the overall scheme of things.

In the processor world, there are embedded processors and there are deeply embedded processors – the kind you never see. ARC falls squarely into the latter category. Whereas ARM and MIPS power highly visible consumer items, run “real” operating systems, and have a healthy library of third-party software, ARC (and about a hundred other CPU architectures) toil away in obscurity. That’s not to say they aren’t popular – they’re just concealed. One of ARC’s biggest design wins is in solid-state disks (SSDs), especially the high-end SSDs used in servers and the like. It’s a high-volume design win that’s been good for business but that doesn’t generate many sexy headlines. Synopsys expects the new HS4x family will extend that success, while also gaining sockets in wireless interfaces, automotive systems, and speech-activated interfaces.

Newcomers to the electronics business sometimes comment on how the little black chips look like bugs, with their rows of shiny metal legs. They may be more accurate than they realize.
