“We hope that, when the insects take over the world, they will remember with gratitude how we took them along on all our picnics.” – Bill Vaughan
All life on Earth is insects. Statistically speaking, there are so many different species of insects on this planet that all other forms of life – mammals, birds, viruses, plants, algae, you name it – are collectively just a rounding error.
Similarly, all microprocessors are embedded. So few CPUs go into “computers” like PCs and servers that they might as well not exist. (Says the guy typing on a laptop PC whose files are stored on a server somewhere. But stick with me here.)
In the same week that processor behemoth Intel quietly pulled the sheet over Itanium, the little ARC processor from Synopsys graduated to middle school. ARC is almost all growed up, and it’s taking on big-boy responsibilities. Most significantly, it’s now superscalar. Once the province of high-end server processors only, superscalar-ness is now pretty common among embedded CPUs, but this is the first time a member of the ARC family has been handed the gold-embossed certificate.
Not just superscalar, but multicore, too! This really is a red-letter day for ARC. Synopsys is sending the new HS4x family out into the world in five different configurations, with single-, dual-, and quad-core implementations and either with or without DSP features packed into its lunchbox. As usual with ARC designs, there are lots of configurable options above and beyond those big ones. Want instruction and data caches? They can do that. Need an MMU? That’s your choice. Feel like floating-point? Check the box on your order form and pull forward to the first window.
And, if you’re looking for something that isn’t on the menu, there’s always the roll-your-own approach. The new HS4x designs retain ARC’s traditional user-configuration ability that lets you add in your own instructions, registers, accelerators, or just about anything else you want, assuming that you’re reasonably handy with a hardware compiler. In the past, ARC customers have built themselves custom crypto engines, unique compression accelerators, or weird “obfuscation units” that just confused onlookers and thwarted reverse-engineering by competitors. (Full disclosure: I used to be employed by ARC before it was acquired by Synopsys.)
Synopsys says its new HS4x cores are 25% faster on integer code than their HS3x predecessors, but as much as 200% faster on DSP code (assuming the DSP option is enabled, of course). Why the dichotomy? Did the DSP design improve that much, or was it just lousy before? And why no 2x improvement in integer code?
The HS4x’s integer pipeline is essentially just a longer version of the HS3x’s, stretched to 10 stages. That allows the newer cores to hit 1.6–2.2 GHz in 28nm silicon, or 1.9–2.5 GHz in a 16nm FinFET process, according to the company. The longer pipeline enables the higher clock speeds but doesn’t really add any new capabilities to the instruction set over the HS3x generation.
The exception is the dual-issue (superscalar) ability, which bumps up integer performance a little bit, but makes a huge difference to DSP algorithms. With the old single-issue pipeline, the DSP had to rely on the integer pipeline to load and store coefficients, fetch instructions, and execute its control code – an ideal recipe for a bottleneck. The newer core can dedicate half of its pipeline to feeding the DSP unit, loading and storing coefficients all day long, while simultaneously executing integer code on the other half. You’ve now got a DSP unit that’s far more functional and usable, even though the DSP engine itself hasn’t changed much.
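Why would a second issue slot help DSP code so much more than integer code? A back-of-the-envelope cycle count makes the point. The sketch below is a deliberately idealized model, not Synopsys’s pipeline: it assumes one slot is reserved for loads and stores (mirroring the “half of its pipeline” description above) while the other runs everything else, and it ignores stalls, dependencies, and cache misses. The instruction counts are hypothetical.

```python
def single_issue_cycles(mem_ops: int, other_ops: int) -> int:
    # One instruction per cycle: every load/store and every DSP or
    # control instruction needs its own issue slot.
    return mem_ops + other_ops


def dual_issue_cycles(mem_ops: int, other_ops: int) -> int:
    # Idealized two-slot model: one slot streams loads/stores while the
    # other runs DSP and control instructions. Ignores stalls,
    # dependencies, and cache misses, so this is a best case.
    return max(mem_ops, other_ops)


def speedup(mem_ops: int, other_ops: int) -> float:
    return single_issue_cycles(mem_ops, other_ops) / dual_issue_cycles(mem_ops, other_ops)


# Hypothetical FIR-style DSP loop: per tap, two loads (coefficient and
# sample) and one multiply-accumulate, plus one loop-control instruction
# for every four taps. These counts are made up for illustration.
taps = 64
dsp_mem, dsp_other = 2 * taps, taps + taps // 4   # 128 memory ops, 80 others

# Hypothetical integer-heavy code: mostly ALU and branch work, few memory ops.
int_mem, int_other = 20, 100

print(f"DSP loop:     {speedup(dsp_mem, dsp_other):.2f}x")
print(f"Integer code: {speedup(int_mem, int_other):.2f}x")
```

The memory-hungry DSP loop gains far more than the integer code, because its loads and stores no longer steal cycles from the arithmetic. The model tops out at 2x when memory and compute work are perfectly balanced; real pairing rules are messier than this.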
Because the HS4x is instruction-set compatible with the HS3x family, you could just move your binaries over to the new CPU, and they’d run. But you probably don’t want to. Unlike, say, an Intel Core i7, ARC processors don’t have hardware assists for aggressively cracking open the execution stream looking for parallelism. That’s the job of the compiler, so, to get the best out of your older ARC code, you’ll want to recompile for the newer pipeline.
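Here’s a toy illustration of why the compiler matters on an in-order, dual-issue machine. It is not ARC’s actual pairing logic; the register names and the pairing rule (the second instruction can’t read or overwrite the first one’s result) are simplifying assumptions, and every result is assumed ready one cycle later. The same six instructions run faster simply because they’re emitted in a different order.

```python
from typing import List, Tuple

# (destination register, source registers) for a hypothetical
# single-cycle ALU instruction.
Instr = Tuple[str, Tuple[str, ...]]


def can_pair(first: Instr, second: Instr) -> bool:
    # The second instruction may issue alongside the first only if it
    # doesn't read the first one's result (RAW) or write the same
    # register (WAW).
    dest1, _ = first
    dest2, srcs2 = second
    return dest1 not in srcs2 and dest1 != dest2


def in_order_cycles(program: List[Instr]) -> int:
    # Greedy in-order dual issue: no hardware reordering, so only
    # adjacent instructions can share a cycle.
    cycles, i = 0, 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            i += 2
        else:
            i += 1
        cycles += 1
    return cycles


# Two independent dependency chains emitted back to back, which a
# compiler targeting a single-issue pipeline has no reason to avoid.
old_order: List[Instr] = [
    ("r1", ("r0",)), ("r2", ("r1",)), ("r3", ("r2",)),
    ("r5", ("r4",)), ("r6", ("r5",)), ("r7", ("r6",)),
]

# The same six instructions, interleaved so that neighbors are independent.
new_order: List[Instr] = [
    ("r1", ("r0",)), ("r5", ("r4",)),
    ("r2", ("r1",)), ("r6", ("r5",)),
    ("r3", ("r2",)), ("r7", ("r6",)),
]

print(in_order_cycles(old_order), in_order_cycles(new_order))  # prints 5 3
```

Both orderings compute exactly the same results, which is why old binaries still run on the new core. Only the recompiled, interleaved version lets the second issue slot earn its keep.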
Shoppers browsing the aisles looking for a CPU core to license will generally stop at the big ARM display first, before moving on to more budget-friendly options like ARC. Performance, power, and price comparisons are inevitable. Synopsys says its new HS4x delivers twice the performance of ARM’s Cortex-A7, 45% better performance than Cortex-A9, and “higher” performance than Cortex-A17, all while delivering lower power consumption than the brand-name British alternatives. It’s not at all clear what configuration options the ARC processors had enabled (nor the ARM processors, for that matter), or what benchmarks the company was using, but at least the slideware numbers give some indication of where the new HS4x cores fit in the overall scheme of things.
In the processor world, there are embedded processors and there are deeply embedded processors – the kind you never see. ARC falls squarely into the latter category. Whereas ARM and MIPS power highly visible consumer items, run “real” operating systems, and have a healthy library of third-party software, ARC (like about a hundred other CPU architectures) toils away in obscurity. That’s not to say they aren’t popular – they’re just concealed. One of ARC’s biggest design wins is in solid-state disks (SSDs), especially the high-end SSDs used in servers and the like. It’s a high-volume design win that’s been good for business but that doesn’t generate many sexy headlines. Synopsys expects the new HS4x family will extend that success, while also gaining sockets in wireless interfaces, automotive systems, and speech-activated interfaces.
Newcomers to the electronics business sometimes comment on how the little black chips look like bugs, with their rows of shiny metal legs. They may be more accurate than they realize.