Do-It-Yourself Linux Machine

Synopsys ARC HS38 Processor Has An Embarrassment of Options

It’s a good month for microprocessor aficionados, what with the new Cortus twins, the MIPS I6400, AMD’s Hierofalcon, and now Synopsys’s ARC HS38. There’s still some differentiation to be had in this market.

Followers of Synopsys know that the EDA company acquired ARC, the CPU-design firm, several years ago and folded the CPU IP into its DesignWare library system. Indeed, the processor cores are branded as DesignWare, reflecting the reality that ARC processors are more like a design tool than a traditional CPU core. That’s because ARC processors are user-defined. You can add and subtract registers, create your own instructions, invent new condition codes, bolt on in-house coprocessors, and more. Every ARC processor has the capability to be unique and oh-so-finely tuned to its intended application, a feature that many developers really like. It must be working: ARC cores have appeared in 1.5 billion chips this year alone.

What ARC-based chips typically don’t have is a big backlog of third-party software, a necessary side effect of their configurability (and the minor detail that they’re not ARM, MIPS, or x86). Like most second-tier CPU architectures, ARC processors are used in deeply embedded applications where suitability for purpose, small size, and low cost are more important than a thriving app store.

What ARC does have is Linux support. In fact, Synopsys’s brand new ARC HS38 processor supports both “standard” single-core and SMP multicore implementations of Linux, something a bit new and unusual in the DIY processor arena. So just because you’ve rolled your own processor hardware doesn’t mean you have to give up on familiar operating systems.

The new HS38 represents the new high end of the ARC processor lineup, essentially replacing the previous flagship ARC 770. Where the 770 had (actually, still has) a limited MMU, smaller micro-TLBs, and a restricted physical address range, the HS38 blows out all of those limitations, giving designers control over their MMU page sizes and a 40-bit address space. The HS38 also gains L1 cache coherence and the option for L2 and/or L3 caches, if you’re so inclined.
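To put the 40-bit figure in perspective, here is a quick back-of-the-envelope sketch. The page sizes in it are hypothetical, picked only to show how page size changes the number of pages the MMU has to map; they are not a list of what the HS38 actually supports.

```python
# Illustrative arithmetic only. ADDRESS_BITS comes from the article;
# the page sizes are assumptions, not documented HS38 options.
ADDRESS_BITS = 40


def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 1 << bits


def page_count(bits: int, page_size: int) -> int:
    """How many pages of `page_size` bytes fit in a `bits`-bit address space."""
    return addressable_bytes(bits) // page_size


if __name__ == "__main__":
    total = addressable_bytes(ADDRESS_BITS)
    print(f"{ADDRESS_BITS}-bit space: {total // 2**30} GiB")
    for page_size in (4 * 1024, 16 * 1024, 2 * 1024 * 1024):  # hypothetical sizes
        pages = page_count(ADDRESS_BITS, page_size)
        print(f"{page_size // 1024:>5} KiB pages -> {pages:,} pages")
```

Forty bits works out to 1,024 GiB, or 256 times what a 32-bit address can reach.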

Ten years of progress has also benefited the HS38’s instruction set. The default ISA is now ARCv2, a modern compressed instruction set that’s an average of 18% more thrifty with memory compared to the older ARCompact ISA, according to the company. And, at up to 2.2 GHz clock speed, the HS38 is way faster.

The HS38 has a ten-stage pipeline, which is longish by embedded-CPU standards. Long pipelines are mandatory for fast clock speeds and high performance, but they exact a penalty every time program flow changes, branches are mis-predicted, or data is loaded from memory and immediately used in an operation. The longer the pipeline, the longer the freight train you have to back up and reroute down the new track.
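The freight-train cost can be roughed out with the usual textbook estimate: average cycles per instruction grows by the fraction of instructions that are branches, times the fraction of those that are mispredicted, times the cycles lost per flush. The sketch below uses made-up numbers throughout; Synopsys's actual flush penalty for the HS38 isn't given here.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, flush_penalty: int) -> float:
    """Textbook estimate of average cycles per instruction with branch flushes.

    All inputs are workload or design assumptions, not HS38 measurements.
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 10% of them mispredicted.
    for penalty in (3, 8):  # assumed flush cost of a short vs. a long pipeline
        cpi = effective_cpi(1.0, 0.20, 0.10, penalty)
        print(f"flush penalty {penalty} cycles -> CPI {cpi:.2f}")
```

Same program, same predictor, but the deeper pipeline pays more per wrong guess, which is why a longer pipeline puts more pressure on prediction accuracy.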

Branches are somewhat mitigated by new branch-prediction hardware in the first pipeline stage. The HS38 implements dynamic branch prediction, meaning it guesses on the fly based on recent activity, as opposed to static branch prediction, which relies entirely on hard-coded guidance from the programmer.
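For a feel of what "guessing based on recent activity" means, here is a minimal sketch of a classic dynamic scheme: a table of two-bit saturating counters. This is a textbook technique chosen for illustration. Synopsys hasn't said here how the HS38's predictor actually works, and the class name, table size, and trace are all invented for the example.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by low bits of the branch address.

    Counter values 0-1 predict "not taken"; values 2-3 predict "taken".
    A generic textbook scheme, not the HS38's actual predictor.
    """

    def __init__(self, table_bits: int = 4):
        self.mask = (1 << table_bits) - 1
        self.counters = [1] * (1 << table_bits)  # start weakly not-taken

    def predict(self, pc: int) -> bool:
        return self.counters[pc & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor: TwoBitPredictor, trace) -> float:
    """Fraction of branches in `trace` (pairs of pc, taken) predicted correctly."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)  # learn from the real outcome
    return correct / len(trace)


# Hypothetical trace: one loop branch, taken 9 times then not taken, 10 times over.
loop_trace = [(0x40, i % 10 != 9) for i in range(100)]

if __name__ == "__main__":
    print(f"accuracy: {accuracy(TwoBitPredictor(), loop_trace):.2f}")
```

On this toy loop the predictor gets 89 of 100 branches right with no hints from the programmer. It misses once while warming up and once at each loop exit, and the two-bit hysteresis keeps a single loop exit from flipping its next guess.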

There isn’t much Synopsys can do about changes in program flow – that’s up to the programmer – but the HS38 does handle load/use penalties cleverly. Arithmetic and logic operations are typically committed to the register file in stage 6, but they can be pushed back to stage 9 when operands are loaded just before they’re used. The late-commit stage can completely mask the load/use penalty typical of longer pipelines.
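A toy model shows why a late ALU hides the load/use penalty. It assumes a simple in-order pipeline in which a dependent instruction trails its load by a fixed number of stages. The specific stage numbers (ALU execution one stage before commit, load data forwardable once the load reaches stage 9) are assumptions made for the sketch, not published HS38 timing.

```python
# All three stage numbers are assumptions for illustration only.
EARLY_ALU_STAGE = 5     # assumed execute stage for a result committed in stage 6
LATE_ALU_STAGE = 8      # assumed execute stage for a result committed in stage 9
LOAD_FORWARD_STAGE = 9  # assumed stage the load occupies when its data can be forwarded


def load_use_stalls(distance: int, alu_stage: int,
                    load_forward_stage: int = LOAD_FORWARD_STAGE) -> int:
    """Stall cycles for an ALU op issued `distance` instructions after its load.

    With no stalls, when the consumer is in `alu_stage` the load is in
    `alu_stage + distance`. The consumer must wait until the load has
    reached `load_forward_stage`.
    """
    return max(0, load_forward_stage - (alu_stage + distance))


if __name__ == "__main__":
    for name, stage in (("early ALU", EARLY_ALU_STAGE), ("late ALU", LATE_ALU_STAGE)):
        print(f"{name}: {load_use_stalls(1, stage)} stall cycle(s) for a back-to-back load/use")
```

Under these assumptions, a back-to-back load and use stalls for three cycles at the early ALU and not at all at the late one, because by the time the consumer reaches the late stage its operand has already arrived.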

Finally, the HS38 is a bit more tolerant of slower memories, something that’s necessary when clock frequencies reach the UHF range. The CPU really has one and a half pipelines, with the second half (stages six through ten) split in two. One half handles arithmetic and logic operations, while the other is dedicated to memory accesses. The Y-shaped pipeline allows the HS38 to take its time (relatively speaking) dealing with operand routing.

New to the HS38 is the option for multiple register files, up to a maximum of eight. This is a bit like what ARM processors (and many microcontrollers) allow, and it enables fast context switching among register sets. Your operating system or scheduler will need to understand how that works, but for fast real-time response, it’s a lot quicker and cleaner than the usual push/pop, call/return stack.
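The appeal of banked registers is easy to see in a small model: switching context becomes a change of index rather than a copy of every register out to the stack and back. The class below is purely illustrative. Its name and methods are invented, and 32 registers per bank is an assumption; only the maximum of eight banks comes from the description above.

```python
class BankedRegisterFile:
    """Toy model of multiple register files with one active bank at a time."""

    MAX_BANKS = 8  # maximum stated in the article

    def __init__(self, num_banks: int = 8, regs_per_bank: int = 32):
        # regs_per_bank is an assumption for illustration.
        if not 1 <= num_banks <= self.MAX_BANKS:
            raise ValueError("num_banks must be between 1 and 8")
        self.banks = [[0] * regs_per_bank for _ in range(num_banks)]
        self.active = 0

    def read(self, reg: int) -> int:
        return self.banks[self.active][reg]

    def write(self, reg: int, value: int) -> None:
        self.banks[self.active][reg] = value

    def switch_to(self, bank: int) -> None:
        """Context switch: select another bank; no register is copied."""
        if not 0 <= bank < len(self.banks):
            raise ValueError("no such bank")
        self.active = bank


if __name__ == "__main__":
    rf = BankedRegisterFile()
    rf.write(1, 42)      # task A's value in r1
    rf.switch_to(1)      # interrupt arrives: hop to bank 1
    rf.write(1, 7)       # handler uses r1 freely
    rf.switch_to(0)      # return to task A
    print(rf.read(1))    # task A's r1 is untouched
```

The scheduler still has to decide which task owns which bank, which is the bookkeeping the operating system needs to understand.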

Since it’s technically a configuration tool and not a canned CPU core, the HS38 naturally comes with a lot of design-time options. Don’t want caches? No problem. Don’t need an MMU? That’s doable. In fact, Synopsys is offering three different versions of the new CPU, called HS34, HS36, and HS38. They’re technically the same CPU, just with different options turned on or off. You can save money by licensing the lightweight HS34 version, but you won’t be able to enable the caches, MMU, or SMP Linux support. On the other hand, you can opt for the deluxe HS38 version and later decide to downgrade it to an HS34 or ’36. It’s the same CPU either way; only the financial terms change.

In its ’38 configuration, the CPU supports single, dual, and quad-core configurations. (You can do three cores, too, if you really want.) And, of course, since it’s from Synopsys, there is a wealth of peripheral I/O you can add on. Yes, it’s a good time for processor designers.  
