feature article
Subscribe Now

Do-It-Yourself Linux Machine

Synopsys ARC HS38 Processor Has An Embarrassment of Options

It’s a good month for microprocessor aficionados, what with the new Cortus twins, the MIPS I6400, AMD’s Hierofalcon, and now Synopsys’s ARC HS38. There’s still some differentiation to be had in this market.

Followers of Synopsys know that the EDA company acquired ARC, the CPU-design firm, several years ago and folded the CPU IP into its DesignWare library system. Indeed, the processor cores are branded as DesignWare, reflecting the reality that ARC processors are more like a design tool than a traditional CPU core. That’s because ARC processors are user-defined. You can add and subtract registers, create your own instructions, invent new condition codes, bolt on in-house coprocessors, and more. Every ARC processor has the capability to be unique and oh-so-finely tuned to its intended application, a feature that many developers really like. It must be working: ARC cores have appeared in 1.5 billion chips just in this year alone.

What ARC-based chips typically don’t have is a big backlog of third-party software, a necessary side effect of their configurability (and the minor detail that they’re not ARM, MIPS, or x86). Like most second-tier CPU architectures, ARC processors are used in deeply embedded applications where suitability for purpose, small size, and low cost are more important than a thriving app store.

What ARC does have is Linux support. In fact, Synopsys’s brand new ARC HS38 processor supports both “standard” single-core and SMP multicore implementations of Linux, something a bit new and unusual in the DIY processor arena. So just because you’ve rolled your own processor hardware doesn’t mean you have to give up on familiar operating systems.

The new HS38 represents the new high end of the ARC processor lineup, essentially replacing the previous flagship ARC 770. Where the 770 had (actually, still has) a limited MMU, smaller micro-TLBs, and a restricted physical address range, the HS38 blows out all of those limitations, giving designers control over their MMU page sizes and a 40-bit address space. The HS38 also gains L1 cache coherence and the option for L2 and/or L3 caches, if you’re so inclined.

Ten years of progress has also benefitted the HS38’s instruction set. The default ISA is now ARC v2, a modern compressed instruction set that’s an average of 18% more thrifty with memory compared to the older ARCcompact ISA, according to the company. And, at up to 2.2 GHz clock speed, the HS38 is way faster.

ARC_HS38_Graphic_FINAL.jpg

The HS38 has a ten-stage pipeline, which is longish by embedded-CPU standards. Long pipelines are mandatory for fast clock speeds and high performance, but they exact a penalty every time program flow changes, branches are mis-predicted, or data is loaded from memory and immediately used in an operation. The longer the pipeline, the longer the freight train you have to back up and reroute down the new track.

Branches are somewhat mitigated by a new branch-prediction hardware in the first pipeline stage. The HS38 implements dynamic branch prediction, meaning it guesses on the fly based on recent activity, as opposed to static branch prediction, which relies entirely on hard-coded guidance from the programmer.

There isn’t much Synopsys can do about changes in program flow – that’s up to the programmer – but the HS38 does handle load/use penalties cleverly. Arithmetic and logic operations are typically committed to the register file in stage 6, but they can be pushed back to stage 9 when operands are loaded just before they’re used. The late-commit stage can completely mask the load/use penalty typical of longer pipelines.

Finally, the HS38 is a bit more tolerant of slower memories, something that’s necessary when clock frequencies reach the UHF range. The CPU really has one and a half pipelines, with the second half (stages six through ten) split in two. One half handles arithmetic and logic operations, while the other is dedicated to memory accesses. The Y-shaped pipeline allows the HS38 to take its time (relatively speaking) dealing with operand routing.

New to the HS38 is the option for multiple register files, up to a maximum of eight. This is a bit like what ARM processors (and many microcontrollers) allow, and it enables fast context switching among register sets. Your operating system or scheduler will need to understand how that works, but for fast real-time response, it’s a lot quicker and cleaner than the usual push/pop, call/return stack.

Since it’s technically a configuration tool and not a canned CPU core, the HS38 naturally comes with a lot of design-time options. Don’t want caches? No problem. Don’t need an MMU? That’s doable. In fact, Synopsys is offering three different versions of the new CPU, called HS34, HS36, and HS38. They’re technically the same CPU, just with different options turned on or off. You can save money by licensing the lightweight HS34 version, but you won’t be able to enable the caches, MMU, or SMP Linux support. On the other hand, you can opt for the deluxe HS38 version and later decide to downgrade it to an HS34 or ’36. It’s the same CPU either way; only the financial terms change.

In its ’38 configuration, the CPU supports single, dual, and quad-core configurations. (You can do three cores, too, if you really want.) And, of course, since it’s from Synopsys, there is a wealth of peripheral I/O you can add on. Yes, it’s a good time for processor designers.  

11 thoughts on “Do-It-Yourself Linux Machine”

  1. Pingback: tes cpns 2017
  2. Pingback: Petplay
  3. Pingback: car crash Germany
  4. Pingback: Judi Bola Menarik
  5. Pingback: DMPK Studies
  6. Pingback: coehuman.uodiyala

Leave a Reply

featured blogs
Apr 26, 2024
Biological-inspired developments result in LEDs that are 55% brighter, but 55% brighter than what?...

featured video

Why Wiwynn Energy-Optimized Data Center IT Solutions Use Cadence Optimality Explorer

Sponsored by Cadence Design Systems

In the AI era, as the signal-data rate increases, the signal integrity challenges in server designs also increase. Wiwynn provides hyperscale data centers with innovative cloud IT infrastructure, bringing the best total cost of ownership (TCO), energy, and energy-itemized IT solutions from the cloud to the edge.

Learn more about how Wiwynn is developing a new methodology for PCB designs with Cadence’s Optimality Intelligent System Explorer and Clarity 3D Solver.

featured paper

Altera® FPGAs and SoCs with FPGA AI Suite and OpenVINO™ Toolkit Drive Embedded/Edge AI/Machine Learning Applications

Sponsored by Intel

Describes the emerging use cases of FPGA-based AI inference in edge and custom AI applications, and software and hardware solutions for edge FPGA AI.

Click here to read more

featured chalk talk

GaN FETs: D-Mode Vs E-mode
Sponsored by Mouser Electronics and Nexperia
The use of gallium nitride can offer higher power efficiency, increased power density and can reduce the overall size and weight of many industrial, automotive, and data center applications. In this episode of Chalk Talk, Amelia Dalton and Giuliano Cassataro from Nexperia investigate the benefits of Gan FETs, the difference between D-Mode and E-mode GaN FET technology and how you can utilize GaN FETs in your next design.
Mar 25, 2024
5,284 views