
Do-It-Yourself Linux Machine

Synopsys ARC HS38 Processor Has An Embarrassment of Options

It’s a good month for microprocessor aficionados, what with the new Cortus twins, the MIPS I6400, AMD’s Hierofalcon, and now Synopsys’s ARC HS38. There’s still some differentiation to be had in this market.

Followers of Synopsys know that the EDA company acquired ARC, the CPU-design firm, several years ago and folded the CPU IP into its DesignWare library system. Indeed, the processor cores are branded as DesignWare, reflecting the reality that ARC processors are more like a design tool than a traditional CPU core. That’s because ARC processors are user-configurable: you can add and subtract registers, create your own instructions, invent new condition codes, bolt on in-house coprocessors, and more. Every ARC processor can be unique and oh-so-finely tuned to its intended application, a feature that many developers really like. It must be working: ARC cores have appeared in 1.5 billion chips this year alone.

What ARC-based chips typically don’t have is a big catalog of third-party software, an inevitable side effect of their configurability (and the minor detail that they’re not ARM, MIPS, or x86). Like most second-tier CPU architectures, ARC processors are used in deeply embedded applications where suitability for purpose, small size, and low cost are more important than a thriving app store.

What ARC does have is Linux support. In fact, Synopsys’s brand-new ARC HS38 processor supports both “standard” single-core and SMP multicore implementations of Linux, something fairly unusual in the DIY processor arena. So just because you’ve rolled your own processor hardware doesn’t mean you have to give up on familiar operating systems.

The HS38 represents the new high end of the ARC processor lineup, essentially replacing the previous flagship ARC 770. Where the 770 had (actually, still has) a limited MMU, smaller micro-TLBs, and a restricted physical address range, the HS38 blows out all of those limitations, giving designers control over their MMU page sizes and a 40-bit physical address space. The HS38 also gains L1 cache coherence and the option for L2 and/or L3 caches, if you’re so inclined.
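To put the 40-bit figure in perspective: 2^40 bytes is 1 TiB of physical memory, and the page size a designer picks determines how each address splits into a page-frame number and an offset. Here’s a minimal Python sketch of that arithmetic. The page sizes it tries are assumptions chosen for illustration, not a list of the sizes Synopsys actually offers.

```python
# Sketch: how a configurable page size carves up a 40-bit physical address.
# The page sizes used below are illustrative assumptions, not HS38 options.

PHYS_ADDR_BITS = 40  # physical address width cited for the HS38


def split_phys_addr(addr: int, page_size: int) -> tuple[int, int]:
    """Return (frame_number, offset) for a physical address."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page size must be a power of two")
    if not 0 <= addr < (1 << PHYS_ADDR_BITS):
        raise ValueError("address does not fit in 40 bits")
    offset_bits = page_size.bit_length() - 1  # log2(page_size)
    return addr >> offset_bits, addr & (page_size - 1)


def frame_count(page_size: int) -> int:
    """Number of page frames that fit in the 40-bit physical space."""
    return (1 << PHYS_ADDR_BITS) // page_size


if __name__ == "__main__":
    print(f"Physical space: {(1 << PHYS_ADDR_BITS) // 2**30} GiB")  # 1024 GiB = 1 TiB
    for size in (4 * 1024, 16 * 1024, 2 * 1024 * 1024):
        print(f"{size // 1024:>5} KiB pages -> {frame_count(size):,} frames")
```

Larger pages mean fewer frames to track, which is one reason configurable page sizes matter: the same number of TLB entries can then cover more memory.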

Ten years of progress has also benefited the HS38’s instruction set. The default ISA is now ARCv2, a modern compressed instruction set that, according to the company, yields code about 18% smaller on average than the older ARCompact ISA. And, at clock speeds up to 2.2 GHz, the HS38 is way faster.


The HS38 has a ten-stage pipeline, which is longish by embedded-CPU standards. Long pipelines are necessary for fast clock speeds and high performance, but they exact a penalty every time program flow changes, a branch is mispredicted, or data is loaded from memory and immediately used in an operation. The longer the pipeline, the longer the freight train you have to back up and reroute onto the new track.
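A back-of-the-envelope model shows why the freight train matters. Each mispredicted branch costs roughly as many cycles as it takes to refill the pipeline, so a deeper pipeline raises the average cycles per instruction (CPI) even as it raises the clock rate. In this Python sketch the branch mix, misprediction rate, flush penalties, and the 1.2 GHz comparison clock are all hypothetical; only the 2.2 GHz figure comes from Synopsys, and none of the results are published HS38 numbers.

```python
# Hypothetical model of branch-misprediction cost. All parameters are
# illustrative assumptions, not measured or published HS38 figures.


def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, flush_penalty: int) -> float:
    """Average cycles per instruction, including pipeline refills after mispredictions."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


def ns_per_instruction(cpi: float, clock_ghz: float) -> float:
    """Average time per instruction in nanoseconds."""
    return cpi / clock_ghz


if __name__ == "__main__":
    # Assume 20% of instructions are branches and 10% of those are mispredicted.
    shallow = effective_cpi(1.0, 0.20, 0.10, flush_penalty=3)  # short pipeline
    deep = effective_cpi(1.0, 0.20, 0.10, flush_penalty=8)     # long pipeline
    print(f"shallow: CPI {shallow:.2f}, {ns_per_instruction(shallow, 1.2):.3f} ns/instr at 1.2 GHz")
    print(f"deep:    CPI {deep:.2f}, {ns_per_instruction(deep, 2.2):.3f} ns/instr at 2.2 GHz")
```

Under these made-up numbers the deep pipeline is about 9% worse on CPI (1.16 versus 1.06) but still wins on time per instruction because its clock is so much faster. A worse misprediction rate shifts that balance, which is why branch prediction gets so much attention in long-pipeline designs.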

Branch penalties are partly mitigated by new branch-prediction hardware in the first pipeline stage. The HS38 implements dynamic branch prediction, meaning it guesses on the fly based on recent activity, as opposed to static branch prediction, which relies on fixed rules or on hints encoded ahead of time by the compiler or programmer.
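For anyone who hasn’t met dynamic prediction up close, the classic textbook scheme is a table of 2-bit saturating counters indexed by branch address, sketched below in Python. This illustrates dynamic prediction in general, not the HS38’s own predictor, which may work quite differently; the table size, the address indexing (which assumes 16-bit instruction alignment), and the branch address are all assumptions.

```python
# Textbook bimodal predictor: a table of 2-bit saturating counters indexed by
# branch address. Not the HS38's actual design; sizes and indexing are assumed.


class BimodalPredictor:
    STRONG_NOT_TAKEN, WEAK_NOT_TAKEN, WEAK_TAKEN, STRONG_TAKEN = range(4)

    def __init__(self, table_bits: int = 10):
        self.mask = (1 << table_bits) - 1
        # Every counter starts at "weakly not taken".
        self.table = [self.WEAK_NOT_TAKEN] * (1 << table_bits)

    def _index(self, pc: int) -> int:
        # Drop the low address bit (assumed 16-bit alignment), keep table_bits bits.
        return (pc >> 1) & self.mask

    def predict(self, pc: int) -> bool:
        """True means "predict taken"."""
        return self.table[self._index(pc)] >= self.WEAK_TAKEN

    def update(self, pc: int, taken: bool) -> None:
        """Nudge the counter toward the actual outcome, saturating at 0 and 3."""
        i = self._index(pc)
        if taken:
            self.table[i] = min(self.table[i] + 1, self.STRONG_TAKEN)
        else:
            self.table[i] = max(self.table[i] - 1, self.STRONG_NOT_TAKEN)


def count_mispredictions(predictor: BimodalPredictor, trace) -> int:
    """Replay (pc, taken) pairs, predicting before each update."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    # A loop branch at a made-up address: taken 9 times, then falls through.
    loop = [(0x1000, True)] * 9 + [(0x1000, False)]
    print(count_mispredictions(BimodalPredictor(), loop * 3))  # 4
```

Running the loop three times yields 4 mispredictions: two on the first pass while the counter warms up, then one per loop exit. The 2-bit hysteresis is the point of the design. A single not-taken exit only drops the counter to “weakly taken,” so the next trip through the loop starts out predicted correctly.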

There isn’t much Synopsys can do about changes in program flow – that’s up to the programmer – but the HS38 does handle load/use penalties cleverly. Arithmetic and logic operations are typically committed to the register file in stage 6, but they can be pushed back to stage 9 when operands are loaded just before they’re used. The late-commit stage can completely mask the load/use penalty typical of longer pipelines.
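The late-commit trick can be modeled with a simplified in-order pipeline in which instruction i reaches stage s at cycle i + s. Only the “stage 6 normally, stage 9 when late” idea comes from Synopsys. The stage at which loaded data becomes available (stage 8 here) is an assumption chosen for illustration, so the stall counts this Python sketch produces are illustrative too.

```python
# Simplified in-order pipeline model of the load/use hazard. Instruction i
# reaches stage s at cycle i + s when nothing stalls. Only the stage-6 and
# stage-9 commit points come from the article; LOAD_DATA_STAGE is assumed.

LOAD_DATA_STAGE = 8   # hypothetical: stage at whose end loaded data can be forwarded
NORMAL_ALU_STAGE = 6  # ALU result normally committed here
LATE_ALU_STAGE = 9    # ALU op deferred here when its operand was just loaded


def load_use_stalls(distance: int, alu_stage: int,
                    load_stage: int = LOAD_DATA_STAGE) -> int:
    """Bubbles needed when an ALU op `distance` instructions after a load uses its result."""
    ready_cycle = load_stage + 1        # first cycle the loaded value can be consumed
    need_cycle = distance + alu_stage   # cycle the consumer reaches its ALU stage
    return max(0, ready_cycle - need_cycle)


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(f"distance {d}: normal={load_use_stalls(d, NORMAL_ALU_STAGE)} "
              f"late={load_use_stalls(d, LATE_ALU_STAGE)}")
```

With these assumptions, an ALU operation that immediately follows a load stalls for two cycles if it must finish at stage 6, but not at all if it is allowed to slip to stage 9, which is the effect the late-commit stage is meant to achieve.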

Finally, the HS38 is a bit more tolerant of slower memories, something that’s necessary when clock frequencies reach the UHF range. The CPU effectively has one and a half pipelines: the back end (stages six through ten) splits in two, with one branch handling arithmetic and logic operations and the other dedicated to memory accesses. The Y-shaped pipeline allows the HS38 to take its time (relatively speaking) dealing with memory accesses and operand routing.

New to the HS38 is the option for multiple register files, up to a maximum of eight. This is a bit like the banked registers that ARM processors (and many microcontrollers) provide, and it enables fast context switching among register sets. Your operating system or scheduler will need to understand how that works, but for fast real-time response, switching register banks is a lot quicker and cleaner than the usual routine of pushing and popping registers on the stack.
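A toy model makes the context-switch savings concrete. With banked register files, switching tasks amounts to selecting a different bank; with a single register file, every register must be stored to memory and the next task’s values loaded back. The 32-register context and the use of memory operations as the cost metric are assumptions for illustration, and the sketch ignores what happens when there are more tasks than banks, which is exactly the bookkeeping an OS or scheduler would have to handle.

```python
# Toy model: context-switch cost with banked register files versus saving and
# restoring one register file through memory. NUM_REGS and the cost metric
# (memory operations) are illustrative assumptions.

NUM_REGS = 32   # hypothetical number of registers in one task context
MAX_BANKS = 8   # the HS38 option allows up to eight register files


class BankedRegisterFile:
    def __init__(self, num_banks: int = MAX_BANKS, num_regs: int = NUM_REGS):
        if not 1 <= num_banks <= MAX_BANKS:
            raise ValueError("bank count must be 1 to 8")
        self.banks = [[0] * num_regs for _ in range(num_banks)]
        self.active = 0

    def switch_to(self, bank: int) -> int:
        """Make another bank active; returns memory operations used (none)."""
        if not 0 <= bank < len(self.banks):
            raise IndexError("no such register bank")
        self.active = bank
        return 0

    @property
    def regs(self) -> list[int]:
        """Registers of the currently active bank."""
        return self.banks[self.active]


def stack_switch(regs: list[int], save_area: list[int], incoming: list[int]) -> int:
    """Single register file: store outgoing values, load incoming ones; returns memory ops."""
    save_area[:] = regs   # one store per register
    regs[:] = incoming    # one load per register
    return len(save_area) + len(incoming)


if __name__ == "__main__":
    rf = BankedRegisterFile()
    rf.regs[0] = 42              # task A works in bank 0
    cost = rf.switch_to(1)       # task B gets bank 1; bank 0 is left intact
    print(cost, rf.banks[0][0])  # 0 42
    regs, saved, incoming = [1] * NUM_REGS, [0] * NUM_REGS, [2] * NUM_REGS
    print(stack_switch(regs, saved, incoming))  # 64
```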

Since it’s technically a configuration tool and not a canned CPU core, the HS38 naturally comes with a lot of design-time options. Don’t want caches? No problem. Don’t need an MMU? That’s doable. In fact, Synopsys is offering three different versions of the new CPU, called HS34, HS36, and HS38. They’re really the same CPU, just with different options turned on or off. You can save money by licensing the lightweight HS34 version, but you won’t be able to enable the caches, MMU, or SMP Linux support. On the other hand, you can opt for the deluxe HS38 version and later decide to downgrade it to an HS34 or ’36. It’s the same CPU either way; only the financial terms change.

In its ’38 form, the CPU supports single-, dual-, and quad-core configurations. (You can do three cores, too, if you really want.) And, of course, since it’s from Synopsys, there is a wealth of peripheral I/O you can add on. Yes, it’s a good time for processor designers.
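Because the three versions are one CPU with different switches flipped, the licensing rules can be expressed as a simple configuration check. This Python sketch encodes the rules described for HS34 and HS38: HS34 can’t enable caches, the MMU, or SMP Linux; HS38 can enable all three and supports one to four cores; and up to eight register files are allowed. The HS36 row (caches but no MMU), the field names, and treating multicore as HS38-only are assumptions, and none of this is Synopsys’s actual configuration tool.

```python
from dataclasses import dataclass

# Features each license tier may enable. The HS34 and HS38 rows follow the
# article; the HS36 row (caches but no MMU) is an assumption.
TIER_FEATURES = {
    "HS34": set(),
    "HS36": {"caches"},
    "HS38": {"caches", "mmu", "smp_linux"},
}


@dataclass
class CoreConfig:
    tier: str
    caches: bool = False
    mmu: bool = False
    smp_linux: bool = False
    num_cores: int = 1
    register_files: int = 1


def validate(cfg: CoreConfig) -> list[str]:
    """Return a list of problems; an empty list means the configuration is allowed."""
    if cfg.tier not in TIER_FEATURES:
        return [f"unknown tier {cfg.tier!r}"]
    allowed = TIER_FEATURES[cfg.tier]
    problems = [f"{name} requires a higher tier than {cfg.tier}"
                for name in ("caches", "mmu", "smp_linux")
                if getattr(cfg, name) and name not in allowed]
    if not 1 <= cfg.num_cores <= 4:
        problems.append("core count must be 1 to 4")
    elif cfg.num_cores > 1 and cfg.tier != "HS38":
        problems.append("multicore is treated as an HS38 option in this sketch")
    if not 1 <= cfg.register_files <= 8:
        problems.append("register files must be 1 to 8")
    return problems


if __name__ == "__main__":
    print(validate(CoreConfig("HS38", caches=True, mmu=True, smp_linux=True, num_cores=4)))  # []
    print(validate(CoreConfig("HS34", caches=True)))  # ['caches requires a higher tier than HS34']
```

Downgrading a licensed HS38 design to an HS34 then amounts to turning features off until the check passes for the cheaper tier.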
