
Toward a Hardware-Agnostic World

HSA Foundation Releases Specification v1.1

I think there’s something great and generic about goldfish. They’re everybody’s first pet. – Paul Rudd

It’s finally happened: processors are now completely generic and interchangeable.

Might as well go home, CPU designers. There is no differentiation left to exploit. All of your processor architectures, instruction sets, pipelines, code profiling, register files, clever ALUs, bus interfaces – all of it is now as generic and substitutable as ’80s hair-band drummers. Your entire branch of technology has been supplanted by some programmers.

Okay, so maybe it’s not quite that dire. But we’re getting there.

You have the HSA Foundation to thank for that. Their job is to make CPUs, DSPs, GPUs, VLIW machines (and pretty much anything else that can execute code) totally interchangeable. In the big SWOT analysis of hardware resources, the CPU becomes a “don’t care.” That is, HSA (which stands for Heterogeneous System Architecture) tries to make any code execute on any processor, regardless of its architecture, instruction set, or number of cores. They’ll let you run your operating system on a DSP, your graphics code on an integer CPU, and your signal-processing algorithms on a GPU. Hardware is hardware; just write your code and let HSA sort it out.

At least, that’s the promise the group has been making for the past few years. It’s what Jim Collins would’ve called “a big hairy audacious goal.” Hey, let’s treat all programming languages the same and all hardware engines the same. Programmers can write their source code in whatever language(s) they prefer, and let it run on whatever hardware they have lying around. Most of all, HSA allows you to mix different processor architectures together (that’s the “heterogeneous” part) so that you can, for example, run a multicore x86 processor alongside a cluster of ARM cores, next to a gaggle of Nvidia GPUs. Pay no attention to how those processors are interconnected, or how many there are, or even what type of chip you’ve got. It’s all good! Throw ’em all together and let the software sort ’em out!

Sound like magic? Kind of. Sound like a bad idea that’s already been done to death by a thousand different university students who think they’ve stumbled on a fantastic (and original) idea? You’d be correct there, too. The idea of a universal hardware platform is hardly new, and the road to hardware independence is paved with other people’s venture capital. Java is about the only example of hardware-independent software that made any kind of a dent in the industry – but dents can be good or bad. 

But wait a sec – isn’t Java already hardware agnostic (as in, “we don’t believe hardware exists”), and if so, why do we need another one? And for that matter, isn’t all code written in C++ or Python or BASIC or any decent language also platform-independent? Wasn’t that the whole idea of high-level languages? What problem are we actually solving here, and hasn’t it already been solved anyhow?

Well, yes and no. Java bytecode is (ahem) more or less transportable across different CPU architectures… assuming the architecture in question has its own bytecode interpreter or JIT or equivalent translator. And C code is certainly transportable… right up until it’s compiled. At that point, it’s very hardware-specific. But neither of these examples really ignores the underlying architecture of the chip you’re programming. Nobody writes C code without knowing if it’s intended for a conventional CPU, or a DSP, or a graphics processor. Same goes for any other programming language. You always want to know something about the processor it’s going to run on, even if you’re not bit-twiddling individual configuration registers.

So HSA wants to abstract away that last vestige of processor prejudice. This is particularly important and useful in today’s systems that mix and match so many different kinds of processors. How cool would it be to write your C or Python and truly not care how many processors, or of what type, were ultimately going to host it?

The core of HSA’s technology, as with so many other “universal hardware platforms,” is an intermediate virtual machine. In other words, you’re writing code for an imaginary CPU, and HSA-compliant tools then convert that to actual machine code for the hardware you really have. It’s not too different in concept from any other compiler, and pretty similar to the way Java is compiled.

This intermediate layer is called HSAIL (HSA Intermediate Language), and it’s specified just like a real CPU with a real instruction set and everything. In fact, you can download the HSAIL specification for free and build your own HSA-compliant toolchain if you like. The HSA Foundation would probably be happy to encourage you.
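To make the "imaginary CPU" idea concrete, here is a toy sketch in Python. It is emphatically not real HSAIL: the three-instruction virtual ISA, the mnemonics, and the function names are all invented for illustration. The point is only that one virtual program can be lowered to different targets while its meaning stays fixed.

```python
# Toy illustration only: this is NOT real HSAIL. The virtual ISA, the target
# mnemonics, and every function name here are made up for the example.

VIRTUAL_PROGRAM = [
    ("load", "r0", "a"),        # r0 = memory[a]
    ("load", "r1", "b"),        # r1 = memory[b]
    ("mul", "r2", "r0", "r1"),  # r2 = r0 * r1
    ("store", "out", "r2"),     # memory[out] = r2
]


def finalize_for_cpu(program):
    """Lower each virtual instruction to a hypothetical scalar-CPU mnemonic."""
    table = {"load": "MOV", "mul": "IMUL", "store": "MOV"}
    return [f"{table[op]} " + ", ".join(args) for op, *args in program]


def finalize_for_gpu(program):
    """Lower the same program to a hypothetical per-lane GPU mnemonic."""
    table = {"load": "v_load", "mul": "v_mul", "store": "v_store"}
    return [f"{table[op]} " + ", ".join(args) for op, *args in program]


def interpret(program, memory):
    """Reference semantics: what any finalized version is expected to compute."""
    regs = {}
    for op, *args in program:
        if op == "load":
            regs[args[0]] = memory[args[1]]
        elif op == "mul":
            regs[args[0]] = regs[args[1]] * regs[args[2]]
        elif op == "store":
            memory[args[0]] = regs[args[1]]
    return memory


if __name__ == "__main__":
    print(finalize_for_cpu(VIRTUAL_PROGRAM))
    print(finalize_for_gpu(VIRTUAL_PROGRAM))
    print(interpret(VIRTUAL_PROGRAM, {"a": 6, "b": 7})["out"])
```

The programmer only ever sees VIRTUAL_PROGRAM; which finalizer runs is a decision the toolchain makes later, once it knows what hardware is actually present.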

The only hardware requirement for using HSA is that all the processors in your system must share a single, cache-coherent memory space. That’s important, and it’s non-negotiable. It’s the key feature that allows HSA tools to allocate and reallocate code segments among processors. When everyone shares a memory map, a pointer is a pointer, regardless of who created it or who dereferences it. Cache coherence is also mandatory, for much the same reason. The results of one processor’s calculations have to be universally accessible to all the other processors, without careful planning or message-passing.
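A deliberately simplified model shows why the shared memory map matters. In this sketch an "address space" is just a Python list and a "pointer" is an index into it; the names producer and consumer are hypothetical, and nothing here models caches or coherence traffic.

```python
# Simplified model, not HSA code: an address space is a list, a pointer is an index.

def producer(mem):
    """One processor writes a result and hands back a pointer to it."""
    mem[3] = 99
    return 3


def consumer(mem, ptr):
    """Another processor dereferences the pointer in its own view of memory."""
    return mem[ptr]


# Case 1: both processors see the same memory map, so the pointer just works.
shared = [0] * 8
shared_result = consumer(shared, producer(shared))

# Case 2: each processor has a private map, so the same pointer finds nothing
# useful unless somebody copies the data across first.
cpu_private = [0] * 8
gpu_private = [0] * 8
split_result = consumer(gpu_private, producer(cpu_private))

if __name__ == "__main__":
    print(shared_result, split_result)
```

In the second case the pointer value is identical, but it refers to a different piece of memory. That is the situation the single shared map is meant to rule out.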

In fact, that lack of planning and messaging is one of HSA’s strengths, though it’s hardly unique. The group recently ran some benchmarks comparing HSA-compliant code with OpenCL (which also tolerates heterogeneous hardware resources). In HSA’s testing, their code did far better, of course, and often by orders of magnitude.

An FIR filter, for example, ran about 10x to 100x faster than the equivalent OpenCL code. Pretty impressive. But can a toolchain really make that much difference? Depends what you’re comparing it to. Software FIR filters are very memory-intensive, and the OpenCL implementation handles its data structures in a “pass by value” method. In other words, it copies all of the data from one processor’s memory space to another’s. That wastes a huge amount of time (and consumes a lot of memory). HSA, in contrast, does “pass by reference.” Voilà – you’ve saved a mountain of time with a different toolchain.
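The difference between the two approaches can be sketched in a few lines of Python. This is a model of the description above, not the benchmark the HSA Foundation ran and not OpenCL or HSA API code; the function names and the copy counter are assumptions added for illustration.

```python
# Illustrative model only: the names and the copy counter are hypothetical and
# do not correspond to any OpenCL or HSA API.

def fir(samples, taps):
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * samples[n - k]
        out.append(acc)
    return out


def run_by_value(samples, taps):
    """Copy the input to 'device' memory, filter it, then copy the result back."""
    staged = list(samples)            # copy in
    result = list(fir(staged, taps))  # copy out
    elements_copied = len(staged) + len(result)
    return result, elements_copied


def run_by_reference(samples, taps):
    """Filter the caller's buffer in place of any staging copy."""
    return fir(samples, taps), 0


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    taps = [0.5, 0.5]
    print(run_by_value(data, taps))
    print(run_by_reference(data, taps))
```

Both paths produce the same filtered output; the only difference is how many elements were shuffled between memories along the way. On real hardware that shuffling crosses a bus, which is where a copy-heavy implementation would lose its time.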

So who’s behind the HSA Foundation? Who stands to gain from this? Like many consortia, HSA draws its members from industry. On the CPU side, they’ve got support from AMD, ARM, and Imagination Technologies. So there’s x86, ARM, and MIPS represented, as well as Radeon, Mali, and PowerVR graphics. Toshiba, Texas Instruments, Tensilica, Analog Devices, Ceva, Synopsys (with ARC), and other second-tier CPU vendors also participate. A lot of universities are contributing manpower, and several research laboratories are represented, too. So a good cross-section of interested parties overall.

Does it really work? It seems to, at least in early testing. The group has just released version 1.1 of its specification (also available for free download), and they’re adding support for more compilers and more processors. Compared to v1.0, HSA v1.1 is now more closely compatible with gcc. It’s a long and tricky process, but the HSA Foundation seems to be making real progress toward making CPU designers obsolete. 
