feature article
Subscribe Now

Toward a Hardware-Agnostic World

HSA Foundation Releases Specification v1.1

I think there’s something great and generic about goldfish. They’re everybody’s first pet. – Paul Rudd

It’s finally happened: processors are now completely generic and interchangeable.

Might as well go home, CPU designers. There is no differentiation left to exploit. All of your processor architectures, instruction sets, pipelines, code profiling, register files, clever ALUs, bus interfaces – all of it is now as generic and substitutable as 80’s hair band drummers. Your entire branch of technology has been supplanted by some programmers.

Okay, so maybe it’s not quite that dire. But we’re getting there.

You have the HSA Foundation to thank for that. Their job is to make CPUs, DSPs, GPUs, VLIW machines (and pretty much anything else that can execute code) totally interchangeable. In the big SWOT analysis of hardware resources, the CPU becomes a “don’t care.” That is, HSA (which stands for Heterogeneous Systems Architecture) tries to make any code execute on any processor, regardless of its architecture, instruction set, or number of cores. They’ll let you run your operating system on a DSP, your graphics code on an integer CPU, and your signal-processing algorithms on a GPU. Hardware is hardware; just write your code and let HSA sort it out.

At least, that’s the promise the group has been making for the past few years. It’s what Steve Jobs would’ve called, “a big hairy audacious goal.” Hey, let’s treat all programming languages the same and all hardware engines the same. Programmers can write their source code in whatever language(s) they prefer, and let it run on whatever hardware they have lying around. Most of all, HSA allows you to mix different processor architectures together (that’s the “heterogeneous” part) so that you can, for example, run a multicore x86 processor alongside a cluster of ARM cores, next to a gaggle of nVidia GPUs. Pay no attention to how those processors are interconnected, or how many there are, or even what type of chip you’ve got. It’s all good! Throw ’em all together and let the software sort ’em out!

Sound like magic? Kind of. Sound like a bad idea that’s already been done to death by a thousand different university students who think they’ve stumbled on a fantastic (and original) idea? You’d be correct there, too. The idea of a universal hardware platform is hardly new, and the road to hardware independence is paved with other people’s venture capital. Java is about the only example of hardware-independent software that made any kind of a dent in the industry – but dents can be good or bad. 

But wait a sec – isn’t Java already hardware agnostic (as in, “we don’t believe hardware exists”), and if so, why do we need another one? And for that matter, isn’t all code written in C++ or Python or BASIC or any decent language also platform-independent? Wasn’t that the whole idea of high-level languages? What problem are we actually solving here, and hasn’t it already been solved anyhow?

Well, yes and no. Java bytecode is (ahem) more or less transportable across different CPU architectures… assuming the architecture in question has its own bytecode interpreter or JIT or equivalent translator. And C code is certainly transportable… right up until it’s compiled. At that point, it’s very hardware-specific. But neither of these examples really ignores the underlying architecture of the chip you’re programming. Nobody writes C code without knowing if it’s intended for a conventional CPU, or a DSP, or a graphics processor. Same goes for any other programming language. You always want to know something about the processor it’s going to run on, even if you’re not bit-twiddling individual configuration registers.

So HSA wants to abstract-away that last vestige of processor prejudice. This is particularly important and useful in today’s systems that mix and match so many different kinds of processors. How cool would it be to write your C or Python and truly not care how many processors, or of what type, were ultimately going to host it?

The core of HSA’s technology, as with so many other “universal hardware platforms,” is an intermediate virtual machine. In other words, you’re writing code for an imaginary CPU, and HSA-compliant tools then convert that to actual machine code for the hardware you really have. It’s not too different in concept from any other compiler, and pretty similar to the way Java is compiled.

This intermediate layer is called HSAIL (HSA Intermediate Language), and it’s specified just like a real CPU with a real instruction set and everything. In fact, you can download the HSAIL specification for free and build your own HSA-compliant toolchain if you like. The HSA Foundation would probably be happy to encourage you.

The only hardware requirement to using HSA is that all the processors in your system must share a single, cache-coherent memory space. That’s important, and it’s non-negotiable. It’s the key feature that allows HSA tools to allocate and reallocate code segments among processors. When everyone shares a memory map, a pointer is a pointer, regardless of who created it or who dereferences it. Cache coherence is also mandatory, for much the same reason. The results of one processor’s calculations have to be universally accessible to all the other processors, without careful planning or message-passing.

In fact, that lack of planning and messaging is one of HSA’s strengths, though it’s hardly unique. The group recently ran some benchmarks comparing HSA-compliant code with OpenCL (which also tolerates heterogeneous hardware resources). In HSA’s testing, their code did far better, of course, and often by orders of magnitude.

An FIR filter, for example, ran about 10x to 100x faster than the equivalent OpenCL code. Pretty impressive. But can a toolchain really make that much difference? Depends what you’re comparing it to. Software FIR filters are very memory-intensive, and the OpenCL implementation handles its data structures in a “pass by value” method. In other words, it copies all of the data from one processor’s memory space to another’s. That wastes a huge amount of time (and consumes a lot of memory). HSA, in contrast, does “pass by reference.” Voila – you’ve saved a mountain of time with a different toolchain.

So who’s behind the HSA Foundation? Who stands to gain from this? Like many consortia, HSA draws its members from industry. On the CPU side, they’ve got support from AMD, ARM, and Imagination Technologies. So there’s x86, ARM, and MIPS represented, as well as Radeon, Mali, and PowerVR graphics. Toshiba, Texas instruments, Tensilica, Analog Devices, Ceva, Synopsys (with ARC), and other second-tier CPU vendors also participate. A lot of universities are contributing manpower, and several research laboratories are represented, too. So a good cross-section of interested parties overall.

Does it really work? It seems to, at least in early testing. The group has just released version 1.1 of its specification (also available for free download), and they’re adding support for more compilers and more processors. Compared to v1.0, HSA v1.1 is now more closely compatible with gcc. It’s a long and tricky process, but the HSA Foundation seems to be making real progress toward making CPU designers obsolete. 

Leave a Reply

featured blogs
Sep 23, 2022
When I rejoined Cadence in 2015, we had not yet announced Palladium Z1. But it was basically done, and we announced it a couple of months later. I wrote about the announcement in my post Palladium Z1, an Enterprise Server Farm in a Rack . Next, we created Protium X1 which I c...
Sep 22, 2022
On Monday 26 September 2022, Earth and Jupiter will be only 365 million miles apart, which is around half of their worst-case separation....
Sep 22, 2022
Learn how to design safe and stylish interior and exterior automotive lighting systems with a look at important lighting categories and lighting design tools. The post How to Design Safe, Appealing, Functional Automotive Lighting Systems appeared first on From Silicon To Sof...

featured video

PCIe Gen5 x16 Running on the Achronix VectorPath Accelerator Card

Sponsored by Achronix

In this demo, Achronix engineers show the VectorPath Accelerator Card successfully linking up to a PCIe Gen5 x16 host and write data to and read data from GDDR6 memory. The VectorPath accelerator card featuring the Speedster7t FPGA is one of the first FPGAs that can natively support this interface within its PCIe subsystem. Speedster7t FPGAs offer a revolutionary new architecture that Achronix developed to address the highest performance data acceleration challenges.

Click here for more information about the VectorPath Accelerator Card

featured paper

Algorithm Verification with FPGAs and ASICs

Sponsored by MathWorks

Developing new FPGA and ASIC designs involves implementing new algorithms, which presents challenges for verification for algorithm developers, hardware designers, and verification engineers. This eBook explores different aspects of hardware design verification and how you can use MATLAB and Simulink to reduce development effort and improve the quality of end products.

Click here to read more

featured chalk talk

Prognostic Health Management/Predictive Maintenance Solutions for Machines

Sponsored by Mouser Electronics and Advantech

Prognostic health management, also known as predictive maintenance, is a vital component of our industrial ecosystem. In this episode of Chalk Talk, Amelia Dalton chats with Eric Wang from Advantech about the roles that data acquisition, data processing and artificial intelligence play in prognostic health management, the challenges of building these types of systems, and what kind of predictive maintenance solution would be the best fit for your next design.

More information about Advantech WISE-750 Intelligent Vibration Gateway