The Custom CPU Paradox

“I think there’s something great and generic about goldfish. They’re everybody’s first pet.” – Paul Rudd

RISC-V, like Finn, is kind of a big deal. It’s free, it’s flexible, and it’s fast. It’s not particularly remarkable as microprocessors go, but neither is it offensive or saddled with terrible flaws. It’s the beige Toyota Camry of processors.

That makes RISC-V the safe and easy choice for a lot of developers. But would you consider RISC-V a generic RISC processor or a custom anything-you-want processor? Is it plain vanilla or application-specific? A floor wax or a dessert topping?

It’s both! It’s a paradox. It starts out as a mild-mannered RISC implementation, but when it jumps into the phone booth it becomes a super-processor! It can be as bland or as exciting as you want it to be, a trick that other CPU vendors have also tried to pull off.

Customizable processors can seem like either the dumbest idea in the world or the best thing since canned beer. On one hand, processor ecosystems thrive on compatibility. You can’t develop compilers, debuggers, operating systems, applications, middleware, or even productive programmers if every CPU is different. Without compatibility, we might as well create programs by soldering wires and swapping out hardware. A CPU’s instruction set – a fixed instruction set – is what makes software, software. That compatibility feeds the virtuous cycle of more software, more tools, and more CPU sales.

On the other hand, generic CPUs are boring. Workloads change over time, and not every programmer needs the same set of features. Lots of us don’t need floating-point arithmetic. Some need bit-manipulation instructions. Others need Linux support, or vector operations, or that weird table lookup and interpolate thing.

Oddball instructions can make a big difference – a really big difference – to how your processor performs. Gamers argue over the relative merits of one Intel CPU versus another AMD processor, but those are nearly identical chips with differences of only a few percent. Compare that to how a low-end DSP handily outperforms a high-end ARM, PowerPC, or x86 design and the differences are striking. Architecture and instruction set really do matter.

Trouble is, when you veer from the generic path you lose compatibility and you lose software support. Instructions that aren’t part of the RISC canon get ignored by the compiler, just wasted and superfluous hardware. Only assembly-language programmers or those willing to write compiler intrinsics get to leverage that extra muscle. Unless you’re tweaking benchmarks for a living, what’s the point?

RISC-V takes the middle ground and defines a base instruction set that all processors share, plus a set of optional add-on modules. You want floating-point? We’ve got that all designed, defined, created, and available. Everyone doing FP on RISC-V is doing it the same way, so the compiler writers are happy and your code is more or less portable.

In addition to those semi-standard options, you can branch out and create your own radically custom instructions. This is, after all, an open-source CPU specification, so nobody can stop you. You can even try to sell your creations back to the community, if you want.

Customizable instruction sets have been done before, and they mostly work. ARC and Tensilica, from Synopsys and Cadence, respectively, allow user-level customization. They work in the sense that users really do see large improvements in performance, or reductions in power, when they craft useful new instructions for their particular application. It’s not always a straightforward process, but it’s effective. The downside is that customizations have limited software support, and they’re not compatible with anyone else’s chips.

Of course, that latter characteristic might actually be an advantage. Want to obfuscate your code and prevent reverse engineering? Simply add a few oddball instructions to your processor and use them liberally throughout your software. It doesn’t even matter much what the custom instructions do. The point is that nobody else knows, either.

The real point, of course, is to find hotspots in your code and create custom instructions to accelerate them. Maybe you do a lot of memory walks with a particular stride; a custom load/store pair might help. Repetitive bit twiddling might be condensed into a single special-purpose operation. And so on.

There’s little point in removing instructions, even though it’s possible. Every CPU needs a baseline of instructions to operate. Okay, sure, you can get by with an extremely minimal set of instructions, but that’s mostly of academic interest. The core set used by RISC-V and other designs over the past 20 years includes basic addition, subtraction (sometimes just negative addition), logical operations, and conditional flow control. All of these are useful and none of them is complex to implement in hardware. Meaning, they’re never a performance bottleneck, so there’s no upside to removing them.

At the other extreme, you’ve got processors like Intel’s insanely complicated Ice Lake microarchitecture (officially Core 10th generation), with hundreds of instructions, many of which are little used. That baroque instruction set weighs down the chip, not only in terms of silicon real estate but in all the complex logic required to make it work. Complex CPUs have complex interconnects, large buses, and multiple clock domains – a far cry from RISC-V, even with its optional enhancements installed.

The MIPS architecture has allowed user customization for years, but ARM steadfastly resisted the trend. Others fall somewhere in the middle, with most permitting only very limited tweaking. In that sense, RISC-V is among the more open-minded processors.

As RISC-V proliferates, it’s going to get harder to pin down what the CPU is and isn’t doing. Everyone is free to customize, and many RISC-V designers already have. Apart from the core instruction set and one or two popular extensions, we’re going to see a big gap open up between the stock CPU and the custom hot rods. They’ll all be RISC-V at some level under the hood, but very different in outward appearance. Start your engines!

The Custom CPU Paradox

Related

Leave a Reply Cancel reply

featured video

How NV5, NVIDIA, and Cadence Collaboration Optimizes Data Center Efficiency, Performance, and Reliability

featured chalk talk