VISC Processor Secrets Revealed

Soft Machines Uses Combination of Tricks to Improve Performance

Still trying to juggle those flaming chainsaws? Splendid, because now we’re going to see how it’s done.

Last week we introduced Soft Machines and its VISC processor, a new CPU design that runs native ARM code even though it’s not an ARM processor. Soft Machines says VISC can also be tailored to run x86 code, Java code, or just about anything else the company decides is worthwhile. It’s a tabula rasa microprocessor: able to run just about anything you throw at it.

Its other major trick is that it can extract more single-thread performance out of a given binary program than any other CPU. And do so without expending a horrendous number of transistors or consuming planetary levels of energy. Let’s start with that part. 

VISC is a multicore processor, which means it’s got two (or more) identical CPU cores running side by side. No surprises there. In “normal” multicore processors, like we see from Intel, AMD, ARM, MIPS, and others, the two cores operate independently, with one core running one instruction stream, or thread, and the other core(s) running a different thread. If the particular program you’re running can’t be split into threads, one core sits momentarily idle. But the one-to-one correlation between threads and cores is hardwired into most multicore processors.

VISC works a bit differently. Although there are two identical cores (in the initial implementation; future versions will have four cores), there's no one-to-one correlation between threads and cores. One thread might run on both cores, "borrowing" resources, such as an adder or a multiplier, from its neighboring core. This allows a complex thread to briefly spread itself across all the resources the chip can offer, in order to execute in the minimum amount of time.

The leftover resources from the second core needn’t go to waste, either. If one thread uses, say, one-and-a-half cores (as in the above example), the remaining “half core” can execute another thread. This permits a much more fine-grained use of resources than other multicore processors offer, which means more of the chip’s energy goes into useful work and less into sitting idle.
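
Soft Machines hasn't published how its hardware actually carves up execution resources, but the general idea is easy to sketch in the abstract. The toy snippet below (names, port counts, and the greedy policy are all hypothetical) models two cores as one shared pool of execution "ports": a demanding thread can claim more than one core's worth, and a lighter thread picks up whatever is left over.

    # Hypothetical sketch of VISC-style fine-grained resource sharing.
    # Two cores are modeled as one pool of execution "ports"; threads
    # claim as many ports as their parallelism allows, rather than being
    # pinned one-to-one to a core. Numbers and policy are illustrative.

    PORTS_PER_CORE = 4          # assumed width of each core
    pool = PORTS_PER_CORE * 2   # two cores pooled together

    def allocate(threads, pool_size):
        """Greedily hand out ports to threads in priority order."""
        allocation = {}
        free = pool_size
        for name, demand in threads:
            granted = min(demand, free)
            allocation[name] = granted
            free -= granted
        return allocation, free

    # A heavy thread wants 1.5 cores' worth of ports; a light thread takes the rest.
    threads = [("heavy_thread", 6), ("light_thread", 3)]
    alloc, left = allocate(threads, pool)
    print(alloc)   # {'heavy_thread': 6, 'light_thread': 2}
    print(left)    # 0 ports left idle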

Part of the smarts to enable this lies in the CPU core, and part appears earlier in the pipeline, in a feature that Soft Machines simply calls the Global Front End. The GFE fetches code from VISC’s unified instruction cache and starts the process of looking for parallelism and dependencies. It does this largely by a process of elimination, looking for instructions that are interdependent because they use the same registers, depend upon each other’s output, or reference the same pointers, for example. The idea here is to encapsulate data references within a single core. What you don’t want is code in Core A using the same registers as the thread in Core B.
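
The company doesn't spell out the Global Front End's algorithm, but grouping by shared registers can be illustrated with a toy example. The sketch below (the instruction format, the core count, and the clustering policy are all invented for illustration) lumps together instructions that read or write the same registers, so each dependency chain can be handed to a single core.

    # Toy sketch of dependency grouping: instructions that touch the same
    # registers land in the same group, so one core sees the whole chain.
    # Instruction format and grouping policy are illustrative, not VISC's.

    instrs = [
        ("add",  "r1", ["r2", "r3"]),   # (op, dest, sources)
        ("mul",  "r4", ["r1", "r5"]),   # depends on r1 -> same group as add
        ("load", "r6", ["r7"]),         # independent chain
        ("sub",  "r8", ["r6", "r9"]),   # depends on r6
    ]

    groups = []          # each group is (set_of_registers, list_of_instructions)
    for op, dest, srcs in instrs:
        regs = {dest, *srcs}
        for group_regs, group_instrs in groups:
            if regs & group_regs:               # shares a register: same dependency chain
                group_regs |= regs
                group_instrs.append((op, dest, srcs))
                break
        else:
            groups.append((set(regs), [(op, dest, srcs)]))

    for i, (_, g) in enumerate(groups):
        print(f"core {i}: {[op for op, _, _ in g]}")
    # core 0: ['add', 'mul']
    # core 1: ['load', 'sub']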

Microprocessor aficionados will be familiar with the concepts of register renaming and out-of-order execution. VISC does both. Although the chip has a set of software-visible registers (namely, the ARM register set in the initial implementation), it really has a completely different internal register set. Each core gets its own registers, and the Global Front End handles the register renaming as each thread makes its way in/out of the core. 
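
Register renaming itself is standard fare, and a minimal version is easy to sketch. The code below (the table structure and register counts are invented; VISC's actual rename hardware isn't public) maps architectural ARM register names onto a private pool of physical registers, allocating a fresh physical register for every new result so that two threads can both "use r1" without ever touching the same storage.

    # Minimal register-renaming sketch: architectural names (r1, r2, ...)
    # map to a per-core pool of physical registers. Purely illustrative.

    class RenameTable:
        def __init__(self, num_physical=64):
            self.free = list(range(num_physical))   # free physical registers
            self.map = {}                           # architectural -> physical

        def rename(self, op, dest, sources):
            srcs = [self.map[s] for s in sources]   # read current mappings
            phys = self.free.pop(0)                 # fresh register for the result
            self.map[dest] = phys
            return (op, f"p{phys}", [f"p{s}" for s in srcs])

    core0 = RenameTable()
    core0.map = {"r2": core0.free.pop(0), "r3": core0.free.pop(0)}   # assume r2/r3 already live
    print(core0.rename("add", "r1", ["r2", "r3"]))  # ('add', 'p2', ['p0', 'p1'])
    print(core0.rename("mul", "r1", ["r1", "r2"]))  # ('mul', 'p3', ['p2', 'p0'])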

Once a batch of interdependent instructions is dispatched to one core or the other, the core itself works on reordering them. This low-level reordering is unique among current processors. Normally, that work is all done up front, and the cores simply do what they’re told. VISC has distributed that intelligence, allowing the Global Front End to make the first-level decisions about dependencies and threads, while delegating the reordering of operations to the cores themselves.
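
What that in-core reordering looks like in silicon is anyone's guess, but the flavor of it can be sketched: each cycle, the core issues whichever dispatched operations already have their inputs available, so independent work slips ahead of stalled work. The loop below is a generic out-of-order issue illustration, not VISC's scheduling hardware.

    # Toy out-of-order issue: each cycle the core issues every waiting
    # instruction whose inputs are ready; results become visible the
    # following cycle. Independent work jumps ahead of dependent work.

    batch = [
        ("load", "r1", ["r9"]),   # long-latency result
        ("add",  "r2", ["r1"]),   # dependent on the load
        ("mul",  "r3", ["r4"]),   # independent of both
    ]
    ready = {"r9", "r4"}          # register values available right now
    pending = list(batch)
    cycle = 0

    while pending:
        cycle += 1
        issued = [i for i in pending if all(s in ready for s in i[2])]
        for op, dest, srcs in issued:
            print(f"cycle {cycle}: issue {op}")
            pending.remove((op, dest, srcs))
        ready |= {dest for _, dest, _ in issued}   # results visible next cycle

    # cycle 1: issue load
    # cycle 1: issue mul
    # cycle 2: issue add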

Because instruction reordering implies speculative execution, VISC must hold back the results of any instruction until all of the previous instructions have resolved. This is particularly important following a mispredicted branch, where all of the speculatively executed operations must be abandoned while execution resumes at the correct branch target. In this case, VISC isn't much different from other speculative processors (think x86). There's just no way around branches.
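
A reorder buffer is the usual way to square speculative execution with in-order results, and a toy version shows the idea: results are committed strictly in order, and a mispredicted branch throws away everything younger than itself. This is a generic sketch, not a description of VISC's actual buffering.

    # Toy reorder buffer: instructions may finish out of order, but their
    # results are committed (made architecturally visible) in order, and
    # a mispredicted branch squashes every younger entry. Generic sketch.

    rob = [
        {"op": "add",  "done": True,  "squash": False},
        {"op": "beq",  "done": True,  "squash": True},   # branch mispredicted
        {"op": "mul",  "done": True,  "squash": False},  # speculative, must be discarded
        {"op": "load", "done": False, "squash": False},
    ]

    committed = []
    for entry in rob:
        if not entry["done"]:
            break                          # can't commit past an unfinished instruction
        committed.append(entry["op"])
        if entry["squash"]:                # the branch itself commits...
            print("flushed:", [e["op"] for e in rob[rob.index(entry) + 1:]])
            break                          # ...everything younger is thrown away

    print("committed:", committed)
    # flushed: ['mul', 'load']
    # committed: ['add', 'beq']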

So where does the ARM emulation come in? That also happens in the Global Front End, where it adds a couple of stages to VISC’s 11-stage pipeline. All translation is done automatically; there’s no special compiler, preprocessor, or emulator code required. This is unlike what Apple and other computer companies did in the 1990s when they translated binary code on the fly, and more akin to Transmeta, Intel, or AMD. The x86 instruction set is notoriously baroque and intricate, so breaking it down into more-digestible micro-operations made sense for the x86 vendors. But ARM’s instruction set is pretty straightforward, as these things go. Whatever VISC’s internal instruction set looks like (the company isn’t saying), it’s probably not a tough job to convert from one to the other.
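
Since Soft Machines hasn't disclosed its internal instruction format, the sketch below is pure invention: it only shows the general shape of cracking a couple of ARM instructions into simpler internal operations in the front end, the way the article describes. The micro-op names are made up.

    # Invented example of front-end translation: two ARM instructions
    # cracked into a made-up internal operation format. VISC's real
    # internal ISA is not public; this only illustrates the shape.

    def translate(arm_instr):
        """Map one ARM instruction string to a list of internal micro-ops."""
        mnemonic, *operands = arm_instr.replace(",", "").split()
        if mnemonic == "add":                         # ADD rd, rn, rm
            rd, rn, rm = operands
            return [("iop_add", rd, rn, rm)]
        if mnemonic == "ldr":                         # LDR rd, [rn, #imm]
            rd = operands[0]
            rn = operands[1].strip("[]")
            imm = operands[2].strip("[]#")
            return [("iop_agen", "tmp0", rn, imm),    # address generation first
                    ("iop_load", rd, "tmp0")]         # then the memory access
        raise NotImplementedError(mnemonic)

    print(translate("add r1, r2, r3"))    # [('iop_add', 'r1', 'r2', 'r3')]
    print(translate("ldr r4, [r5, #8]"))  # [('iop_agen', 'tmp0', 'r5', '8'), ('iop_load', 'r4', 'tmp0')]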

The hard part is fending off the lawyers. Companies like PicoTurbo, as well as enterprising university students, have successfully cloned ARM processors before, but they were subsequently litigated out of existence. Soft Machines plans to skirt that legal minefield by licensing VISC only to companies that already have an ARM license. Transmeta, and even AMD, have used a similar approach by sticking to foundries covered under Intel's patent agreements, so there is some legal precedent for the strategy. But that raises the question: why would I license VISC if I've already licensed ARM?

Probably for the performance. In dozens of benchmark tests, VISC handily outperformed an ARM Cortex-A15 in every category, often by 2x or 3x or more. That's double or triple the performance of ARM's best 32-bit processor, running unmodified ARM binaries. No mean feat, that.

Granted, the benchmarks were run by Soft Machines, but they include such reliable old standbys as SPECint, SPECfp, EEMBC, and other fairly tamper-proof tests of processing prowess. The absolute worst that VISC did was 1.1x the performance of an A15 (i.e., 10% faster). In the best case, it was nearly 7x faster. The miserable Dhrystone benchmark came in at over 4.5x the speed of ARM, and the unweighted average of all the benchmarks hovers around 3x ARM's average performance. Not bad; not bad at all.

Now for the asterisks. All of these scores are measured in units of performance per MHz, not absolute performance. VISC might be faster than Cortex-A15 in terms of instructions per clock (IPC), but if you’re running your current chip above about 500 MHz, it’s probably faster than VISC.

VISC may be considerably faster than ARM, or even x86, in terms of “architectural efficiency,” but it has a shorter pipeline than most leading-edge processors, and short pipelines hobble clock rates. Soft Machines isn’t saying how fast VISC will run in real life, but 500 MHz is a good guess, assuming a leading-edge process. That compares to 2.5 GHz for ARM’s Cortex-A15 and as much as 4 GHz for Intel’s 22nm Core i7 (Haswell) chips. If absolute performance is what you crave, VISC probably isn’t going to get you there.
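
Putting the article's round numbers together shows why. Absolute throughput scales with per-clock performance times clock frequency, so (assuming roughly 3x the A15's per-clock performance and the guessed 500 MHz clock, against an A15 at 2.5 GHz) the back-of-the-envelope arithmetic looks like this:

    # Rough throughput comparison using the article's figures; absolute
    # performance scales with per-clock performance x clock frequency.

    visc_ipc_advantage = 3.0     # ~3x Cortex-A15's average per-clock performance
    visc_clock_mhz     = 500     # guessed VISC clock rate from the article
    a15_clock_mhz      = 2500    # Cortex-A15 at 2.5 GHz

    visc_throughput = visc_ipc_advantage * visc_clock_mhz   # 1500 "A15-MHz equivalents"
    a15_throughput  = 1.0 * a15_clock_mhz                   # 2500

    print(a15_throughput / visc_throughput)   # ~1.67: the A15 still wins on absolute speed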

ARM and Intel aren’t stupid; they know that short pipelines are simpler, use less silicon, and consume less power, but that longer pipelines enable faster clock frequencies. For today’s high-end embedded applications, the longer, faster pipeline is the right trade-off, and it’s why Cortex-A15 licenses are flying off the figurative shelves. Soft Machines chose simplicity (relatively speaking) over absolute performance, at least for now.

But that shouldn’t diminish Soft Machines’ accomplishments, and they are many. Even allowing for a modest “fudge factor” on the benchmarks, VISC soundly trounces its presumed archrival, ARM’s Cortex-A15. Double or triple the A15’s performance at the same clock rate would make any chip designer sit up and take notice.

And that’s not even counting VISC’s expected power savings. Using SPECint as a yardstick, Soft Machines says a two-core VISC can deliver the same performance as A15 while using just one-third the power. Conversely, a four-core VISC can crank out roughly double the performance at the same power level. Regardless of where the actual numbers fall, the point is clear: VISC is on a shallower performance-per-watt curve than even the vaunted ARM family. It looks like someone has beaten ARM at its own game, at least on paper.
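
Put those claims side by side, with the A15 normalized to 1.0 for both performance and power, and the rough picture (these are Soft Machines' own SPECint figures, not independent measurements) looks like this:

    # Rough restatement of Soft Machines' claims, with the A15's
    # performance and power both normalized to 1.0. Company figures only.

    a15        = {"perf": 1.0, "power": 1.0}
    visc_2core = {"perf": 1.0, "power": 1.0 / 3}   # same performance, ~1/3 the power
    visc_4core = {"perf": 2.0, "power": 1.0}       # ~2x performance at the same power

    for name, chip in [("A15", a15), ("VISC 2-core", visc_2core), ("VISC 4-core", visc_4core)]:
        print(f"{name}: {chip['perf'] / chip['power']:.1f}x performance per watt")
    # A15: 1.0x, VISC 2-core: 3.0x, VISC 4-core: 2.0x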

So why would Soft Machines paint a target on itself by competing directly with ARM with an ARM-compatible processor? To paraphrase Willie Sutton, that’s where the customers are.

Inventing a new microprocessor is hard enough, but the really hard part is developing software for it. As many large chipmakers discovered to their detriment (Intel, AMD, Freescale, Texas Instruments, IBM, DEC, et al.), it takes a minor miracle for a new processor to develop enough market momentum to become viable. Without a critical mass of software, including compilers, operating systems, middleware, applications, and much more, a processor is just an interesting circuit-design experiment. So by adopting (usurping, even) ARM’s established software base, Soft Machines made its task a whole lot easier. “We can’t give performance with one hand and take it away with the other,” says the company’s CTO. All of that clever circuit design might have gone to waste if they’d given VISC an entirely new instruction set. Better to go with the flow.

As we said last week, you don’t see entirely new microprocessor companies every day. But this one might just be worth watching. 
