feature article
Subscribe Now

Quadric CPU Combines AI with Conventional Code

Parallel 256-core Machine Handles Both C and TensorFlow

Another day, another new microprocessor architecture. There was a time in the Nineties when everyone and his dog was designing a new processor. They were all going to revolutionize the world, crush Intel, enable new cutting-edge devices, and show us how it’s really supposed to be done. The nerd journals were filled with new CPU acronyms like RISC, VLIW, IPC, EPIC, ROB, ILP, SSE2, TLB, BTB, AES-NI, SIMD, and more. And, of course, IPO. 

Fortunately for us, those days of revolution are over. The majority of those new processor families faded away, the x86 retained its dominance (in some markets, anyway), and many of the innovations that defined the new upstarts were eventually absorbed into the incumbent CPUs. The mad rush to topple the CPU status quo has settled down. 

Well, almost. 

Some of today’s workloads aren’t like yesterday’s workloads. New problems require new solutions. AI and ML have changed the rules of the game. We need “…a unified hardware and software platform that can unlock the power of on-device AI for a wide range of applications at the network edge.” 

So saith Quadric, a 30-person Silicon Valley startup that’s tackling one of the biggest problems of all: getting an entirely new processor architecture off the ground. The company’s unique q16 chip is currently shipping, and its small M.2 development boards are available as well. So, it’s real. But is it really different? 

Quadric’s processor is hard to pigeonhole. It’s touted as an AI accelerator but also as a general-purpose processor. It bears some of the hallmarks of a VLIW machine as well as being nouveau RISC. Quadric calls it a “code-driven architecture” and says it’s “built from the ground up with software in mind.” 

By that, they mean they arrived at a set of problems first, then designed a CPU to solve those problems. Quadric’s founders are all experienced EEs and startup grognards, and they had been working on a machine-vision system that just didn’t have enough throughput. The solution then was to throw more hardware at the problem, but they decided there must be a better way… 

Current CPUs — even relatively modern ones — are designed to do anything, says Quadric co-founder and Chief Product Officer Daniel Firu. “But software has changed a lot [since those CPUs were created]. There are neural nets all over: recommendation engines, self-driving cars, robotics, cameras, etc. The only existing thing that comes close is a GPU. We set out to generalize the data-parallelism problem in compute- and power-constrained environments. We run algorithms that a neural net processor can run, or that an app builder or roboticist would run.” 

Quadric’s CPU designers set out to simplify, not complicate, their processor. They believe in offloading to software anything that doesn’t need to be done in hardware. The q16 chip has no caches, no branch prediction, no speculative execution, nor any of the frippery common to today’s CPUs. Instead, the compiler schedules load/store operations between the CPU and memory, as well as data transfers within the chip. Work is allocated among the 256 processor cores at compile time, with only minimal run-time tweaking. 

“What’s starkly different is the amount of hardware speculation and complex caching algorithms in current CPUs,” says Firu. “There’s all this hardware trying to figure out what the program might want to do. A general CPU can do anything, but if you constrain usage to fewer things, you can get rid of a lot of that stuff.” 

Sounds like RISC all over again. Eliminate hardware bottlenecks and push the responsibility onto the compiler. Firu doesn’t disagree, but he points out that compilers are a lot better now than they were 30 years ago. “LLVM allows us to leverage big-company compiler technology” without the big compiler company. 

Although q16 is designed to excel at parallel algorithms, it’s also a general-purpose CPU. It’s intended to replace both the host processor (x86, ARM, etc.) and the AI accelerator (Tensor, Tegra, GraphCore, etc.) in a system. It’s programmable in C, but it’s equally comfortable with PyTorch and TensorFlow. 

That said, Quadric isn’t out to unseat Intel, AMD, ARM, or nVidia. “We’re ambitious but pragmatic,” says Firu. “We fancy ourselves as a high-performance general-purpose architecture. It’s a bridge between ‘embarrassingly parallel’ and ‘single-threaded.’” 

With 256 identical cores arranged in a grid, the q16 chip has ample resources for parallel problems. Each core has its own local memory, plus single-cycle access to its neighbors’ local memories. Cores farther away can be accessed with a time penalty for each hop. Quadric’s compiler statically schedules all 256 cores, as well as their data transactions and interactions. Users can theoretically intervene in this process manually, though there’s little reason to ever do so. 

Each core is Turing-complete, meaning it’s a fully fledged computer that can run any possible program, not just a specialized accelerator. The ISA includes a hundred instructions or so, including logic functions, multiply-accumulate, and integer arithmetic. Quadric expects its customers will never program the chip in assembly language, relying instead on Quadric’s SDK, which, it says, ”…allows the developer to express graph-based and non-graph-based algorithms in unison.” 

Historically, the biggest problem with creating a new processor is not creating the processor — it’s creating the software ecosystem for it. There’s zero installed base of software, and potential users have zero experience using it. Firu says that’s less of a problem with q16 because customers’ unique IP isn’t tied up in C code anymore. It’s in ML graphs, and those are portable via PyTorch or TensorFlow. Second, Quadric has developed a C++ API and libraries to ease “normal” code development. 

Right now, Quadric is a fabless chip company selling development boards and silicon. But in the future, the company may make the switch to licensing its IP. Firu says that fully 60% of their current customers (a small sample size, admittedly) are interested in licensing the processor IP; the other 40% want silicon. “We may evolve into a pure IP play,” he says. If so, where does that leave chip customers? Will the q16 be an only child? 

Not a chance, says Quadric. Regardless of business model, the company will always produce  at least one chip per CPU generation as a “showpiece” demonstrator for the architecture. Like SiFive, Quadric may get most of its revenue from licensing, but with chips on the side. 

Processor innovation isn’t dead, it just took a pause. If Quadric is right and workloads have changed but the need for a do-it-all CPU hasn’t, the company may be at the start of something big. 

Note: This is my last article for Electronic Engineering Journal. After 15 years and 700 articles, it’s time for me to bow out and retire before anyone realizes I don’t know what I’m doing. My thanks go out to the entire crew at Techfocus Media, and to the readers of EEJ for keeping us honest and involved. It’s been a great ride.

One thought on “Quadric CPU Combines AI with Conventional Code”

Leave a Reply

featured blogs
Sep 11, 2024
In which we cogitate, ruminate, and pontificate on the things you can do to further your goal of landing (and keeping) a job in engineering...

featured paper

A game-changer for IP designers: design-stage verification

Sponsored by Siemens Digital Industries Software

In this new technical paper, you’ll gain valuable insights into how, by moving physical verification earlier in the IP design flow, you can locate and correct design errors sooner, reducing costs and getting complex designs to market faster. Dive into the challenges of hard, soft and custom IP creation, and learn how to run targeted, real-time or on-demand physical verification with precision, earlier in the layout process.

Read more

featured chalk talk

Versatile S32G3 Processors for Automotive and Beyond
In this episode of Chalk Talk, Amelia Dalton and Brian Carlson from NXP investigate NXP’s S32G3 vehicle network processors that combine ASIL D safety, hardware security, high-performance real-time and application processing and network acceleration. They explore how these processors support many vehicle needs simultaneously, the specific benefits they bring to autonomous drive and ADAS applications, and how you can get started developing with these processors today.
Jul 24, 2024
54,725 views