
Quadric CPU Combines AI with Conventional Code

Parallel 256-core Machine Handles Both C and TensorFlow

Another day, another new microprocessor architecture. There was a time in the Nineties when everyone and his dog was designing a new processor. They were all going to revolutionize the world, crush Intel, enable new cutting-edge devices, and show us how it’s really supposed to be done. The nerd journals were filled with new CPU acronyms like RISC, VLIW, IPC, EPIC, ROB, ILP, SSE2, TLB, BTB, AES-NI, SIMD, and more. And, of course, IPO. 

Fortunately for us, those days of revolution are over. The majority of those new processor families faded away, the x86 retained its dominance (in some markets, anyway), and many of the innovations that defined the new upstarts were eventually absorbed into the incumbent CPUs. The mad rush to topple the CPU status quo has settled down. 

Well, almost. 

Some of today’s workloads aren’t like yesterday’s workloads. New problems require new solutions. AI and ML have changed the rules of the game. We need “…a unified hardware and software platform that can unlock the power of on-device AI for a wide range of applications at the network edge.” 

So saith Quadric, a 30-person Silicon Valley startup that’s tackling one of the biggest problems of all: getting an entirely new processor architecture off the ground. The company’s unique q16 chip is currently shipping, and its small M.2 development boards are available as well. So, it’s real. But is it really different? 

Quadric’s processor is hard to pigeonhole. It’s touted as an AI accelerator but also as a general-purpose processor. It bears some of the hallmarks of a VLIW machine, yet it also looks like nouveau RISC. Quadric calls it a “code-driven architecture” and says it’s “built from the ground up with software in mind.” 

By that, they mean they arrived at a set of problems first, then designed a CPU to solve those problems. Quadric’s founders are all experienced EEs and startup grognards, and they had been working on a machine-vision system that just didn’t have enough throughput. The solution then was to throw more hardware at the problem, but they decided there must be a better way… 

Current CPUs — even relatively modern ones — are designed to do anything, says Quadric co-founder and Chief Product Officer Daniel Firu. “But software has changed a lot [since those CPUs were created]. There are neural nets all over: recommendation engines, self-driving cars, robotics, cameras, etc. The only existing thing that comes close is a GPU. We set out to generalize the data-parallelism problem in compute- and power-constrained environments. We run algorithms that a neural net processor can run, or that an app builder or roboticist would run.” 

Quadric’s CPU designers set out to simplify, not complicate, their processor. They believe in offloading to software anything that doesn’t need to be done in hardware. The q16 chip has no caches, no branch prediction, no speculative execution, or any of the frippery common to today’s CPUs. Instead, the compiler schedules load/store operations between the CPU and memory, as well as data transfers within the chip. Work is allocated among the 256 processor cores at compile time, with only minimal run-time tweaking. 
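To make “allocated at compile time” a bit more concrete, here’s a minimal Python sketch of the general idea. It is not Quadric’s compiler logic: the block-distribution policy and the function name static_block_schedule are our own hypothetical choices. The point is simply that every core’s share of a data-parallel loop is fixed before the program runs, so no run-time scheduler has to hand out work.

```python
from __future__ import annotations

NUM_CORES = 256  # q16 core count, per the article


def static_block_schedule(num_items: int, num_cores: int = NUM_CORES) -> list[range]:
    """Assign num_items loop iterations to cores as contiguous, fixed ranges.

    Hypothetical illustration of compile-time work allocation: the whole
    schedule is computed up front, before anything executes. When the work
    doesn't divide evenly, the first (num_items % num_cores) cores take one
    extra item each.
    """
    if num_items < 0 or num_cores <= 0:
        raise ValueError("num_items must be >= 0 and num_cores > 0")
    base, extra = divmod(num_items, num_cores)
    schedule = []
    start = 0
    for core in range(num_cores):
        size = base + (1 if core < extra else 0)
        schedule.append(range(start, start + size))
        start += size
    return schedule


if __name__ == "__main__":
    plan = static_block_schedule(1000)
    print(len(plan[0]), len(plan[255]))  # 4 3
```

With 1,000 work items, cores 0 through 231 each get four items and the remaining 24 cores get three. A real compiler would also have to weigh data placement and inter-core traffic, which this toy ignores.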

“What’s starkly different is the amount of hardware speculation and complex caching algorithms in current CPUs,” says Firu. “There’s all this hardware trying to figure out what the program might want to do. A general CPU can do anything, but if you constrain usage to fewer things, you can get rid of a lot of that stuff.” 

Sounds like RISC all over again. Eliminate hardware bottlenecks and push the responsibility onto the compiler. Firu doesn’t disagree, but he points out that compilers are a lot better now than they were 30 years ago. “LLVM allows us to leverage big-company compiler technology” without the big compiler company. 

Although q16 is designed to excel at parallel algorithms, it’s also a general-purpose CPU. It’s intended to replace both the host processor (x86, ARM, etc.) and the AI accelerator (Tensor, Tegra, Graphcore, etc.) in a system. It’s programmable in C, but it’s equally comfortable with PyTorch and TensorFlow. 

That said, Quadric isn’t out to unseat Intel, AMD, ARM, or Nvidia. “We’re ambitious but pragmatic,” says Firu. “We fancy ourselves as a high-performance general-purpose architecture. It’s a bridge between ‘embarrassingly parallel’ and ‘single-threaded.’” 

With 256 identical cores arranged in a grid, the q16 chip has ample resources for parallel problems. Each core has its own local memory, plus single-cycle access to its neighbors’ local memories. Cores farther away can be accessed with a time penalty for each hop. Quadric’s compiler statically schedules all 256 cores, as well as their data transactions and interactions. Users can theoretically intervene in this process manually, though there’s little reason to ever do so. 
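That locality rule lends itself to a toy cost model. This Python sketch assumes the 256 cores form a 16 × 16 grid, that “neighbors” means the four adjacent cores (north, south, east, west), and that each hop beyond the first adds a fixed, made-up penalty. The article gives none of those specifics, and the function names are hypothetical.

```python
from __future__ import annotations

GRID_DIM = 16     # assumption: 256 cores laid out as a 16 x 16 grid
HOP_PENALTY = 1   # assumption: extra cycles per hop beyond a neighbor (made-up value)


def coords(core_id: int) -> tuple[int, int]:
    """Map a core number (0..255) to its (row, column) position in the grid."""
    if not 0 <= core_id < GRID_DIM * GRID_DIM:
        raise ValueError(f"core_id out of range: {core_id}")
    return divmod(core_id, GRID_DIM)


def hops(src: int, dst: int) -> int:
    """Manhattan distance between two cores, i.e. neighbor-to-neighbor steps."""
    (r1, c1), (r2, c2) = coords(src), coords(dst)
    return abs(r1 - r2) + abs(c1 - c2)


def access_cycles(src: int, dst: int, hop_penalty: int = HOP_PENALTY) -> int:
    """Cycles for core src to read core dst's local memory in this toy model.

    A core's own memory and its four adjacent cores' memories cost a single
    cycle; every hop beyond the first adds hop_penalty cycles.
    """
    h = hops(src, dst)
    if h <= 1:
        return 1
    return 1 + (h - 1) * hop_penalty


if __name__ == "__main__":
    print(access_cycles(0, 1))    # neighbor: 1
    print(access_cycles(0, 255))  # opposite corner, 30 hops: 30
```

Under these assumptions, a corner-to-corner access (core 0 to core 255) crosses 30 hops, which is exactly the kind of cost a static scheduler would try to avoid by placing cooperating work on adjacent cores.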

Each core is Turing-complete, meaning it’s a fully fledged computer that can run any possible program, not just a specialized accelerator. The ISA includes a hundred instructions or so, including logic functions, multiply-accumulate, and integer arithmetic. Quadric expects its customers will never program the chip in assembly language, relying instead on Quadric’s SDK, which, it says, “…allows the developer to express graph-based and non-graph-based algorithms in unison.” 
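Multiply-accumulate earns its place in the ISA because the inner loops of most neural-network layers are dominated by MACs. This short Python sketch is a plain illustration, not Quadric’s SDK or instruction set, and the function names are hypothetical. It reduces a fully connected layer to one integer MAC chain per output.

```python
from __future__ import annotations


def mac_dot(activations: list[int], weights: list[int], acc: int = 0) -> int:
    """Integer dot product written as an explicit chain of multiply-accumulates."""
    if len(activations) != len(weights):
        raise ValueError("activations and weights must be the same length")
    for a, w in zip(activations, weights):
        acc += a * w  # one multiply-accumulate per element
    return acc


def dense_layer(activations: list[int], weight_rows: list[list[int]], biases: list[int]) -> list[int]:
    """Fully connected layer: each output is a MAC chain seeded with that output's bias."""
    if len(weight_rows) != len(biases):
        raise ValueError("need one bias per weight row")
    return [mac_dot(activations, row, acc=b) for row, b in zip(weight_rows, biases)]


if __name__ == "__main__":
    print(mac_dot([1, 2, 3], [4, 5, 6]))                    # 32
    print(dense_layer([1, 2], [[1, 1], [2, -1]], [0, 10]))  # [3, 10]
```

Each output’s MAC chain is independent of the others, which is why this kind of work spreads so naturally across many cores.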

Historically, the biggest problem with creating a new processor is not creating the processor; it’s creating the software ecosystem for it. There’s zero installed base of software, and potential users have zero experience using it. Firu says that’s less of a problem with q16, for two reasons. First, customers’ unique IP isn’t tied up in C code anymore. It’s in ML graphs, and those are portable via PyTorch or TensorFlow. Second, Quadric has developed a C++ API and libraries to ease “normal” code development. 

Right now, Quadric is a fabless chip company selling development boards and silicon. But in the future, the company may make the switch to licensing its IP. Firu says that fully 60% of their current customers (a small sample size, admittedly) are interested in licensing the processor IP; the other 40% want silicon. “We may evolve into a pure IP play,” he says. If so, where does that leave chip customers? Will the q16 be an only child? 

Not a chance, says Quadric. Regardless of business model, the company will always produce at least one chip per CPU generation as a “showpiece” demonstrator for the architecture. Like SiFive, Quadric may get most of its revenue from licensing, but with chips on the side. 

Processor innovation isn’t dead; it just took a pause. If Quadric is right and workloads have changed but the need for a do-it-all CPU hasn’t, the company may be at the start of something big. 

Note: This is my last article for Electronic Engineering Journal. After 15 years and 700 articles, it’s time for me to bow out and retire before anyone realizes I don’t know what I’m doing. My thanks go out to the entire crew at Techfocus Media, and to the readers of EEJ for keeping us honest and involved. It’s been a great ride.
