Quadric CPU Combines AI with Conventional Code

Parallel 256-core Machine Handles Both C and TensorFlow

Another day, another new microprocessor architecture. There was a time in the Nineties when everyone and his dog was designing a new processor. They were all going to revolutionize the world, crush Intel, enable new cutting-edge devices, and show us how it’s really supposed to be done. The nerd journals were filled with new CPU acronyms like RISC, VLIW, IPC, EPIC, ROB, ILP, SSE2, TLB, BTB, AES-NI, SIMD, and more. And, of course, IPO. 

Fortunately for us, those days of revolution are over. The majority of those new processor families faded away, the x86 retained its dominance (in some markets, anyway), and many of the innovations that defined the new upstarts were eventually absorbed into the incumbent CPUs. The mad rush to topple the CPU status quo has settled down. 

Well, almost. 

Some of today’s workloads aren’t like yesterday’s workloads. New problems require new solutions. AI and ML have changed the rules of the game. We need “…a unified hardware and software platform that can unlock the power of on-device AI for a wide range of applications at the network edge.” 

So saith Quadric, a 30-person Silicon Valley startup that’s tackling one of the biggest problems of all: getting an entirely new processor architecture off the ground. The company’s unique q16 chip is currently shipping, and its small M.2 development boards are available as well. So, it’s real. But is it really different? 

Quadric’s processor is hard to pigeonhole. It’s touted as an AI accelerator but also as a general-purpose processor. It bears some of the hallmarks of a VLIW machine as well as being nouveau RISC. Quadric calls it a “code-driven architecture” and says it’s “built from the ground up with software in mind.” 

By that, they mean they arrived at a set of problems first, then designed a CPU to solve those problems. Quadric’s founders are all experienced EEs and startup grognards, and they had been working on a machine-vision system that just didn’t have enough throughput. The solution then was to throw more hardware at the problem, but they decided there must be a better way… 

Current CPUs — even relatively modern ones — are designed to do anything, says Quadric co-founder and Chief Product Officer Daniel Firu. “But software has changed a lot [since those CPUs were created]. There are neural nets all over: recommendation engines, self-driving cars, robotics, cameras, etc. The only existing thing that comes close is a GPU. We set out to generalize the data-parallelism problem in compute- and power-constrained environments. We run algorithms that a neural net processor can run, or that an app builder or roboticist would run.” 

Quadric’s CPU designers set out to simplify, not complicate, their processor. They believe in offloading to software anything that doesn’t need to be done in hardware. The q16 chip has no caches, no branch prediction, no speculative execution, nor any of the frippery common to today’s CPUs. Instead, the compiler schedules load/store operations between the CPU and memory, as well as data transfers within the chip. Work is allocated among the 256 processor cores at compile time, with only minimal run-time tweaking. 
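To make the idea of compile-time allocation concrete, here is a minimal sketch in Python. It is not Quadric's compiler or SDK; the function name `static_partition` and the image-sized workload are made up for illustration. It only shows what it means to fix the division of work among 256 cores before the program runs, so that no run-time hardware has to decide who does what.

```python
NUM_CORES = 256  # core count stated in the article


def static_partition(num_items, num_cores=NUM_CORES):
    """Split num_items into contiguous (start, stop) ranges, one per core.

    Hypothetical helper: the whole schedule is computed up front, which is
    the essence of static (compile-time) allocation. Any remainder is spread
    over the lowest-numbered cores so the loads differ by at most one item.
    """
    base, extra = divmod(num_items, num_cores)
    ranges = []
    start = 0
    for core in range(num_cores):
        count = base + (1 if core < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges


# Illustrative workload (an assumption): one work item per pixel of a
# 640 x 480 frame, divided among the cores before anything executes.
schedule = static_partition(640 * 480)
print(schedule[0], schedule[-1])
```

A real static scheduler would also plan the load/store traffic and on-chip transfers the article mentions; this sketch covers only the work split.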

“What’s starkly different is the amount of hardware speculation and complex caching algorithms in current CPUs,” says Firu. “There’s all this hardware trying to figure out what the program might want to do. A general CPU can do anything, but if you constrain usage to fewer things, you can get rid of a lot of that stuff.” 

Sounds like RISC all over again. Eliminate hardware bottlenecks and push the responsibility onto the compiler. Firu doesn’t disagree, but he points out that compilers are a lot better now than they were 30 years ago. “LLVM allows us to leverage big-company compiler technology” without the big compiler company. 

Although q16 is designed to excel at parallel algorithms, it’s also a general-purpose CPU. It’s intended to replace both the host processor (x86, ARM, etc.) and the AI accelerator (Tensor, Tegra, Graphcore, etc.) in a system. It’s programmable in C, but it’s equally comfortable with PyTorch and TensorFlow. 

That said, Quadric isn’t out to unseat Intel, AMD, ARM, or nVidia. “We’re ambitious but pragmatic,” says Firu. “We fancy ourselves as a high-performance general-purpose architecture. It’s a bridge between ‘embarrassingly parallel’ and ‘single-threaded.’” 

With 256 identical cores arranged in a grid, the q16 chip has ample resources for parallel problems. Each core has its own local memory, plus single-cycle access to its neighbors’ local memories. Cores farther away can be accessed with a time penalty for each hop. Quadric’s compiler statically schedules all 256 cores, as well as their data transactions and interactions. Users can theoretically intervene in this process manually, though there’s little reason to ever do so. 
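The neighbor-versus-distant access cost can be pictured with a toy model. Everything specific here is an assumption rather than a published Quadric figure: a 16 x 16 layout, a four-neighbor mesh, Manhattan routing, one cycle for a core's own memory, and one extra cycle per additional hop. The article says only that neighbors are reachable in a single cycle and that farther cores cost more per hop.

```python
GRID_SIDE = 16  # assumption: 256 cores arranged as a 16 x 16 grid


def core_xy(core_id, side=GRID_SIDE):
    """Map a linear core id to (row, col) in the assumed grid."""
    return divmod(core_id, side)


def hops(src, dst, side=GRID_SIDE):
    """Manhattan distance between two cores (assumed routing model)."""
    r1, c1 = core_xy(src, side)
    r2, c2 = core_xy(dst, side)
    return abs(r1 - r2) + abs(c1 - c2)


def access_cycles(src, dst, cycles_per_hop=1):
    """Hypothetical cost for src to read dst's local memory.

    Own memory and directly adjacent cores cost one cycle; every hop
    beyond the first adds cycles_per_hop (an assumed penalty).
    """
    distance = hops(src, dst)
    return 1 + max(0, distance - 1) * cycles_per_hop


print(access_cycles(0, 1))    # adjacent core
print(access_cycles(0, 255))  # opposite corner of the assumed grid
```

Under a model like this, a static scheduler has an obvious incentive to place cores that share data next to one another, which is presumably the kind of decision Quadric's compiler makes on the user's behalf.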

Each core is Turing-complete, meaning it’s a fully fledged computer that can run any possible program, not just a specialized accelerator. The ISA includes a hundred instructions or so, including logic functions, multiply-accumulate, and integer arithmetic. Quadric expects its customers will never program the chip in assembly language, relying instead on Quadric’s SDK, which, it says, “…allows the developer to express graph-based and non-graph-based algorithms in unison.” 

Historically, the biggest problem with creating a new processor is not creating the processor; it’s creating the software ecosystem for it. There’s zero installed base of software, and potential users have zero experience using it. Firu says that’s less of a problem with q16, for two reasons. First, customers’ unique IP isn’t tied up in C code anymore. It’s in ML graphs, and those are portable via PyTorch or TensorFlow. Second, Quadric has developed a C++ API and libraries to ease “normal” code development. 

Right now, Quadric is a fabless chip company selling development boards and silicon. But in the future, the company may make the switch to licensing its IP. Firu says that fully 60% of their current customers (a small sample size, admittedly) are interested in licensing the processor IP; the other 40% want silicon. “We may evolve into a pure IP play,” he says. If so, where does that leave chip customers? Will the q16 be an only child? 

Not a chance, says Quadric. Regardless of business model, the company will always produce at least one chip per CPU generation as a “showpiece” demonstrator for the architecture. Like SiFive, Quadric may get most of its revenue from licensing, but with chips on the side. 

Processor innovation isn’t dead; it just took a pause. If Quadric is right and workloads have changed but the need for a do-it-all CPU hasn’t, the company may be at the start of something big. 

Note: This is my last article for Electronic Engineering Journal. After 15 years and 700 articles, it’s time for me to bow out and retire before anyone realizes I don’t know what I’m doing. My thanks go out to the entire crew at Techfocus Media, and to the readers of EEJ for keeping us honest and involved. It’s been a great ride.
