GAP9 for ML at the Edge

GreenWaves GAP9 CPU is a Big Update Over its Predecessor

“History doesn’t repeat itself, but it often rhymes.” – Mark Twain

When we last checked up on GreenWaves, the French company had just launched its very first chip, the GAP8 processor. They’ve kept busy in the intervening two years by working on its successor, the GAP9. And it’s a big update. 

The concept is still the same. Build a low-cost, low-power processor for ML inference tasks inside IoT and wearable devices. Keep the power down so that dumb devices can be made less dumb but still run on batteries. Keep the ML performance up so data doesn’t have to be transferred to a remote cloud server or local hub. Keep the price reasonable and everybody will want one. Check, check, and check. 

The new GAP9 is clearly the GAP8’s big brother. They share a strong family resemblance, with the same eight RISC-V processor cores, plus a ninth one as overseer. Internal memory size has tripled. Fab process jumped two generations ahead. There are faster and more capable I/O interfaces. Plus a couple of new tricks learned from watching GAP8. Overall, GreenWaves says GAP9 can handle 10x bigger problems than GAP8, yet it consumes just one-fifth the power. Sounds like an upgrade to me. 

Oddly, GAP9 doesn’t appear anywhere on GreenWaves’s website, apart from one small press announcement on its debut date. That might be because the part isn’t shipping yet (that’s expected later this summer) and the company doesn’t want to Osborne itself by hurting sales of the current GAP8 with a premature pre-announcement of its replacement.

GreenWaves is a believer in using standard CPUs (or at least, semi-standard) to do ML inference, rather than specialized hardware. “The market moves too quickly,” says Martin Croome, the company’s VP of Marketing, “and [customers] want to exploit fast changes in the state of the art.” In short, software is easier to update than custom hardware. 

GAP9 endows its eight identical RISC-V cores with custom ML acceleration, but not so much that you can’t program them normally. That’s a carryover decision from the GAP8, and the company is happy to stick with that strategy. Each CPU core has its own private instruction cache, but no data cache. Traditional data caches don’t work all that well in ML applications because of the way data streams through. 

All eight cores (nine, counting the housekeeping processor) do share a big 128KB block of SRAM, and there’s an unusual 1.5MB block of interleaved memory intended for coefficients combined with a 128KB block intended for code. The chip is also able to map external memory into its internal store, essentially caching or shadowing off-chip memory for faster access. 
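To see why those memory sizes matter, here is a rough sketch of the kind of capacity check a tiling tool has to make. Only the 128KB and 1.5MB figures come from GreenWaves; the function names, the stride-1 convolution, 8-bit data, and double buffering are illustrative assumptions, not GreenWaves’s actual tiling rules.

```python
# Memory sizes quoted in the article; everything else here is an assumption.
L1_BYTES = 128 * 1024          # shared SRAM used by the eight cluster cores
L2_BYTES = 1536 * 1024         # 1.5MB block intended for coefficients


def weights_fit_in_l2(in_ch, out_ch, kernel=3, bytes_per_elem=1):
    """True if one conv layer's weights fit in the coefficient memory (hypothetical helper)."""
    return in_ch * out_ch * kernel * kernel * bytes_per_elem <= L2_BYTES


def rows_per_tile(width, in_ch, out_ch, kernel=3, bytes_per_elem=1, buffers=2):
    """Largest number of output rows per tile for a stride-1 convolution (hypothetical).

    A tile of r output rows needs r + kernel - 1 input rows. Input and output
    slices are each allocated `buffers` times (double buffering) so a DMA can
    fill one copy while the cores work on the other. Weights are assumed to
    live elsewhere. Returns 0 if not even a single row fits.
    """
    # Bytes consumed by one (row, channel) slice across all buffer copies.
    bytes_per_slice = buffers * bytes_per_elem * width
    # How many (row, channel) slices fit in L1 in total.
    budget = L1_BYTES // bytes_per_slice
    # Solve (r + kernel - 1) * in_ch + r * out_ch <= budget for integer r.
    rows = (budget - (kernel - 1) * in_ch) // (in_ch + out_ch)
    return max(rows, 0)


print(rows_per_tile(width=80, in_ch=32, out_ch=32))  # prints 11
print(weights_fit_in_l2(32, 32))                     # prints True
```

A real tiler would also account for padding, strides, weights staged into L1, and per-core scratch space, so actual tile sizes would be smaller than this simple bound.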

Data flow is also improved by a pair of programmable DMAs and a tool that GreenWaves calls its AutoTiler. In most inference code, it’s possible for the compiler to determine from the graph description what the data traffic will look like. That means the compiler can pre-plan its data movement. AutoTiler then programs the two DMAs so that they collaborate. One moves data from external RAM or ROM into the chip’s large L2 buffer, while the other transfers from the buffer to shared L1. By keeping these transactions coordinated with compiled code, GAP9 can (theoretically) plow through loops without waiting for slow external memory. 
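The two-DMA arrangement is essentially a three-stage software pipeline. The sketch below is a hypothetical model (the function name and structure are illustrative, not the AutoTiler’s output or any GreenWaves SDK API): one DMA fetches tile n+2 from external memory into L2, the other moves tile n+1 from L2 into L1, and the cores compute on tile n. It uses a single buffer slot per level for clarity; real hardware running the stages concurrently would need two copies of each buffer (ping-pong).

```python
def pipelined_tiles(external, tile_size, kernel):
    """Model a three-stage tile pipeline (hypothetical, not the AutoTiler API).

    Stage A: DMA A copies a tile from external memory into L2.
    Stage B: DMA B copies the previous tile from L2 into shared L1.
    Stage C: the cores run `kernel` on the tile that is already in L1.

    Each loop iteration is one pipeline step. Stages are evaluated in
    reverse order so every buffer is consumed before it is refilled.
    """
    tiles = [external[i:i + tile_size] for i in range(0, len(external), tile_size)]
    n = len(tiles)
    l2 = None   # (index, data) currently held in L2, or None
    l1 = None   # (index, data) currently held in L1, or None
    results, schedule = [], []
    # n tiles drain through a 3-stage pipeline in n + 2 steps.
    for step in range(n + 2):
        events = []
        if l1 is not None:                                 # Stage C: compute
            results.append(kernel(l1[1]))
            events.append(f"core computes tile {l1[0]}")
        l1 = l2                                            # Stage B: L2 -> L1
        if l1 is not None:
            events.append(f"DMA-B L2->L1 tile {l1[0]}")
        l2 = (step, tiles[step]) if step < n else None     # Stage A: ext -> L2
        if l2 is not None:
            events.append(f"DMA-A ext->L2 tile {l2[0]}")
        schedule.append(events)
    return results, schedule


results, schedule = pipelined_tiles(list(range(10)), tile_size=4, kernel=sum)
for step, events in enumerate(schedule):
    print(step, "; ".join(events))
```

Step 2 is the steady state, with all three stages busy on different tiles. Because the schedule depends only on the number and size of the tiles, not on the data values, it can be worked out before the program runs, which is the property that lets a compiler pre-plan data movement.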

Another big jump came in GAP9’s fabrication technology. The earlier GAP8 was made in TSMC’s 55nm LP process, while GAP9 uses a comparatively advanced 22nm FDX (fully depleted silicon-on-insulator) from GlobalFoundries. The generational leap in process technology allows GAP9 to run at a whizzy 400 MHz, compared to its 250-MHz predecessor. 

It just wouldn’t be a new processor announcement without benchmarks, and GreenWaves has delivered. Given GAP9’s preproduction status, however, all scores are estimates, simulations, and educated guesses. Compared to STMicroelectronics’s family of STM32H7xx devices running at the same 400-MHz clock frequency, GAP9 runs the proverbial circles around its fellow French competitor. At least, it does on simulated MobileNetv1 benchmarks.

GreenWaves says GAP9 can process 160×160-pixel images 14× faster than the ST parts, ripping through 83.9 frames/sec versus 6.2 frames/sec, at the same 43% accuracy. Conversely, GAP9 can match the ST part’s frame rate while running at just 29 MHz. Or, if what you want is maximum accuracy, GAP9 can process 192×192-pixel images at 6 frames/sec with 70% accuracy. In all three cases, GreenWaves says its device consumes at least two-thirds less (simulated) power than the ST device, and as much as 97% less in the best case.
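The headline figures can be sanity-checked with a little arithmetic. This quick sketch uses only the frame rates and clock quoted above; the helper name is hypothetical, and the second result assumes throughput scales linearly with clock frequency, which is a simplification rather than a GreenWaves claim.

```python
def benchmark_check(fast_fps, slow_fps, clock_mhz):
    """Return (speedup, clock needed to match the slower part's frame rate).

    The second value assumes throughput scales linearly with clock
    frequency, which is an assumption, not a GreenWaves figure.
    """
    speedup = fast_fps / slow_fps
    matching_clock_mhz = clock_mhz * slow_fps / fast_fps
    return speedup, matching_clock_mhz


speedup, clk = benchmark_check(83.9, 6.2, 400)
print(f"speedup ~ {speedup:.1f}x, matching clock ~ {clk:.1f} MHz")
# prints: speedup ~ 13.5x, matching clock ~ 29.6 MHz
```

The ratio comes out at about 13.5×, which rounds to the quoted 14×, and linear scaling puts the matching clock just under 30 MHz, consistent with GreenWaves’s 29-MHz figure.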

We’re a few months away from knowing exactly how GAP9 will work, behave, and perform. But its predecessor, GAP8, is real and it seems to be doing the job for its customers, so the architecture is sound. GAP9 doesn’t require a leap of faith on a new family; it’s more of a straightforward upgrade. Maybe we’re seeing the start of an IoT dynasty. 
