
GAP9 for ML at the Edge

GreenWaves GAP9 CPU is a Big Update Over its Predecessor

“History doesn’t repeat itself, but it often rhymes.” – Mark Twain

When we last checked up on GreenWaves, the French company had just launched its very first chip, the GAP8 processor. They’ve kept busy in the intervening two years by working on its successor, the GAP9. And it’s a big update. 

The concept is still the same. Build a low-cost, low-power processor for ML inference tasks inside IoT and wearable devices. Keep the power down so that dumb devices can be made less dumb but still run on batteries. Keep the ML performance up so data doesn’t have to be transferred to a remote cloud server or local hub. Keep the price reasonable and everybody will want one. Check, check, and check. 

The new GAP9 is clearly the GAP8’s big brother. They share a strong family resemblance, with the same eight RISC-V processor cores, plus a ninth one as overseer. Internal memory size has tripled. Fab process jumped two generations ahead. There are faster and more capable I/O interfaces. Plus a couple of new tricks learned from watching GAP8. Overall, GreenWaves says GAP9 can handle 10x bigger problems than GAP8, yet it consumes just one-fifth the power. Sounds like an upgrade to me. 

Oddly, GAP9 doesn’t appear anywhere on GreenWaves’s website, apart from one small press announcement on its debut date. That might be because the part isn’t shipping yet – that’s expected later this summer – and the company doesn’t want to Osborne itself.

GreenWaves is a believer in using standard CPUs (or at least, semi-standard) to do ML inference, rather than specialized hardware. “The market moves too quickly,” says Martin Croome, the company’s VP of Marketing, “and [customers] want to exploit fast changes in the state of the art.” In short, software is easier to update than custom hardware. 

GAP9 endows its eight identical RISC-V cores with custom ML acceleration, but not so much that you can’t program them normally. That’s a carryover decision from the GAP8, and the company is happy to stick with that strategy. Each CPU core has its own private instruction cache, but no data cache. Traditional data caches don’t work all that well in ML applications because of the way data streams through. 

All eight cores (nine, counting the housekeeping processor) do share a big 128KB block of SRAM, and there’s an unusual 1.5MB block of interleaved memory intended for coefficients combined with a 128KB block intended for code. The chip is also able to map external memory into its internal store, essentially caching or shadowing off-chip memory for faster access. 

Data flow is also improved by a pair of programmable DMAs and a tool that GreenWaves calls its AutoTiler. In most inference code, it’s possible for the compiler to determine from the graph description what the data traffic will look like. That means the compiler can pre-plan its data movement. AutoTiler then programs the two DMAs so that they collaborate. One moves data from external RAM or ROM into the chip’s large L2 buffer, while the other transfers from the buffer to shared L1. By keeping these transactions coordinated with compiled code, GAP9 can (theoretically) plow through loops without waiting for slow external memory. 
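The two-DMA arrangement described above amounts to a three-stage software pipeline. The sketch below is not GreenWaves's AutoTiler and uses no GreenWaves API. It is a minimal Python illustration that assumes the workload splits into equal tiles and that each stage (external memory to L2, L2 to L1, compute) takes one time slot. The names `Step` and `plan_schedule` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    slot: int                  # time slot index
    ext_to_l2: Optional[int]   # tile the first DMA copies from external memory into L2
    l2_to_l1: Optional[int]    # tile the second DMA copies from L2 into shared L1
    compute: Optional[int]     # tile the cores are processing out of L1


def plan_schedule(num_tiles: int) -> List[Step]:
    """Plan a pipelined schedule known ahead of time, as a compiler could.

    In slot t the first DMA fetches tile t, the second DMA forwards
    tile t-1, and the cores compute on tile t-2, so all three stages
    work on different tiles at once.
    """
    def valid(tile: int) -> Optional[int]:
        return tile if 0 <= tile < num_tiles else None

    return [
        Step(slot=t, ext_to_l2=valid(t), l2_to_l1=valid(t - 1), compute=valid(t - 2))
        for t in range(num_tiles + 2)
    ]


schedule = plan_schedule(4)
for step in schedule:
    print(step)

# Under the one-slot-per-stage assumption, overlapping the stages takes
# num_tiles + 2 slots instead of 3 * num_tiles when run back to back.
pipelined_slots = len(schedule)
serial_slots = 3 * 4
```

Once the pipeline fills, the cores have a tile ready in L1 on every slot, which is the sense in which the loops avoid waiting on external memory. Real transfers and compute phases rarely take equal time, so an actual schedule would need to size tiles to balance them.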

Another big jump came in GAP9’s fabrication technology. The earlier GAP8 was made in TSMC’s 55nm LP process, while GAP9 uses a comparatively advanced 22nm FDX (fully depleted silicon-on-insulator) from GlobalFoundries. The generational leap in process technology allows GAP9 to run at a whizzy 400 MHz, compared to its 250-MHz predecessor. 

It just wouldn’t be a new processor announcement without benchmarks, and GreenWaves has delivered. Given GAP9’s preproduction status, however, all scores are estimates, simulations, and educated guesses. Compared to STMicroelectronics’s family of STM32H7xx devices running at the same 400-MHz clock frequency, GAP9 runs the proverbial circles around its fellow French competitor. At least, it does on simulated MobileNetv1 benchmarks.

GreenWaves says GAP9 can process 160×160-pixel images 14× faster than the ST parts, ripping through 83.9 frames/sec versus 6.2 frames/sec, with the same 43% accuracy. Conversely, GAP9 can deliver the same frame rate as the ST part while running at just 29 MHz. Or, if what you want is maximum accuracy, GAP9 can process 192×192-pixel images at 6 fps with 70% accuracy. In all three cases, GreenWaves says its device consumes at least two-thirds less (simulated) power than the ST device, and as much as 97% less in the best case.
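The speedup and the reduced-clock figure follow from the two quoted frame rates. The short check below assumes frame rate scales linearly with clock frequency, which is a simplification of ours and not something GreenWaves states. The variable names are illustrative.

```python
# Figures quoted by GreenWaves (simulated MobileNetv1, 160x160 input).
GAP9_FPS = 83.9
ST_FPS = 6.2
CLOCK_MHZ = 400.0  # both parts compared at the same clock

# Ratio of frame rates at equal clock speed.
speedup = GAP9_FPS / ST_FPS

# Assumption: frame rate is proportional to clock frequency, so GAP9
# can drop its clock by the same ratio and still match the ST frame rate.
matching_clock_mhz = CLOCK_MHZ / speedup

print(f"speedup: {speedup:.1f}x")
print(f"GAP9 clock to match ST frame rate: {matching_clock_mhz:.1f} MHz")
```

The ratio comes out at roughly 13.5, which rounds to the quoted 14×, and the matching clock lands just under 30 MHz, in line with the 29 MHz figure.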

We’re a few months away from knowing exactly how GAP9 will work, behave, and perform. But its predecessor, GAP8, is real and it seems to be doing the job for its customers, so the architecture is sound. GAP9 doesn’t require a leap of faith on a new family; it’s more of a straightforward upgrade. Maybe we’re seeing the start of an IoT dynasty. 
