GAP9 for ML at the Edge

GreenWaves GAP9 CPU is a Big Update Over its Predecessor

“History doesn’t repeat itself, but it often rhymes.” – Mark Twain

When we last checked up on GreenWaves, the French company had just launched its very first chip, the GAP8 processor. They’ve kept busy in the intervening two years by working on its successor, the GAP9. And it’s a big update. 

The concept is still the same. Build a low-cost, low-power processor for ML inference tasks inside IoT and wearable devices. Keep the power down so that dumb devices can be made less dumb but still run on batteries. Keep the ML performance up so data doesn’t have to be transferred to a remote cloud server or local hub. Keep the price reasonable and everybody will want one. Check, check, and check. 

The new GAP9 is clearly the GAP8’s big brother. They share a strong family resemblance, with the same eight RISC-V processor cores, plus a ninth one as overseer. Internal memory size has tripled. Fab process jumped two generations ahead. There are faster and more capable I/O interfaces. Plus a couple of new tricks learned from watching GAP8. Overall, GreenWaves says GAP9 can handle 10x bigger problems than GAP8, yet it consumes just one-fifth the power. Sounds like an upgrade to me. 

Oddly, GAP9 doesn’t appear anywhere on GreenWaves’s website, apart from one small press announcement on its debut date. That might be because the part isn’t shipping yet (that’s expected later this summer) and the company doesn’t want to Osborne itself by undercutting sales of the GAP8 it already ships.

GreenWaves is a believer in using standard CPUs (or at least, semi-standard) to do ML inference, rather than specialized hardware. “The market moves too quickly,” says Martin Croome, the company’s VP of Marketing, “and [customers] want to exploit fast changes in the state of the art.” In short, software is easier to update than custom hardware. 

GAP9 endows its eight identical RISC-V cores with custom ML acceleration, but not so much that you can’t program them normally. That’s a carryover decision from the GAP8, and the company is happy to stick with that strategy. Each CPU core has its own private instruction cache, but no data cache. Traditional data caches don’t work all that well in ML applications because of the way data streams through. 
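To see why a data cache buys little when data streams through, here is a toy sketch in Python. It is not a model of GAP9 or of any real cache: the `lru_hit_rate` helper, the 256-entry capacity, and both access patterns are assumptions made up for illustration, and it ignores cache lines and prefetching entirely. It only shows that when the working set is larger than the cache and each item is revisited long after it was last touched, an LRU cache never gets a hit.

```python
from collections import OrderedDict


def lru_hit_rate(addresses, capacity):
    """Hit rate of a word-granular LRU cache (toy model, assumed for illustration)."""
    cache = OrderedDict()
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(addresses) if addresses else 0.0


# Hypothetical access patterns, not measured from any real network.
streaming = list(range(1024)) * 2   # two passes over data 4x larger than the cache
small_loop = list(range(32)) * 64   # a small working set reused many times

streaming_rate = lru_hit_rate(streaming, capacity=256)
small_loop_rate = lru_hit_rate(small_loop, capacity=256)
```

In this sketch the streaming pattern never hits, while the small loop hits on almost every access. That is the general intuition behind replacing a data cache with explicitly managed memory, under the assumptions stated above.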

All eight cores (nine, counting the housekeeping processor) share a big 128KB block of SRAM. There’s also an unusual 1.5MB block of interleaved memory intended for coefficients, paired with a 128KB block intended for code. The chip can also map external memory into its internal store, essentially caching or shadowing off-chip memory for faster access. 

Data flow is also improved by a pair of programmable DMAs and a tool that GreenWaves calls its AutoTiler. In most inference code, it’s possible for the compiler to determine from the graph description what the data traffic will look like. That means the compiler can pre-plan its data movement. AutoTiler then programs the two DMAs so that they collaborate. One moves data from external RAM or ROM into the chip’s large L2 buffer, while the other transfers from the buffer to shared L1. By keeping these transactions coordinated with compiled code, GAP9 can (theoretically) plow through loops without waiting for slow external memory. 
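The payoff of overlapping the two DMA transfers with computation can be sketched with a small timing model. This is not the AutoTiler or any GreenWaves API. The function names, the three-stage split (external memory to L2, L2 to L1, compute), and every cycle count below are hypothetical. The model also assumes there is always a free buffer between stages, which real double buffering only approximates.

```python
def serial_cycles(num_tiles, stage_cycles):
    """Every tile passes through all stages before the next tile starts."""
    return num_tiles * sum(stage_cycles)


def pipelined_cycles(num_tiles, stage_cycles):
    """Stages overlap: a stage starts a tile once it is free and the tile has arrived."""
    finish = [0] * len(stage_cycles)  # when each stage finished its latest tile
    for _ in range(num_tiles):
        prev_stage_done = 0
        for s, cost in enumerate(stage_cycles):
            start = max(finish[s], prev_stage_done)
            finish[s] = start + cost
            prev_stage_done = finish[s]
    return finish[-1] if stage_cycles else 0


# Hypothetical per-tile costs in cycles: [external->L2 DMA, L2->L1 DMA, compute]
compute_bound = [80, 40, 100]
memory_bound = [300, 40, 100]

overlapped = pipelined_cycles(4, compute_bound)   # 520 cycles
one_at_a_time = serial_cycles(4, compute_bound)   # 880 cycles
starved = pipelined_cycles(4, memory_bound)       # 1340 cycles
```

Once the pipeline fills, total time is governed by the slowest stage. If that stage is compute, the cores never wait on memory. If it is the external transfer, the cores stall no matter how well the transfers are scheduled, which is why the "theoretically" caveat matters.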

Another big jump came in GAP9’s fabrication technology. The earlier GAP8 was made in TSMC’s 55nm LP process, while GAP9 uses a comparatively advanced 22nm FDX (fully depleted silicon-on-insulator) from GlobalFoundries. The generational leap in process technology allows GAP9 to run at a whizzy 400 MHz, compared to its 250-MHz predecessor. 

It just wouldn’t be a new processor announcement without benchmarks, and GreenWaves has delivered. Given GAP9’s preproduction status, however, all scores are estimates, simulations, and educated guesses. Compared to STMicroelectronics’s family of STM32H7xx devices running at the same 400-MHz clock frequency, GAP9 runs the proverbial circles around its fellow French competitor. At least, it does on simulated MobileNetv1 benchmarks.

GreenWaves says GAP9 can process 160×160-pixel images about 14× faster than the ST parts, ripping through 83.9 frames/sec versus 6.2 frames/sec, with the same 43% accuracy. Conversely, GAP9 can deliver the same frame rate as the ST part while running at just 29 MHz. Or, if what you want is maximum accuracy, GAP9 can process 192×192-pixel images at 6 frames/sec with 70% accuracy. In all three cases, GreenWaves says its device consumes at least two-thirds less (simulated) power than the ST device, and as much as 97% less in the best case. 
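The quoted figures can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers in the paragraph above, plus one assumption that the article does not state: that frame rate scales linearly with clock frequency. The variable names are illustrative.

```python
GAP9_FPS = 83.9     # frames/sec quoted for GAP9 at 160x160
ST_FPS = 6.2        # frames/sec quoted for the ST parts
CLOCK_MHZ = 400.0   # both devices are compared at this clock

# Ratio of the two quoted frame rates.
speedup = GAP9_FPS / ST_FPS

# Assumption: throughput scales linearly with clock frequency, so GAP9
# could match the ST frame rate at 1/speedup of the original clock.
matched_clock_mhz = CLOCK_MHZ / speedup

print(f"speedup: {speedup:.1f}x, matched clock: {matched_clock_mhz:.1f} MHz")
```

The ratio comes out at roughly 13.5, and the matched clock lands between 29 and 30 MHz. Both agree with the quoted "14×" and "29 MHz" once rounding in the published numbers is allowed for.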

We’re a few months away from knowing exactly how GAP9 will work, behave, and perform. But its predecessor, GAP8, is real and it seems to be doing the job for its customers, so the architecture is sound. GAP9 doesn’t require a leap of faith on a new family; it’s more of a straightforward upgrade. Maybe we’re seeing the start of an IoT dynasty. 
