feature article
Subscribe Now

GAP9 for ML at the Edge

GreenWaves GAP9 CPU is a Big Update Over its Predecessor

“History doesn’t repeat itself, but it often rhymes.” – Mark Twain

When we last checked up on GreenWaves, the French company had just launched its very first chip, the GAP8 processor. They’ve kept busy in the intervening two years by working on its successor, the GAP9. And it’s a big update. 

The concept is still the same. Build a low-cost, low-power processor for ML inference tasks inside IoT and wearable devices. Keep the power down so that dumb devices can be made less dumb but still run on batteries. Keep the ML performance up so data doesn’t have to be transferred to a remote cloud server or local hub. Keep the price reasonable and everybody will want one. Check, check, and check. 

The new GAP9 is clearly the GAP8’s big brother. They share a strong family resemblance, with the same eight RISC-V processor cores, plus a ninth one as overseer. Internal memory size has tripled. Fab process jumped two generations ahead. There are faster and more capable I/O interfaces. Plus a couple of new tricks learned from watching GAP8. Overall, GreenWaves says GAP9 can handle 10x bigger problems than GAP8, yet it consumes just one-fifth the power. Sounds like an upgrade to me. 

Oddly, GAP9 doesn’t appear anywhere on GreenWaves’s website, apart from one small press announcement on its debut date. That might be because the part isn’t shipping yet – that’s expected later this summer – and the company doesn’t want to Osborne itself

GreenWaves is a believer in using standard CPUs (or at least, semi-standard) to do ML inference, rather than specialized hardware. “The market moves too quickly,” says Martin Croome, the company’s VP of Marketing, “and [customers] want to exploit fast changes in the state of the art.” In short, software is easier to update than custom hardware. 

GAP9 endows its eight identical RISC-V cores with custom ML acceleration, but not so much that you can’t program them normally. That’s a carryover decision from the GAP8, and the company is happy to stick with that strategy. Each CPU core has its own private instruction cache, but no data cache. Traditional data caches don’t work all that well in ML applications because of the way data streams through. 

All eight cores (nine, counting the housekeeping processor) do share a big 128KB block of SRAM, and there’s an unusual 1.5MB block of interleaved memory intended for coefficients combined with a 128KB block intended for code. The chip is also able to map external memory into its internal store, essentially caching or shadowing off-chip memory for faster access. 

Data flow is also improved by a pair of programmable DMAs and a tool that GreenWaves calls its AutoTiler. In most inference code, it’s possible for the compiler to determine from the graph description what the data traffic will look like. That means the compiler can pre-plan its data movement. AutoTiler then programs the two DMAs so that they collaborate. One moves data from external RAM or ROM into the chip’s large L2 buffer, while the other transfers from the buffer to shared L1. By keeping these transactions coordinated with compiled code, GAP9 can (theoretically) plow through loops without waiting for slow external memory. 

Another big jump came in GAP9’s fabrication technology. The earlier GAP8 was made in TSMC’s 55nm LP process, while GAP9 uses a comparatively advanced 22nm FDX (fully depleted silicon-on-insulator) from GlobalFoundries. The generational leap in process technology allows GAP9 to run at a whizzy 400 MHz, compared to its 250-MHz predecessor. 

It just wouldn’t be a new processor announcement without benchmarks, and GreenWaves has delivered. Given GAP9’s preproduction status, however, all scores are estimates, simulations, and educated guesses. Compared to STMicroelectronics’s family of STM32H7xx devices running at the same 400-MHz clock frequency, GAP9 runs the proverbial circles around its fellow French competitor. At least, it does on simulated MobileNetv1 benchmarks

GreenWaves says GAP9 can process 160×160-pixel images 14× faster than the ST parts, ripping through 83.9 frames/sec, versus 6.2 frames/sec, with the same 43% accuracy. Conversely, GAP9 can deliver the same frame rate as the ST part while running at just 29 MHz. Or, if what you want is maximum accuracy, GAP9 can do 192×192, 6 fps, with 70% accuracy. In all three cases, GreenWaves says its device consumes two-thirds less (simulated) power than the ST device, down to 97% less in the best case. 

We’re a few months away from knowing exactly how GAP9 will work, behave, and perform. But its predecessor, GAP8, is real and it seems to be doing the job for its customers, so the architecture is sound. GAP9 doesn’t require a leap of faith on a new family; it’s more of a straightforward upgrade. Maybe we’re seeing the start of an IoT dynasty. 

One thought on “GAP9 for ML at the Edge”

Leave a Reply

featured blogs
May 24, 2024
Could these creepy crawly robo-critters be the first step on a slippery road to a robot uprising coupled with an insect uprising?...
May 23, 2024
We're investing in semiconductor workforce development programs in Latin America, including government and academic partnerships to foster engineering talent.The post Building the Semiconductor Workforce in Latin America appeared first on Chip Design....

featured video

Why Wiwynn Energy-Optimized Data Center IT Solutions Use Cadence Optimality Explorer

Sponsored by Cadence Design Systems

In the AI era, as the signal-data rate increases, the signal integrity challenges in server designs also increase. Wiwynn provides hyperscale data centers with innovative cloud IT infrastructure, bringing the best total cost of ownership (TCO), energy, and energy-itemized IT solutions from the cloud to the edge.

Learn more about how Wiwynn is developing a new methodology for PCB designs with Cadence’s Optimality Intelligent System Explorer and Clarity 3D Solver.

featured paper

Achieve Greater Design Flexibility and Reduce Costs with Chiplets

Sponsored by Keysight

Chiplets are a new way to build a system-on-chips (SoCs) to improve yields and reduce costs. It partitions the chip into discrete elements and connects them with a standardized interface, enabling designers to meet performance, efficiency, power, size, and cost challenges in the 5 / 6G, artificial intelligence (AI), and virtual reality (VR) era. This white paper will discuss the shift to chiplet adoption and Keysight EDA's implementation of the communication standard (UCIe) into the Keysight Advanced Design System (ADS).

Dive into the technical details – download now.

featured chalk talk

Audio Design for Augmented and Virtual Reality (AR/VR) Glasses
Open ear audio can be beneficial to a host of different applications including virtual reality headsets, smart glasses, and sports and fitness designs. In this episode of Chalk Talk, Amelia Dalton and Ryan Boyle from Analog Devices explore the what, where, and how of open ear audio. We also investigate the solutions that Analog Devices has for open ear audio applications and how you can design open ear audio into your next application. 
Jan 23, 2024
16,977 views