feature article
Subscribe Now

GAP9 for ML at the Edge

GreenWaves GAP9 CPU is a Big Update Over its Predecessor

“History doesn’t repeat itself, but it often rhymes.” – Mark Twain

When we last checked up on GreenWaves, the French company had just launched its very first chip, the GAP8 processor. They’ve kept busy in the intervening two years by working on its successor, the GAP9. And it’s a big update. 

The concept is still the same. Build a low-cost, low-power processor for ML inference tasks inside IoT and wearable devices. Keep the power down so that dumb devices can be made less dumb but still run on batteries. Keep the ML performance up so data doesn’t have to be transferred to a remote cloud server or local hub. Keep the price reasonable and everybody will want one. Check, check, and check. 

The new GAP9 is clearly the GAP8’s big brother. They share a strong family resemblance, with the same eight RISC-V processor cores, plus a ninth one as overseer. Internal memory size has tripled. Fab process jumped two generations ahead. There are faster and more capable I/O interfaces. Plus a couple of new tricks learned from watching GAP8. Overall, GreenWaves says GAP9 can handle 10x bigger problems than GAP8, yet it consumes just one-fifth the power. Sounds like an upgrade to me. 

Oddly, GAP9 doesn’t appear anywhere on GreenWaves’s website, apart from one small press announcement on its debut date. That might be because the part isn’t shipping yet – that’s expected later this summer – and the company doesn’t want to Osborne itself

GreenWaves is a believer in using standard CPUs (or at least, semi-standard) to do ML inference, rather than specialized hardware. “The market moves too quickly,” says Martin Croome, the company’s VP of Marketing, “and [customers] want to exploit fast changes in the state of the art.” In short, software is easier to update than custom hardware. 

GAP9 endows its eight identical RISC-V cores with custom ML acceleration, but not so much that you can’t program them normally. That’s a carryover decision from the GAP8, and the company is happy to stick with that strategy. Each CPU core has its own private instruction cache, but no data cache. Traditional data caches don’t work all that well in ML applications because of the way data streams through. 

All eight cores (nine, counting the housekeeping processor) do share a big 128KB block of SRAM, and there’s an unusual 1.5MB block of interleaved memory intended for coefficients combined with a 128KB block intended for code. The chip is also able to map external memory into its internal store, essentially caching or shadowing off-chip memory for faster access. 

Data flow is also improved by a pair of programmable DMAs and a tool that GreenWaves calls its AutoTiler. In most inference code, it’s possible for the compiler to determine from the graph description what the data traffic will look like. That means the compiler can pre-plan its data movement. AutoTiler then programs the two DMAs so that they collaborate. One moves data from external RAM or ROM into the chip’s large L2 buffer, while the other transfers from the buffer to shared L1. By keeping these transactions coordinated with compiled code, GAP9 can (theoretically) plow through loops without waiting for slow external memory. 

Another big jump came in GAP9’s fabrication technology. The earlier GAP8 was made in TSMC’s 55nm LP process, while GAP9 uses a comparatively advanced 22nm FDX (fully depleted silicon-on-insulator) from GlobalFoundries. The generational leap in process technology allows GAP9 to run at a whizzy 400 MHz, compared to its 250-MHz predecessor. 

It just wouldn’t be a new processor announcement without benchmarks, and GreenWaves has delivered. Given GAP9’s preproduction status, however, all scores are estimates, simulations, and educated guesses. Compared to STMicroelectronics’s family of STM32H7xx devices running at the same 400-MHz clock frequency, GAP9 runs the proverbial circles around its fellow French competitor. At least, it does on simulated MobileNetv1 benchmarks

GreenWaves says GAP9 can process 160×160-pixel images 14× faster than the ST parts, ripping through 83.9 frames/sec, versus 6.2 frames/sec, with the same 43% accuracy. Conversely, GAP9 can deliver the same frame rate as the ST part while running at just 29 MHz. Or, if what you want is maximum accuracy, GAP9 can do 192×192, 6 fps, with 70% accuracy. In all three cases, GreenWaves says its device consumes two-thirds less (simulated) power than the ST device, down to 97% less in the best case. 

We’re a few months away from knowing exactly how GAP9 will work, behave, and perform. But its predecessor, GAP8, is real and it seems to be doing the job for its customers, so the architecture is sound. GAP9 doesn’t require a leap of faith on a new family; it’s more of a straightforward upgrade. Maybe we’re seeing the start of an IoT dynasty. 

One thought on “GAP9 for ML at the Edge”

Leave a Reply

featured blogs
Sep 30, 2022
Wow, September has flown by. It's already the last Friday of the month, the last day of the month in fact, and so time for a monthly update. Kaufman Award The 2022 Kaufman Award honors Giovanni (Nanni) De Micheli of École Polytechnique Fédérale de Lausanne...
Sep 29, 2022
We explain how silicon photonics uses CMOS manufacturing to create photonic integrated circuits (PICs), solid state LiDAR sensors, integrated lasers, and more. The post What You Need to Know About Silicon Photonics appeared first on From Silicon To Software....
Sep 22, 2022
On Monday 26 September 2022, Earth and Jupiter will be only 365 million miles apart, which is around half of their worst-case separation....

featured video

PCIe Gen5 x16 Running on the Achronix VectorPath Accelerator Card

Sponsored by Achronix

In this demo, Achronix engineers show the VectorPath Accelerator Card successfully linking up to a PCIe Gen5 x16 host and write data to and read data from GDDR6 memory. The VectorPath accelerator card featuring the Speedster7t FPGA is one of the first FPGAs that can natively support this interface within its PCIe subsystem. Speedster7t FPGAs offer a revolutionary new architecture that Achronix developed to address the highest performance data acceleration challenges.

Click here for more information about the VectorPath Accelerator Card

featured paper

Algorithm Verification with FPGAs and ASICs

Sponsored by MathWorks

Developing new FPGA and ASIC designs involves implementing new algorithms, which presents challenges for verification for algorithm developers, hardware designers, and verification engineers. This eBook explores different aspects of hardware design verification and how you can use MATLAB and Simulink to reduce development effort and improve the quality of end products.

Click here to read more

featured chalk talk

Automotive Electronic Seat Control

Sponsored by Infineon

Today’s automotive seat design must keep in mind size, cost, battery life and passing EMC testing. In this episode of Chalk Talk, Amelia Dalton and Rick Browarski from Infineon investigate the newest innovations in automotive electronic seat control. They take a closer look at the anatomy of power seats today, the role that an ECU plays in the control of electronic seats and how Infineon chip set offerings can help you with your next smart power seat design.

Click here for more information abotu information about Infineon Seat Control