“If reason were fashionable, you would all have reason.” – Fanny de Beauharnais
Five bucks and nine processor cores. That’s the takeaway from GreenWaves’ new GAP8 processor chip.
Oh, and its low power consumption, measured in nanoamps. And parallel-programming tools. And convolutional neural net (CNN) acceleration. And an Arduino-esque development board. That about covers it.
GreenWaves is a 16-person startup with headquarters outside scenic Grenoble, France. Their first (and so far, only) product is GAP8, a microcontroller intended to bring CNN smarts to cheap, battery-powered IoT edge devices. The chip combines some familiar open-source hardware with some unfamiliar, proprietary circuitry to create something the company thinks will be smarter and more power-efficient than anything the world’s many and varied Arm licensees can conjure up.
Rather than using the ubiquitous Arm Cortex-A or -M cores, GreenWaves relies on the potentially ubiquitous RISC-V design. The benefits here are twofold: RISC-V is free (as in free beer), and RISC-V permits user-defined extensions. GreenWaves took advantage of both characteristics to build itself a complex multicore MCU that’s tweaked for image, audio, and sensor processing. The idea is to make the edge-node processor smart enough that it doesn’t have to upload raw data to a smarter device upstream. Do your data-capture, analysis, filtering, and massaging right at the point of collection and you’ll save yourself time, money, and power.
GAP8 has nine identical RISC-V cores: one for overall housekeeping and eight for massaging incoming data. The housekeeping side looks like a very traditional MCU, with a UART, SPI and I2C interfaces, PWM, an RTC, some on-chip ROM and RAM, and all the other usual accoutrements. Don’t look too closely, and that side of GAP8 could pass for almost any MCU made in the past decade.
It’s the other side of the chip that looks interesting. Here, the company has mated eight identical RISC-V cores with a hardware dispatch mechanism, some convolution-acceleration hardware, a single, shared instruction cache, and a single, shared array of data RAM. The idea here is that signal-processing tasks get assigned to one or more of the RISC-V cores, perhaps with a CNN assist where appropriate. The hardware dispatcher decides which lucky CPU gets the job and handles meting out chores as hardware becomes available.
Because there is no data cache, all eight CPUs read from, and write to, the same shared bank of SRAM. This arrangement allows each RISC-V core to see what the others are doing; no message passing or data copying required. There’s also a DMA controller in the cluster to handle the menial work of large-scale data transfers so the valuable CPUs don’t have to.
The overall architecture was guided by research done at two local universities under the auspices of their PULP (parallel ultra-low power) program. As such, GAP8 qualifies as a PULP instantiation.
GreenWaves’ goal with GAP8 wasn’t just to make a smart MCU, but also to make one that could run for months – maybe even years – on a battery or photovoltaic cell. So, the company got aggressive with its power management. First off, the “normal” half of the chip is almost entirely separate from the parallel-processing half. They have different voltage references and different clock domains, and one side can be powered up or down independently of the other. Then, each significant hardware resource (e.g., processor core, peripheral interface, data RAM) can be powered and/or clocked individually. This allows the chip to power only the bits required for a particular task.
Or, more precisely, it allows you to power only the bits required. Very little of the power gating and frequency scaling is automatic, nor are tasks auto-magically assigned to the eight RISC-V cores without explicit instructions from software. GreenWaves does provide some graph translators and libraries for common algorithms (FIR, FFT, etc.), but the job of parallelizing your overall application is largely up to you. GreenWaves provides the hardware and a fully developed GNU toolset including OpenMP. But one company can do only so much.
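To make “largely up to you” concrete, here is a minimal sketch of the kind of explicit partitioning involved: an FIR filter whose output range is split across eight workers that all read one shared input buffer and write one shared output buffer. It is plain Python using threads as stand-ins for cores, not GreenWaves’ SDK or OpenMP; the names `fir_slice` and `parallel_fir` are hypothetical, and Python threads illustrate the division of labor rather than any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 8  # mirrors the eight cluster cores; here they are only Python threads


def fir_slice(samples, taps, out, start, stop):
    """Compute FIR outputs out[start:stop] in place from the shared input."""
    for n in range(start, stop):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:  # treat samples before the start as zero
                acc += tap * samples[n - k]
        out[n] = acc


def parallel_fir(samples, taps, workers=NUM_WORKERS):
    """Split the output range into contiguous chunks, one per worker."""
    out = [0.0] * len(samples)  # single shared buffer; nothing is copied
    chunk = max(1, -(-len(samples) // workers))  # ceiling division, at least 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(fir_slice, samples, taps, out,
                        start, min(start + chunk, len(samples)))
            for start in range(0, len(samples), chunk)
        ]
        for future in futures:
            future.result()  # surface any exception raised in a worker
    return out


if __name__ == "__main__":
    # Two-tap moving average over a short ramp.
    print(parallel_fir([1, 2, 3, 4], [0.5, 0.5]))
```

Each worker owns a disjoint slice of the output, so no locking is needed. Deciding where those slice boundaries go is exactly the job that stays with the programmer.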
GAP8’s adjustable clocking and voltage gating allow the chip to do just what’s required to get by, and no more, according to its designers. That allows consumption to jump from a standby current of mere nanoamps (when just the RTC is running) to a few tens of milliwatts, with lots of small steps in between. The jumps are quick (although GreenWaves hasn’t quantified “quick”), minimizing the area under the power curve, which is to say the total energy drawn from the battery.
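Why do quick transitions matter? A rough sketch, with made-up numbers, shows the arithmetic: energy is power multiplied by time, summed over each phase of a wake-and-sleep cycle. Every figure below is a hypothetical placeholder chosen only to be in the spirit of “nanoamps asleep, tens of milliwatts awake”; none of them comes from GreenWaves, and `energy_mj` is an illustrative helper, not a vendor tool.

```python
def energy_mj(profile):
    """Sum energy over (power_mW, duration_s) phases; mW x s gives millijoules."""
    return sum(power_mw * seconds for power_mw, seconds in profile)


# Hypothetical 10-second cycle: mostly asleep, one short burst of work.
SLEEP_MW = 0.001    # assumed standby power (placeholder, not a datasheet value)
ACTIVE_MW = 30.0    # assumed active power, "a few tens of milliwatts"
RAMP_MW = 15.0      # assumed average power while waking up or settling down

fast_wake = [(SLEEP_MW, 9.9), (ACTIVE_MW, 0.1)]
slow_wake = [(SLEEP_MW, 9.8), (RAMP_MW, 0.1), (ACTIVE_MW, 0.1)]

if __name__ == "__main__":
    for name, profile in (("fast", fast_wake), ("slow", slow_wake)):
        total = energy_mj(profile)
        cycle_s = sum(seconds for _, seconds in profile)
        print(f"{name}: {total:.4f} mJ per cycle, {total / cycle_s:.4f} mW average")
```

With these placeholder values the useful work costs the same in both cases, but the slow ramp adds roughly half again to the energy per cycle. Over months on a battery, that overhead is the part worth squeezing.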
GreenWaves does provide at least a few data points comparing its GAP8 (in simulation) against an existing STMicroelectronics F7-class device (based on Arm’s Cortex-M7) running the CMSIS-NN library functions. When the GAP8 was tuned to execute in the same amount of time as the STM part, it consumed one-sixteenth as much power (3.7mW vs. 60mW). Conversely, if the GAP8’s performance was dialed up to match the STM part’s power budget, it ran the test in less than one-tenth the time. Either way, the GAP8 appears to be an order of magnitude more efficient, at least on that particular subset of CNN tasks.
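The arithmetic behind those claims is easy to check. The sketch below uses only the two power figures quoted above plus the “less than one-tenth the time” statement; the variable names are illustrative, and the results apply only to GreenWaves’ own simulated comparison.

```python
GAP8_POWER_MW = 3.7   # GAP8 tuned to match the STM part's runtime (simulated)
STM_POWER_MW = 60.0   # Cortex-M7 part running CMSIS-NN

# Case 1: equal runtime. Energy is power x time, so with time held equal
# the energy ratio is simply the power ratio.
power_ratio = STM_POWER_MW / GAP8_POWER_MW

# Case 2: equal power. GAP8 is said to finish in less than one-tenth the time,
# so it uses less than one-tenth the energy: a ratio of more than 10.
MAX_TIME_FRACTION = 0.1
min_energy_ratio_at_equal_power = 1.0 / MAX_TIME_FRACTION

if __name__ == "__main__":
    print(f"equal runtime: {power_ratio:.1f}x less power and energy")
    print(f"equal power:   more than {min_energy_ratio_at_equal_power:.0f}x less energy")
```

Sixty divided by 3.7 is a little over 16, which squares with the “one-sixteenth” figure, and both operating points land at or above a tenfold energy advantage.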
GreenWaves taped out its GAP8 late last year, and it’s expecting to receive samples right about… now. Evaluation boards will follow working silicon. Eventually, the chip will be priced in the single digits, with large orders (100,000 units) priced at about $5. If the company can hit its targets, that’s a good deal for a device with so much on-the-spot capability.