
GreenWaves Puts Another Spin on IoT Chips

Nine-core GAP8 Processor Brings Low Power to Machine Learning

“If reason were fashionable, you would all have reason.” – Fanny de Beauharnais

Five bucks and nine processor cores. That’s the takeaway from GreenWaves’ new GAP8 processor chip.

Oh, and its low power consumption, measured in nanoamps. And parallel-programming tools. And convolutional neural net (CNN) acceleration. And an Arduino-esque development board. That about covers it.

GreenWaves is a 16-person startup with headquarters outside scenic Grenoble, France. Their first (and so far, only) product is GAP8, a microcontroller intended to bring CNN smarts to cheap, battery-powered IoT edge devices. The chip combines some familiar open-source hardware with some unfamiliar, proprietary circuitry to create something the company thinks will be smarter and more power-efficient than anything the world’s many and varied Arm licensees can conjure up.

Rather than using the ubiquitous Arm Cortex-A or -M cores, GreenWaves relies on the potentially ubiquitous RISC-V design. The benefits here are twofold: RISC-V is free (as in free beer), and RISC-V permits user-defined extensions. GreenWaves took advantage of both characteristics to build itself a complex multicore MCU that’s tweaked for image, audio, and sensor processing. The idea is to make the edge-node processor smart enough that it doesn’t have to upload raw data to a smarter device upstream. Do your data-capture, analysis, filtering, and massaging right at the point of collection and you’ll save yourself time, money, and power.

GAP8 has nine identical RISC-V cores: one for overall housekeeping and eight for massaging incoming data. The housekeeping side looks like a very traditional MCU, with a UART, SPI and I2C interfaces, PWM, an RTC, some on-chip ROM and RAM, and all the other usual accoutrements. Don’t look too closely, and that side of GAP8 could pass for almost any MCU made in the past decade.

It’s the other side of the chip that looks interesting. Here, the company has mated eight identical RISC-V cores with a hardware dispatch mechanism, some convolution-acceleration hardware, a single, shared instruction cache, and a single, shared array of data RAM. The idea here is that signal-processing tasks get assigned to one or more of the RISC-V cores, perhaps with a CNN assist where appropriate. The hardware dispatcher decides which lucky CPU gets the job and handles meting out chores as hardware becomes available.

Because there is no data cache, all eight CPUs read from, and write to, the same shared bank of SRAM. This arrangement allows each RISC-V core to see what the others are doing; no message passing or data copying required. There’s also a DMA in the cluster to handle the menial work of large-scale data transfers so the valuable CPUs don’t have to.

The overall architecture was guided by research done at two universities, ETH Zurich and the University of Bologna, under the auspices of their PULP (parallel ultra-low power) program. As such, GAP8 qualifies as a PULP instantiation.

GreenWaves’ goal with GAP8 wasn’t just to make a smart MCU, but also to make one that could run for months – maybe even years – on a battery or photovoltaic cell. So, the company got aggressive with its power management. First off, the “normal” half of the chip is almost entirely separate from the parallel-processing half. They have different voltage references and different clock domains, and one side can be powered up or down independently of the other. Then, each significant hardware resource (e.g., processor core, peripheral interface, or data RAM) can be powered and/or clocked individually. This allows the chip to power only the bits required for a particular task.

Or, more precisely, it allows you to power only the bits required. Very little of the voltage and frequency scaling is automatic, nor are tasks auto-magically assigned to the eight RISC-V cores without explicit instructions from software. GreenWaves does provide some graph translators and libraries for common algorithms (FIR, FFT, etc.), but the job of parallelizing your overall application is largely up to you. GreenWaves provides the hardware and a fully developed GNU toolset, including OpenMP. But one company can do only so much.
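To give a feel for what "largely up to you" might look like, here is a minimal sketch of an FIR filter parallelized with a standard OpenMP pragma. It is generic C, not code from GreenWaves' SDK: the function name `fir_parallel`, the `CLUSTER_CORES` constant, and the buffer layout are all assumptions made for illustration. The point is only that each output sample is independent, so the loop can be split across workers that all read the same shared input buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: one worker per cluster core (eight on GAP8). */
#define CLUSTER_CORES 8

/*
 * Direct-form FIR over the "valid" region:
 *   out[i] = sum over k of taps[k] * in[i + (n_taps - 1) - k]
 * The caller must supply n_out + n_taps - 1 input samples.
 */
void fir_parallel(const int *in, const int *taps, int *out,
                  size_t n_out, size_t n_taps)
{
    assert(n_taps > 0);

    /* Every thread reads the same in[] and taps[] arrays and writes a
       disjoint slice of out[], so no data is copied between workers.
       If OpenMP is not enabled, the pragma is ignored and the loop
       simply runs serially with the same result. */
    #pragma omp parallel for num_threads(CLUSTER_CORES) schedule(static)
    for (long i = 0; i < (long)n_out; i++) {
        int acc = 0;
        for (size_t k = 0; k < n_taps; k++) {
            acc += taps[k] * in[(size_t)i + (n_taps - 1) - k];
        }
        out[i] = acc;
    }
}
```

A static schedule is used here because every iteration does the same amount of work; whether that is the best choice on real GAP8 silicon is something only measurement would tell you.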

GAP8’s adjustable clocking and voltage gating allow the chip to do just what’s required to get by, and no more, according to its designers. That allows power consumption to jump from mere nanoamps (when just the RTC is running) to a few tens of milliwatts, with lots of small steps in between. The jumps are quick (although GreenWaves hasn’t quantified “quick”), minimizing the area under the energy curve.
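The "area under the energy curve" idea can be sketched as a time-weighted average: a device that spends most of its life asleep and only briefly wakes to do work has an average draw close to its sleep figure. The function below is a simplified model under stated assumptions. It ignores the energy spent during the transitions themselves (which GreenWaves hasn't quantified), and the name `average_power_mw` and its parameters are hypothetical.

```c
#include <assert.h>

/*
 * Time-weighted average power, in milliwatts, for a device that is
 * active for a fraction `duty` of the time (0.0 to 1.0) and asleep
 * for the rest. Transition overhead is deliberately left out.
 */
double average_power_mw(double active_mw, double sleep_mw, double duty)
{
    assert(duty >= 0.0 && duty <= 1.0);
    return duty * active_mw + (1.0 - duty) * sleep_mw;
}
```

Plug in your own measured figures; the model says nothing about what GAP8 actually draws in any given mode.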

GreenWaves does provide at least a few data points comparing its GAP8 (in simulation) against an existing STMicroelectronics F7-class device (based on Arm’s Cortex-M7) running the CMSIS-NN library functions. When the GAP8 was tuned to execute in the same amount of time as the STM part, it consumed one-sixteenth as much power (3.7 mW vs. 60 mW). Conversely, if the GAP8’s performance was dialed up to match the STM part’s power budget, it ran the test in less than one-tenth the time. Either way, the GAP8 appears to be an order of magnitude more efficient, at least on that particular subset of CNN tasks.
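The arithmetic behind those two comparisons is simple enough to check. The helpers below are only a sketch (the function names are made up for illustration): at equal run time, the energy ratio equals the power ratio, and at equal power, the energy ratio equals the run-time ratio.

```c
#include <assert.h>

/* How many times more power the baseline draws than the candidate. */
double power_ratio(double baseline_mw, double candidate_mw)
{
    assert(candidate_mw > 0.0);
    return baseline_mw / candidate_mw;
}

/* Energy in microjoules for a task drawing power_mw for time_ms
   (milliwatts times milliseconds gives microjoules). */
double energy_uj(double power_mw, double time_ms)
{
    return power_mw * time_ms;
}
```

Calling `power_ratio(60.0, 3.7)` with the figures quoted above gives a little over 16, which matches the "one-sixteenth" claim. The run time of the benchmark itself isn't stated, so any absolute energy figure would be a guess.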

GreenWaves taped out its GAP8 late last year, and it’s expecting to receive samples right about… now. Evaluation boards will follow working silicon. Eventually, the chip will be priced in the single digits, with large orders (100,000 units) priced at about $5. If the company can hit its targets, that’s a good deal for a device with so much on-the-spot capability.
