Aaware of Wake Words

Few tech areas are hotter right now than smart speakers and smart home assistants, with Amazon Alexa, Google Assistant, Microsoft Cortana, and Apple Siri all tripping over each other to earn our trust as personal DJ, home shopper, lighting tech, news reporter, and research assistant. Devices ranging from the $50 Echo Dot and Google Home Mini up to the $350-$400 Apple HomePod and Google Home Max are coming into our living rooms, plugging in, connecting up, and… listening.

They may be listening to you now, in fact. Listening with arrays of finely-tuned microphones. Listening and processing and parsing and judging. Was that a wake word? Did you call? Can I help you with something? The general population seems all too eager to bring listening devices created by some of the largest corporations in the world into their homes and give them free rein to do pretty much whatever they like with the information they gather.

It’s a tad ironic that some of the same folks who walk around in tinfoil hats just in case the government is secretly trying to monitor their brain waves from space have no compunction at all about paying the likes of Google and Amazon to put microphones in their homes or posting zettabytes of personal information on Facebook. Apparently, for-profit businesses are a lot less scary than our elected officials and sworn public servants.

What could possibly go wrong?

Nevertheless, smart speaker sales more than tripled during 2017, and they show no sign of slowing down for the time being. For a startup, entering this market against these multibillion-dollar behemoths would appear to be an exercise in futility. But when technology and culture crash into a discontinuity this large, enormous opportunities can spin off like eddy currents from the rapids. In this case, the confluence of technologies and trends is a bit mind-boggling, and a number of startups are jumping in to surf the wave.

From a buzzword saturation point of view, personal digital assistants embody IoT, artificial intelligence, big data, cloud computing, high-speed networking, compute acceleration, low-power design, digital signal processing, and a host of other “hot tech topics” rolled into one giant burrito. Serve that with a thick sauce of security and privacy concerns and you’ve got yourself a pretty volatile cocktail.

With voice control, the rubber meets the road at the microphone. Pulling in the crazy gamut of sound waves bouncing around the typical household and accurately divining the quietly uttered “Alexa” from the din is a formidable engineering feat. Last month, we got an impressive demonstration of a new technology from Silicon Valley startup Aaware, whose “Acoustically Aware” technology uses an array of MEMS digital microphones connected to a Xilinx Zynq SoC to implement proprietary far-field voice interfaces. Aaware says their algorithms use advanced DSP techniques, including adaptive beamforming, to provide always-on far-field voice isolation and recognition with a very high degree of selectivity and immunity from background noise, while minimizing distortion to the original voice.
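For readers who have not met beamforming before, the basic idea can be sketched in a few lines. The example below is not Aaware's proprietary algorithm, and it is not adaptive; it is plain delay-and-sum, the simplest fixed beamformer, shown only to illustrate why an array of microphones can pull a voice out of noise. The geometry (the target arriving exactly one sample later at each successive microphone), the noise model, and all function names are assumptions made up for the illustration.

```python
import math
import random


def delay_and_sum(channels, delays):
    """Align each microphone channel by its integer sample delay, then average.

    channels: list of equal-length sample lists, one per microphone
    delays:   number of samples by which the target arrives late at each mic
    """
    n = len(channels[0]) - max(delays)
    out = []
    for t in range(n):
        # Advance each channel by its own delay so the target lines up in time.
        aligned = sum(ch[t + d] for ch, d in zip(channels, delays))
        out.append(aligned / len(channels))
    return out


def power(samples):
    """Mean squared value of a list of samples."""
    return sum(v * v for v in samples) / len(samples)


def simulate(num_mics=13, num_samples=4000, seed=0):
    """Return (noise power at one mic, noise power after beamforming)."""
    rng = random.Random(seed)
    # Hypothetical geometry: the target reaches mic m exactly m samples late.
    delays = list(range(num_mics))
    target = [math.sin(2 * math.pi * 0.01 * t) for t in range(num_samples)]
    channels = []
    for d in delays:
        ch = []
        for t in range(num_samples):
            s = target[t - d] if t >= d else 0.0
            ch.append(s + rng.gauss(0.0, 1.0))  # independent noise per mic
        channels.append(ch)
    out = delay_and_sum(channels, delays)
    n = len(out)
    # Residual noise is whatever remains after subtracting the clean target.
    noise_in = power([channels[0][t] - target[t] for t in range(n)])
    noise_out = power([out[t] - target[t] for t in range(n)])
    return noise_in, noise_out


if __name__ == "__main__":
    noise_in, noise_out = simulate()
    print("noise power, single mic:", noise_in)
    print("noise power, beamformed:", noise_out)
```

Because the noise at each microphone is independent in this toy model, averaging 13 aligned channels should cut the noise power to roughly one-thirteenth while leaving the target untouched. Real rooms add echo, reverb, and correlated interference, which is where adaptive techniques earn their keep.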

Aaware takes advantage of the extreme accelerated compute performance and power efficiency available in the Zynq devices to perform noise, echo, and reverb cancellation; the company claims it works at a signal-to-interference ratio (SIR) as low as -25 dB. It also performs source separation, discerning where sounds come from, which enables downstream applications to steer things like cameras. Additionally, if multiple people are talking, source separation is critical for downstream automatic speech recognition (ASR) and natural language processing (NLP) to be successful. Aaware algorithms are flexible with respect to the number of microphones and the configuration of the microphone array, so a wide variety of applications are possible.
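To put the -25 dB figure in perspective, it helps to convert decibels back into a plain power ratio. The short sketch below uses the standard definition of SIR as ten times the base-10 logarithm of signal power over interference power. It assumes the claimed figure describes the input condition the system can cope with, which the company's statement does not spell out, and the function names are hypothetical.

```python
import math


def sir_db(signal_power, interference_power):
    """Signal-to-interference ratio in decibels from two power values."""
    return 10.0 * math.log10(signal_power / interference_power)


def power_ratio_from_db(db):
    """Convert a decibel figure back into a linear power ratio."""
    return 10.0 ** (db / 10.0)


if __name__ == "__main__":
    ratio = power_ratio_from_db(-25.0)
    # The reciprocal tells us how much stronger the interference is.
    print("interference is about", round(1.0 / ratio), "times the voice power")
```

Under that reading, an SIR of -25 dB means the interfering sound carries roughly 316 times the power of the voice being isolated.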

Aaware has packaged their technology into a development platform (sold through Avnet) that includes an array of 13 MEMS microphones and a choice of two different Zynq boards. Currently, they are using the Xilinx Zynq® 7010, which packs a peak DSP performance of 100 GMACs, delivered by FPGA fabric connected to dual ARM Cortex-A9 cores, up to 1 GB of DDR, and WiFi/BT, Gigabit Ethernet, or USB for connectivity. These kits also come with a standard Ubuntu Linux software environment and a standard ALSA-based audio interface. You should be able to unbox the kit and be at the “Hello World” (or “Okay Google”) stage on day one.

This is one of the first examples we’ve seen in practice of a trend we expect to become common: using an FPGA or (more often, perhaps) an FPGA SoC such as Zynq to produce what is essentially a third-party ASSP. While Xilinx and Intel/Altera struggle to make these awesomely powerful devices more developer-friendly, early adopters such as Aaware are developing broadly applicable technologies based on these chips, giving their customers all the benefits of a typical ASIC or ASSP implementation, but without the enormous risk associated with custom silicon development. This new breed of company takes the “fabless” concept one step further, acting as a kind of “fabless/chipless” semiconductor supplier.

One question that lingers over this approach, however, is the behavior of the FPGA companies themselves. Xilinx, in particular, has a longstanding reputation for trashing their own ecosystem by developing and offering IP, tools, and even end products that compete directly with their partners’ offerings. This has led to many startups keeping a wary eye on the FPGA suppliers’ level of involvement in their ventures. It will be interesting to see if this behavior changes over time, since the range of applications for this new category of FPGA SoC devices is far too vast for the FPGA companies to go it alone and truly take advantage of the available opportunities.

In a “canary in the mine” for Aaware’s technology and for this type of offering, a startup called Mycroft has integrated Aaware into a full-blown open-source, hackable smart speaker, the Mycroft Mark II, and launched the product on Kickstarter. In addition to taking advantage of Aaware’s high-performance voice isolation and recognition capability, Mycroft is banking on consumers resonating with the decidedly non-corporate approach, touting the fact that no audio data will be captured, uploaded, or used for nefarious purposes such as ad targeting by the big-data overlords.

Judging from Mycroft’s Kickstarter performance (they blew through their goal within hours and raced to 3x within days), there is a lot of interest out there in embracing the convenience of smart voice assistants while keeping a wary eye on big brother. Judging from Aaware’s starring role in the Mark II and their quick entry into Avnet’s line card, there is also a lot of interest out there in developing voice-based applications that require some pretty tricky audio processing as the cost of admission. It will be exciting to watch.