
Get Woke: Sensory Improves Voice Activation

Software Company Fine-Tunes Wake-Word Recognition

“I know the voices aren’t real, but they have some good ideas.” – anonymous

We laugh when Siri, Alexa, or Cortana misunderstands us and makes embarrassing mistakes, but we cut them some slack because, hey, the technology is only a few years old, right?  

Nope. Speech recognition has been around for almost 100 years, and we’re only now getting to the point where it’s actually useful. You’d think we could make better progress than that.

Voice recognition is everywhere now. We talk to (not into) our phones; we talk to Alexa; we talk to our TV remote controls. The machines listen and, we hope, do our bidding. All those systems rely on two fundamental technologies: voice recognition, to figure out what you’re saying, and artificial intelligence, to figure out what to do about it.

The AI part may prove to be infinitely difficult, but voice recognition is no slam dunk, either. Santa Clara–based Sensory has been working on the latter problem for almost 25 years, and the company takes credit for introducing the idea to Apple.

Ten or 20 years ago, voice-recognition systems were painfully limited and inaccurate. You generally had to train them to recognize a few spoken commands by, for example, repeating the word “play” or “stop” until the system had collected enough samples of your voice to make a template. After that, the machine would respond only to you; anyone else’s voice wouldn’t match the template. Intentionally or not, those early systems were speaker-dependent.
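To make the template idea concrete, here is a toy sketch in Python. It assumes each utterance has already been reduced to a short list of numbers (the three-value feature vectors, the averaging step, and the distance threshold are all invented for illustration, not taken from any real product). Samples of the owner saying “play” are averaged into a template, and a new utterance is accepted only if it lands close enough to it.

```python
import math

def make_template(samples):
    """Average several feature vectors of one spoken word into a template."""
    count = len(samples)
    width = len(samples[0])
    return [sum(sample[i] for sample in samples) / count for i in range(width)]

def matches(template, features, threshold=1.0):
    """Accept an utterance only if it lies within `threshold` of the template."""
    return math.dist(template, features) <= threshold

# Hypothetical feature vectors from the owner repeating "play" three times.
owner_samples = [[1.0, 2.0, 3.0], [1.2, 1.8, 3.1], [0.8, 2.2, 2.9]]
play_template = make_template(owner_samples)

owner_again = [1.1, 2.0, 3.0]   # same speaker, slightly different delivery
stranger = [3.0, 0.5, 1.0]      # a different voice saying the same word

print(matches(play_template, owner_again))  # owner is accepted
print(matches(play_template, stranger))     # stranger is rejected
```

Because the template is built from a single voice, the rejection of everyone else comes for free, which is exactly the speaker-dependent behavior described above.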

Nowadays that’s considered a bug. Voice-activated gadgets are supposed to be speaker-independent. And, except for folks with a strong regional accent, they generally are. But now the industry is doing a 180: we’re trying to make devices speaker-dependent again, and Sensory is offering new software for exactly that purpose.

The idea is to allow your phone or smart appliance (think Amazon Echo or Nest thermostat) to distinguish between speakers so that it responds appropriately. You might want Alexa to follow your commands, for example, but not those of your four-year-old daughter. Even though Alexa understands both speakers, you would hope that it’s smart enough to ignore one but not the other. That’s tricky to do.

A subtler benefit is that the device can respond to ambiguous or self-referential commands. “Update my calendar” is tricky if you don’t know who the speaker is. “Call work” can also be ambiguous. Today’s voice assistants get around this in a different way. Asking Siri about your wife’s birthday usually works, but only because (a) Siri knows who its owner is, and (b) you’ve probably identified your spouse’s birthday in your contacts list. Siri doesn’t really know who’s speaking, and any random stranger holding your phone would get the same response.

Sensory’s software starts segregating speakers’ voices automatically, without any specific vocal training. What happens after that is up to you, however. Different OEMs will have different methods of assigning rights and privileges to each speaker; that’s a user-interface issue. What was once a bug is now a feature.
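What an OEM does with a speaker label might look something like the following Python sketch. Everything here is hypothetical: the speaker names, the command strings, and the `handle` function are made up for illustration, and the speaker label is assumed to arrive from whatever identification engine the device uses.

```python
# Hypothetical per-speaker permissions chosen by the device maker or owner.
PERMISSIONS = {
    "parent": {"play music", "set thermostat", "order diapers"},
    "child": {"play music"},
}

def handle(speaker, command):
    """Run a recognized command only if this speaker is allowed to issue it."""
    allowed = PERMISSIONS.get(speaker, set())  # unknown speakers get nothing
    return "ok" if command in allowed else "ignored"

print(handle("parent", "order diapers"))  # ok
print(handle("child", "order diapers"))   # ignored
```

The command is understood in both cases; only the decision to act on it depends on who spoke.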

Another key subcomponent of any speech-recognition system is an effective “wake word.” Without a wake word, the system must be on, ready, listening, and processing audio 100% of the time. That obviously wastes processing power and energy, but worse, it makes it tough for the system to tell if you’re talking to it or if you’re just talking. Unless you call out “Alexa” (an intentionally unusual multisyllabic name that’s unlikely to occur in normal conversation), your creepy Amazon eavesdropper won’t know that you’re ready to order diapers. A wake word lets the system behave like an old dog: it sleeps 99% of the time and pays attention only when it hears its name.
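The gating behavior can be sketched in a few lines of Python. This is a deliberately simplified model, not how Sensory or Amazon implements it: it works on a list of already-recognized words instead of audio, and it assumes every command is exactly two words long. The `gate` function and its parameters are invented for illustration.

```python
def gate(words, wake_word="alexa", command_length=2):
    """Return the commands spoken after each wake word; ignore everything else."""
    commands = []
    words_left = 0   # words still to capture; 0 means the system is asleep
    current = []
    for word in words:
        if words_left == 0:
            # Asleep: the only thing that matters is the wake word.
            if word == wake_word:
                words_left = command_length
                current = []
        else:
            # Awake: collect the command, then go back to sleep.
            current.append(word)
            words_left -= 1
            if words_left == 0:
                commands.append(" ".join(current))
    return commands

heard = "nice weather today alexa order diapers anyway as i said".split()
print(gate(heard))  # ['order diapers']
```

Everything said before and after the command is discarded without further processing, which is where the power savings come from.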

Sensory offers software for small DSPs and MCUs to implement wake words. Depending on your hardware and the number and complexity of the wake words, the software requires as little as 1 MIPS of processing and 100 KB of RAM. Better hardware allows for more elaborate setups. Power consumption (one of the reasons for having wake words at all) depends on your hardware, but Sensory says some OEMs are operating at under 1 mA.

To make wake words even more power-efficient, Sensory has also developed its own hardware IP, the first in many years. Imaginatively called LPSD, for low-power sound detection, it’s an accelerator that gets integrated into a soft DSP from Ceva, Synopsys, Tensilica, or others. The idea is that your DSP can go to sleep and ignore audio input entirely. Sensory’s LPSD will trigger on the wake word and fire up the DSP in time for it to process the full audio command stream.

My guess is that Sensory has at least another 25 good years ahead of it. Voice activation and speech recognition aren’t going away, but there’s a lot of improvement yet to be done. The more we talk to our machines, the more we’re going to need the underlying black magic to make it happen.  
