Get Woke: Sensory Improves Voice Activation

Software Company Fine-Tunes Wake-Word Recognition

“I know the voices aren’t real, but they have some good ideas.” – anonymous

We laugh when Siri, Alexa, or Cortana misunderstands us and makes embarrassing mistakes, but we cut them some slack because, hey, the technology is only a few years old, right?  

Nope. Speech recognition has been around for almost 100 years, and we’re only now getting to the point where it’s actually useful. You’d think we could make better progress than that.

Voice recognition is everywhere now. We talk to (not into) our phones; we talk to Alexa; we talk to our TV remote controls. The machines listen and, we hope, do our bidding. All those systems rely on two fundamental technologies: voice recognition, to figure out what you’re saying, and artificial intelligence, to figure out what to do about it.

The AI part may prove to be infinitely difficult, but voice recognition is no slam dunk, either. Santa Clara–based Sensory has been working on the latter problem for almost 25 years, and the company takes credit for introducing the idea to Apple.

Ten or 20 years ago, voice-recognition systems were painfully limited and inaccurate. You generally had to train them to recognize a few spoken commands by, for example, repeating the word “play” or “stop” until the system had collected enough samples of your voice to make a template. After that, the machine would respond only to you; anyone else’s voice wouldn’t match the template. Intentionally or not, those early systems were speaker-dependent.
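To make the template idea concrete, here is a minimal sketch of speaker-dependent matching. It is not how any particular product worked: the three-number "feature vectors," the cosine-similarity measure, and the 0.95 threshold are all assumptions chosen for illustration.

```python
import math

def average_template(samples):
    """Average several equal-length feature vectors into one template."""
    count = len(samples)
    return [sum(column) / count for column in zip(*samples)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def matches(template, utterance, threshold=0.95):
    """True if the utterance is close enough to the enrolled template."""
    return cosine_similarity(template, utterance) >= threshold

# Toy data (hypothetical): three repetitions of "play" by the owner.
owner_samples = [[1.0, 0.2, 0.1], [0.9, 0.3, 0.1], [1.1, 0.1, 0.2]]
template = average_template(owner_samples)

owner_again = [1.0, 0.25, 0.15]  # same speaker, slightly different take
stranger = [0.1, 1.0, 0.9]       # a different voice saying the same word

print(matches(template, owner_again))  # owner is accepted
print(matches(template, stranger))     # stranger is rejected
```

Because the template is built from one person's samples, a different voice lands far from it and is rejected, which is exactly the speaker-dependent behavior described above.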

Nowadays that’s considered a bug. Voice-activated gadgets are supposed to be speaker-independent. And, except for folks with a strong regional accent, they generally are. But now the industry is doing a 180: we’re trying to make devices speaker-dependent again, and Sensory is offering new software for exactly that purpose.

The idea is to allow your phone or smart appliance (think Amazon Echo or Nest thermostat) to distinguish between speakers so that it responds appropriately. You might want Alexa to follow your commands, for example, but not those of your four-year-old daughter. Even though Alexa understands both speakers, you would hope that it’s smart enough to ignore one but not the other. That’s tricky to do.

A subtler benefit is that a device can respond to ambiguous or self-referential commands. “Update my calendar” is tricky if you don’t know who the speaker is. “Call work” can also be ambiguous. Today’s voice assistants get around this in a different way. Asking Siri about your wife’s birthday usually works, but only because (a) Siri knows who its owner is, and (b) you’ve probably identified your spouse’s birthday in your contacts list. Siri doesn’t really know who’s speaking, and any random stranger holding your phone would get the same response.

Sensory’s software starts segregating speakers’ voices automatically, without any specific vocal training. What happens after that is up to you, however. Different OEMs will have different methods of assigning rights and privileges to each speaker; that’s a user-interface issue. What was once a bug is now a feature.
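What an OEM does with a speaker's identity could be as simple as a lookup table. The sketch below assumes the recognizer has already produced a speaker label and a command string; the labels, the commands, and the `handle_command` function are hypothetical and are not part of Sensory's software.

```python
# Hypothetical per-speaker privileges; a real product would let the
# owner configure these through its own user interface.
PERMISSIONS = {
    "parent": {"play music", "set thermostat", "order diapers"},
    "child": {"play music"},
}

def handle_command(speaker_id, command, permissions=PERMISSIONS):
    """Act on a command only if this speaker is allowed to issue it."""
    allowed = permissions.get(speaker_id, set())  # unknown speakers get nothing
    if command in allowed:
        return f"OK: {command}"
    return "ignored"

print(handle_command("parent", "order diapers"))  # OK: order diapers
print(handle_command("child", "order diapers"))   # ignored
```

The device still understands both speakers; it simply declines to act for one of them.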

Another key subcomponent of any speech-recognition system is an effective “wake word.” Without a wake word, the system must be on, ready, listening, and processing audio 100% of the time. That obviously wastes processing power and energy, but worse, it makes it tough for the system to tell if you’re talking to it or if you’re just talking. Unless you call out “Alexa” (an intentionally unusual multisyllabic name that’s unlikely to occur in normal conversation), your creepy Amazon eavesdropper won’t know that you’re ready to order diapers. A wake word lets the system, like an old dog, sleep 99% of the time and pay attention only when it hears its name.
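The gating logic is easy to picture as a two-state machine: asleep until the wake word, then attentive for one command. In this toy sketch, plain text words stand in for audio, and the fixed three-word command length is a simplifying assumption; a real system would detect the end of speech instead.

```python
def gate_on_wake_word(words, wake_word="alexa", command_length=3):
    """Return only the commands that directly follow the wake word."""
    commands = []
    current = None  # None means the system is asleep
    for word in words:
        if current is None:
            # Asleep: ignore everything except the wake word.
            if word.lower() == wake_word:
                current = []
        else:
            # Awake: collect words until the command is complete.
            current.append(word)
            if len(current) == command_length:
                commands.append(" ".join(current))
                current = None  # go back to sleep
    return commands

chatter = "we should really order diapers soon alexa order more diapers thanks"
print(gate_on_wake_word(chatter.split()))  # ['order more diapers']
```

The earlier mention of "order diapers" in ordinary conversation is ignored; only the words that follow the wake word are treated as a command.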

Sensory offers software for small DSPs and MCUs to implement wake words. Depending on your hardware and the number and complexity of the wake words, it requires as little as 1 MIPS of processing and 100 KB of RAM. Better hardware allows for more elaborate setups. Power consumption – one of the reasons for having wake words at all – depends on your hardware, but Sensory says some OEMs are operating at under 1 mA.

To make wake words even more power-efficient, Sensory has also developed its own hardware IP, the first in many years. Imaginatively called LPSD, for low-power sound detection, it’s an accelerator that gets integrated into a soft DSP from Ceva, Synopsys, Tensilica, or others. The idea is that your DSP can go to sleep and ignore audio input entirely. Sensory’s LPSD will trigger on the wake word and fire up the DSP in time for it to process the full audio command stream.
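Sensory hasn't published LPSD's internals here, so the following is only a generic sketch of how a two-stage arrangement can let the main processor sleep without missing the start of a command. The per-frame "energy" numbers, the threshold, and the small pre-roll buffer are all assumptions for illustration, and this simple trigger reacts to loudness rather than to an actual wake word.

```python
from collections import deque

def two_stage_listen(frames, threshold=0.5, preroll=2):
    """Return the frames that reach the main processor.

    frames: per-frame energy values, a toy stand-in for audio.
    A tiny always-on stage keeps the last `preroll` quiet frames so the
    main processor can be handed the audio just before the trigger.
    """
    history = deque(maxlen=preroll)  # small always-on buffer
    awake = False
    delivered = []
    for energy in frames:
        if awake:
            delivered.append(energy)      # main processor is running
        elif energy >= threshold:
            awake = True                  # trigger: wake the main processor
            delivered.extend(history)     # replay the buffered lead-in
            delivered.append(energy)
        else:
            history.append(energy)        # still asleep; remember recent audio
    return delivered

print(two_stage_listen([0.1, 0.2, 0.3, 0.9, 0.8, 0.1]))
# [0.2, 0.3, 0.9, 0.8, 0.1]
```

The first quiet frame is never seen by the main processor at all, which is where the power saving would come from in an arrangement like this.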

My guess is that Sensory has at least another 25 good years ahead of it. Voice activation and speech recognition aren’t going away, but there’s a lot of improvement yet to be done. The more we talk to our machines, the more we’re going to need the underlying black magic to make it happen.  