feature article
Subscribe Now

Get Woke: Sensory Improves Voice Activation

Software Company Fine-Tunes Wake-Word Recognition

“I know the voices aren’t real, but they have some good ideas.” – anonymous

We laugh when Siri, Alexa, or Cortana misunderstands us and makes embarrassing mistakes, but we cut them some slack because, hey, the technology is only a few years old, right?  

Nope. Speech recognition has been around for almost 100 years, and we’re only now getting to the point where it’s actually useful. You’d think we could make better progress than that.

Voice recognition is everywhere now. We talk to (not into) our phones; we talk to Alexa; we talk to our TV remote controls. The machines listen and, we hope, do our bidding. All those systems rely on two fundamental technologies: voice recognition, to figure out what you’re saying, and artificial intelligence, to figure out what to do about it.

The AI part may prove to be infinitely difficult, but voice recognition is no slam dunk, either. Santa Clara–based Sensory has been working on the latter problem for almost 25 years, and the company takes credit for introducing the idea to Apple.

Ten or 20 years ago, voice-recognition systems were painfully limited and inaccurate. You generally had to train them to recognize a few spoken commands by, for example, repeating the word “play” or “stop” until it collected enough samples of your voice to make a template. After that, the machine would respond only to you; anyone else’s voice wouldn’t match the template. Intentionally or not, those early systems were speaker-dependent.

Nowadays that’s considered a bug. Voice-activated gadgets are supposed to be speaker-independent. And, except for folks with a strong regional accent, they generally are. But now the industry is doing a 180: we’re trying to make devices speaker-dependent again, and Sensory is offering new software for exactly that purpose.

The idea is to allow your phone or smart appliance (think Amazon Echo or Nest thermostat) to distinguish between speakers so that it responds appropriately. You might want Alexa to follow your commands, for example, but not those of your four-year-old daughter. Even though Alexa understands both speakers, you would hope that it’s smart enough to ignore one but not the other. That’s tricky to do.

The more subtle benefits allow a device to respond to ambiguous or self-referential commands. “Update my calendar” is tricky if you don’t know who the speaker is. “Call work” can also be ambiguous. Today’s voice assistants get around this in a different way. Asking Siri about your wife’s birthday usually works, but only because (a) Siri knows who its owner is, and (b) you’ve probably identified your spouse’s birthday in your contacts list. Siri doesn’t really know who’s speaking, and any random stranger holding your phone would get the same response.  

Sensory’s software starts segregating speakers’ voices automatically, without any specific vocal training. What happens after that is up to you, however. Different OEMs will have different methods of assigning rights and privileges to each speaker; that’s a user-interface issue. What was once a bug is now a feature.

Another key subcomponent of any speech-recognition system is an effective “wake word.” Without a wake word, the system must be on, ready, listening, and processing audio 100% of the time. That obviously wastes processing power and energy, but worse, it makes it tough for the system to tell if you’re talking to it or if you’re just talking. Without calling out, “Alexa,” (an intentionally unusual multisyllabic name that’s unlikely to occur in normal conversation), your creepy Amazon eavesdropper won’t know that you’re ready to order diapers. Like an old dog, a wake word allows the system to sleep 99% of the time and pay attention only when it hears its name.

Sensory offers software for small DSPs and MCUs to implement wake words. Depending on your hardware and the number and complexity of the wake words, it requires as little as 1 MIPS of processing and 100KB of RAM. Better hardware allows for more elaborate setups. Power consumption – one of the reasons for having wake words at all – depends on your hardware, but Sensory says some OEMs are operating at under 1mA.

To make wake words even more power-efficient, Sensory has also developed its own hardware IP, the first in many years. Imaginatively called LPSD, for low-power sound detection, it’s an accelerator that gets integrated into a soft DSP from Ceva, Synopsys, Tensilica, or others. The idea is that your DSP can go to sleep and ignore audio input entirely. Sensory’s LPSD will trigger on the wake word and fire up the DSP in time for it to process the full audio command stream.

My guess is that Sensory has at least another 25 good years ahead of it. Voice activation and speech recognition aren’t going away, but there’s a lot of improvement yet to be done. The more we talk to our machines, the more we’re going to need the underlying black magic to make it happen.  

Leave a Reply

featured blogs
Sep 28, 2022
Learn how our acquisition of FishTail Design Automation unifies end-to-end timing constraints generation and verification during the chip design process. The post Synopsys Acquires FishTail Design Automation, Unifying Constraints Handling for Enhanced Chip Design Process app...
Sep 28, 2022
You might think that hearing aids are a bit of a sleepy backwater. Indeed, the only time I can remember coming across them in my job at Cadence was at a CadenceLIVE Europe presentation that I never blogged about, or if I did, it was such a passing reference that Google cannot...
Sep 22, 2022
On Monday 26 September 2022, Earth and Jupiter will be only 365 million miles apart, which is around half of their worst-case separation....

featured video

Embracing Photonics and Fiber Optics in Aerospace and Defense Applications

Sponsored by Synopsys

We sat down with Jigesh Patel, Technical Marketing Manager of Photonic Solutions at Synopsys, to learn the challenges and benefits of using photonics in Aerospace and Defense systems.

Read the Interview online

featured paper

Algorithm Verification with FPGAs and ASICs

Sponsored by MathWorks

Developing new FPGA and ASIC designs involves implementing new algorithms, which presents challenges for verification for algorithm developers, hardware designers, and verification engineers. This eBook explores different aspects of hardware design verification and how you can use MATLAB and Simulink to reduce development effort and improve the quality of end products.

Click here to read more

featured chalk talk

Gate Driving Your Problems Away

Sponsored by Mouser Electronics and Infineon

Isolated gate drivers are a crucial design element that can protect our designs from over-voltage and short circuits. But how can we fine tune these isolated gate drivers to match the design requirements we need? In this episode of Chalk Talk, Amelia Dalton and Perry Rothenbaum from Infineon explore the programmable features included in the EiceDRIVER™ X3 single-channel highly flexible isolated gate drivers from Infineon. They also examine why their reliable and accurate protection, precise and fast on and off switching and DESAT protection can make them a great fit for your next design.

Click here for more information about Infineon Technologies EiceDRIVER™ Isolated & Non-Isolated Gate Drivers