
Audio Processing Evolves Like Chip Design

New Ways to Do Faster Audio Design

There seems to be a natural progression in our industry. It happened before, more slowly, with basic chip design; now that the concept is well established, it can happen more easily elsewhere.

We’re talking about ways to speed up the design process when designing… well, pretty much anything. But the process arose out of the semiconductor industry as we tried to build bigger and better chips more quickly.

Three-Step Program

The first step is tools. Simple tools at first, gradually growing more sophisticated. Tedious steps get replaced by automation. Agonizing low-level detail gets replaced by abstraction. And, release by release, designers can get lots more work done in less time.

But that’s not where it stops. There are common circuits needed for many chips, and, yes, the tools would help you design those circuits more quickly. But heck, if they’re not your secret sauce and you’re spending a lot of time on them, then there has to be a better solution.

A great example is I/O. And it reminds me of a programming assignment I had in college (using Pascal, no less!). The assignment was on recursion. We were to hard-code a tree and then search the tree using recursion. I wrote it all up and ran it. And… it worked. First time. That never happens! (And hasn’t happened since.)

So there I was with all this extra time on my hands. So I thought, “I know! I’ll get some extra credit by writing the code that will allow someone to enter an arbitrary tree and then search it!” And I did. And it took forever. At least one all-nighter, if not more. And when I was done, yeah, it worked, but roughly 10% or so of the resulting code actually did the assignment; the remaining 90% was all the I/O. And the time it took to write that was even more out of balance.
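For the curious, the assignment part (the 10%) is genuinely tiny. Here's a minimal Python sketch of it, a modern stand-in for that Pascal program, with illustrative names; the hard-coded tree is exactly what sidesteps all the I/O:

```python
# Minimal sketch of the assignment: a hard-coded binary tree
# searched recursively. Names are illustrative, not the original Pascal.

class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def contains(node, target):
    """Recursively search the tree for target."""
    if node is None:
        return False
    if node.value == target:
        return True
    return contains(node.left, target) or contains(node.right, target)

# Hard-coding the tree is what makes this short: no input parsing at all.
tree = Node(5, Node(3, Node(1), Node(4)), Node(8, Node(7), Node(9)))
print(contains(tree, 7))   # True
print(contains(tree, 6))   # False
```

Add the "enter an arbitrary tree" feature (prompting, parsing, validating, re-prompting on bad input) and the line count balloons, which is the whole point of the anecdote.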

Oh, and I got 3 extra-credit points added to 100. Woohoo! Someone alert the media!

So I spent an extraordinary amount of time doing the I/O – none of which was part of the secret sauce.

A silicon example of this might be PCI Express. Yeah, you can roll your own. Good luck with the monster-sized spec that may take six months simply to understand.

Of course, that’s not how we do it anymore. Now we have IP. The tools certainly help to design the IP, but, while you’ll use the tools for your proprietary added value, you’ll handle that pesky PCIe by plugging in an IP block (with its enormous number of parameters to set) and be done a whole lot faster.

So the first progression is from tools to IP.

Then you may find that you’re trying to take an entire design, much of which would be common to any implementation, and customize it to differentiate your product. Or heck, maybe it’s a side chip that’s necessary to support some other chip. Do you really want to dig in and do the whole thing, like everyone else is doing? No, you want someone to give you an example of an already-done version that you can use (perhaps with some tweaks). And lo, we have the reference design. Which is still pretty much IP, except that it’s usually given away as a way to sell more of the silicon it runs on.

So we go from tools to IP to reference designs.

Audio Evolution

So let’s take this model and apply it to a different space: audio processing. We looked at a company called DSP Concepts several years ago. They made (and still make) a tool called Audio Weaver. It let sound engineers (not the recording-studio type) process raw audio signals to improve them for higher-level processing. (Those higher-level functions were done by other companies.)

Well, they’ve come back, and they’ve augmented their strategy. Yes, they still have the tool. But now much of their focus is on selling… wait for it… IP. Blocks that designers can plug in and go with. Or customize.

They still don’t do high-level functions like speech recognition; others do that. But they have signal processing blocks dedicated to supporting a variety of higher-level functions that they’ve organized as being either input (e.g., microphone) or output (e.g., speaker).

  • Input
    • Quiescent sound detection. This is the first step for voice-triggered equipment: deciding whether or not a vocal sound happened.
    • Attention processing. As in, detecting a “wake word” for voice-triggered equipment. This is the next step after deciding that someone is talking. They don’t do the actual wake-word engines, but they try to improve the performance of those engines.
    • Noise reduction. Kinda self-evident, for a wide variety of different kinds of noise.
    • Optimized beamformers. With multiple microphones, this means adjusting the timing and other parameters of the individual microphone streams so that you can focus on whoever is speaking, regardless of direction or distance from the mics. Very common, given that multiple mics are now a thing.
  • Output
    • Volume management. You know all those different sources you route through the same speaker? And you’re constantly adjusting the volume because each stream is different? Yeah, this is about fixing that.
    • Bass enhancement (the audio, not the fish… but also not the phish… although I guess it could be…)
    • Dialog enhancement
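To make the beamforming item concrete, here's a toy delay-and-sum beamformer in Python. This is the textbook idea, not DSP Concepts' algorithm: delay each mic stream so that sound from the chosen direction lines up across channels, then average. Signal from the steered direction adds coherently; sound from other directions doesn't.

```python
# Toy delay-and-sum beamformer (illustrative only). Each entry in
# `delays` is the number of samples by which that mic lags the
# reference for the target direction; advancing each channel by its
# lag aligns them before averaging.

def delay_and_sum(channels, delays):
    """channels: equal-length per-mic sample lists; delays: integer lags."""
    n = len(channels[0])
    num_mics = len(channels)
    out = []
    for i in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            j = i + d            # advance this channel by its known lag
            if 0 <= j < n:       # treat samples off the end as silence
                acc += ch[j]
        out.append(acc / num_mics)
    return out

# Two mics hear the same pulse, the second one one sample later;
# steering with delays [0, 1] realigns them into a single clean peak.
aligned = delay_and_sum([[0, 1, 0, 0, 0], [0, 0, 1, 0, 0]], [0, 1])
```

Real beamformers work with fractional delays, per-channel weights, and adaptive steering, but the coherent-sum principle is the same.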

So here we have the first two steps in the canonical progression: tools to IP. Without the IP replacing the tools outright.

But wait, there’s more. They’ve also worked with Amlogic, who has an SoC – the A113 – with four Cortex-A53 processors for a voice user-interface reference design targeting smart speakers – a hot area. They have variants for 2, 4, 6, and 7 microphones, and they support playback processing – the output functions listed above. And Amazon has already qualified it.

So if you’re working on a smart-speaker design, much of the task is already done for you. And if you’re working on one for Amazon, you save even more time by being pre-qualified. DSP Concepts is also targeting other silicon platforms for this.

To be clear, these functions would normally be implemented in a DSP attached to the processor. In this case, you can skip the DSP and do it all in the processor chip.

And they’ve also got an automotive audio reference design. It tries to be the one-stop shop for all your center-stack audio needs, including:

  • Microphone processing, for hands-free calls, a voice command interface, noise reduction, beamforming, and acoustic echo cancellation.
  • Playback processing
  • Advanced cabin acoustics tuning
  • Engine sound enhancement
  • Pedestrian warnings

And so we complete the evolution: tools to IP to reference designs. All in a few years (as contrasted with the much longer timeframe for the equivalent generic silicon design version).


More info:

Amlogic A113

DSP Concepts
