editor's blog
Subscribe Now

Neural Networks are Finding a Place at the Adult’s Table

 

The deep learning revolution is the most interesting thing happening in the electronics industry today, said Chris Rowen during his keynote speech at the Electronic Design Process Symposium (EDPS), held last month at the Milpitas headquarters of SEMI, the industry association for the electronics supply chain. “The hype can hardly be understated,” continued Rowen. Search “deep learning” on Google and you’ll already get more than three billion hits. (Well, I got 20M for “deep learning” and 451M for “artificial intelligence,” but still, that’s a lot.) “There are 12,000 startups worldwide listed in Crunchbase,” he added. (I got 1497, again for “deep learing,” but still…) According to Rowen, 16,500 papers on deep learning and AI were published on arxiv.org in the past 12 months.

In other words, AI is hot (in case you’ve been living in a cave or an underground bomb shelter for the past few years).

Rowen is CEO of BabbleLabs, formerly BabbLabs, but the missing “e” turned out to confuse people who found they couldn’t pronounce it. BabbleLabs is a deep-learning startup. It’s devoted to applying deep learning and DNNs (deep neural networks) to speech processing.

Deep learning is a “mathematical layer cake model for learning,” explained Rowen. (I suspect he was referring to the various layers, hidden and otherwise, in the DNN model.) You take a large number of inputs and put them through a hidden system to get a desired output after a period of training. This model is very general and works for almost any kind of data, but you must have a way of gathering all of the required training data.

Currently, the biggest application for DNNs is, by far, vision systems. Training for these systems is enormously complex and running these systems consumes a lot of compute cycles. DNN-based vision systems gobble up TOPS (tera operations per second) like kids snack on candy corn during Halloween.

The fundamental question, said Rowen, is “Where do the smarts go?” In other words, where’s the best place to execute all of those tera-ops for vision systems? Is the best place close to the camera? That will give you low latency and will not overburden the network with traffic, but will degrade the ability to aggregate data from multiple cameras.

Is the best place to execute all of the tera-ops in some sort of aggregation location? At the cloud edge? In the cloud?

There’s no single answer. (That would be too easy, wouldn’t it?)

There are many critical tradeoffs to consider:

If you want to maximize system responsiveness, you make the processing local. That’s sort of obvious. You don’t want an autonomous car’s collision-avoidance DNN to be located in the cloud where a network dropout could cause a multi-car pileup; you want the processing in the car.

If you need global analysis of data from multiple cameras, such as in a surveillance system, then you want the processing in the cloud.

If you’re concerned about privacy, you don’t want raw video traversing the network. You want the processing to be local.

If you want to minimize cost, you’ll need to constrain the DNN and keep the processing local. Cloud computing is very flexible but it’s a pay-as-you-go system and the operating costs increase monotonically.

At this point, Rowen segued to the work of BabbleLabs. “Voice is vision,” he declared. “It’s the most human interface because there are five billion users (including those people listening to radio).

But there’s another aspect to AI-enhanced voice processing and recognition that indeed makes it a lot like video. “Voice recognition is essentially image recognition performed on spectrograms,” said Rowen.

Now there’s an intriguing idea.

Look at a spectrogram that plots frequency over time. It’s a 2D image, and just like any image, you can train a DNN to recognize traits buried in the spectrogram. Rowen demonstrated a BabbleLabs speech enhancer, which uses AI enhancements to strip road and wind noise from words spoken alongside a busy street in Montevideo, Uruguay. It works surprisingly well.

See for yourself (and watch to the end before making a hasty judgement):

 

The training wheels are coming off.

 

Leave a Reply

featured blogs
Mar 20, 2019
In this week's Whiteboard Wednesdays video, Rong Chen talks about the 5G use cases and PHY processing and highlights the B20 DSP features which enable efficient control and DSP processing for 5G... [[ Click on the title to access the full blog on the Cadence Community s...
Mar 19, 2019
Deeper Dive into Non-Trivial Bug Escapes and Safety Critical Designs This blog is a continuation of a series of blogs related to the 2018 Wilson Research Group Functional Verification Study (click here).  In my previous blog (click here), I presented verification results fin...
Mar 19, 2019
As one of the great superheroes learned '€“ '€œwith great power comes great responsibility.'€ Even without '€œspidey-sense'€ and the ability to scale tall buildings with webbing, there is still power to be wielded with Samtec'€™s new mPOWER'„¢ connectors'€¦w...
Jan 25, 2019
Let'€™s face it: We'€™re addicted to SRAM. It'€™s big, it'€™s power-hungry, but it'€™s fast. And no matter how much we complain about it, we still use it. Because we don'€™t have anything better in the mainstream yet. We'€™ve looked at attempts to improve conven...