
Neural Networks are Finding a Place at the Adult’s Table

 

The deep learning revolution is the most interesting thing happening in the electronics industry today, said Chris Rowen during his keynote speech at the Electronic Design Process Symposium (EDPS), held last month at the Milpitas headquarters of SEMI, the industry association for the electronics supply chain. “The hype can hardly be overstated,” continued Rowen. Search “deep learning” on Google and you’ll already get more than three billion hits. (Well, I got 20M for “deep learning” and 451M for “artificial intelligence,” but still, that’s a lot.) “There are 12,000 startups worldwide listed in Crunchbase,” he added. (I got 1,497, again for “deep learning,” but still…) According to Rowen, 16,500 papers on deep learning and AI were published on arxiv.org in the past 12 months.

In other words, AI is hot (in case you’ve been living in a cave or an underground bomb shelter for the past few years).

Rowen is CEO of BabbleLabs, formerly BabbLabs; the missing “e” confused people, who couldn’t figure out how to pronounce the name. BabbleLabs is a deep-learning startup devoted to applying deep neural networks (DNNs) to speech processing.

Deep learning is a “mathematical layer cake model for learning,” explained Rowen. (I suspect he was referring to the various layers, hidden and otherwise, in the DNN model.) You take a large number of inputs and put them through a hidden system to get a desired output after a period of training. This model is very general and works for almost any kind of data, but you must have a way of gathering all of the required training data.
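That “layer cake” can be sketched in a few lines of numpy. This is a hypothetical minimal example (the layer sizes and random weights are placeholders, not any real system’s architecture): inputs pass through a hidden layer to produce an output.

```python
import numpy as np

# A minimal "layer cake": inputs -> hidden layer -> output.
# Illustrative sketch only; real DNNs have many layers and trained weights.
rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Weights for 4 inputs -> 8 hidden units -> 1 output (random, i.e., untrained)
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 1))

def forward(x):
    hidden = relu(x @ W1)   # the "hidden system" in the middle of the cake
    return hidden @ W2      # the desired output layer

x = rng.normal(size=(3, 4))   # a batch of 3 inputs, 4 features each
y = forward(x)
print(y.shape)  # (3, 1)
```

Training consists of adjusting W1 and W2 until the outputs match labeled examples, which is where the “way of gathering all of the required training data” comes in.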

Currently, the biggest application for DNNs is, by far, vision systems. Training for these systems is enormously complex and running these systems consumes a lot of compute cycles. DNN-based vision systems gobble up TOPS (tera operations per second) like kids snack on candy corn during Halloween.
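To get a rough sense of why vision DNNs gobble TOPS, consider the standard back-of-envelope count for one convolution layer: roughly 2 × H × W × C_out × K² × C_in operations (counting each multiply-accumulate as two ops). The layer dimensions below are illustrative placeholders, not any particular network:

```python
# Back-of-envelope op count for a single convolution layer.
# ops ~= 2 * H_out * W_out * C_out * (K * K * C_in); all numbers illustrative.
h_out, w_out = 112, 112        # output feature-map size
c_in, c_out = 64, 128          # input / output channels
k = 3                          # 3x3 kernel

ops = 2 * h_out * w_out * c_out * (k * k * c_in)
print(f"{ops / 1e9:.2f} GOPs for one layer")

# At 30 frames per second, this single layer alone requires:
print(f"{ops * 30 / 1e12:.3f} TOPS")
```

Multiply that by the dozens of layers in a typical vision network and the total lands squarely in TOPS territory.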

The fundamental question, said Rowen, is “Where do the smarts go?” In other words, where’s the best place to execute all of those tera-ops for vision systems? Is the best place close to the camera? That will give you low latency and will not overburden the network with traffic, but will degrade the ability to aggregate data from multiple cameras.

Is the best place to execute all of the tera-ops in some sort of aggregation location? At the cloud edge? In the cloud?

There’s no single answer. (That would be too easy, wouldn’t it?)

There are many critical tradeoffs to consider:

If you want to maximize system responsiveness, you make the processing local. That’s sort of obvious. You don’t want an autonomous car’s collision-avoidance DNN to be located in the cloud where a network dropout could cause a multi-car pileup; you want the processing in the car.

If you need global analysis of data from multiple cameras, such as in a surveillance system, then you want the processing in the cloud.

If you’re concerned about privacy, you don’t want raw video traversing the network. You want the processing to be local.

If you want to minimize cost, you’ll need to constrain the DNN and keep the processing local. Cloud computing is very flexible but it’s a pay-as-you-go system and the operating costs increase monotonically.
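The cost tradeoff can be sketched as a simple break-even calculation between a one-time edge hardware purchase and pay-as-you-go cloud inference. Every number here is a hypothetical placeholder, chosen only to show the shape of the comparison:

```python
# Break-even sketch: fixed local hardware vs. pay-as-you-go cloud inference.
# All dollar figures are hypothetical placeholders, not real pricing.
edge_hw_cost = 200.0          # one-time cost of a local inference device ($)
cloud_cost_per_hour = 0.10    # cloud inference cost ($/hour), illustrative
hours_per_day = 24            # an always-on camera feed

breakeven_days = edge_hw_cost / (cloud_cost_per_hour * hours_per_day)
print(f"Cumulative cloud cost passes the edge device after "
      f"{breakeven_days:.0f} days")
```

Because the cloud bill keeps accruing, an always-on workload eventually crosses the fixed edge cost no matter what the specific numbers are; that is the monotonic growth Rowen is pointing at.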

At this point, Rowen segued to the work of BabbleLabs. “Voice is vision,” he declared. “It’s the most human interface because there are five billion users (including those people listening to radio).”

But there’s another aspect to AI-enhanced voice processing and recognition that indeed makes it a lot like video. “Voice recognition is essentially image recognition performed on spectrograms,” said Rowen.

Now there’s an intriguing idea.

Look at a spectrogram, which plots frequency content over time. It’s a 2D image, and just like any image, you can train a DNN to recognize traits buried in it. Rowen demonstrated a BabbleLabs speech enhancer, which uses a DNN to strip road and wind noise from words spoken alongside a busy street in Montevideo, Uruguay. It works surprisingly well.
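To make the “voice is vision” idea concrete, here is a minimal magnitude spectrogram computed with numpy’s short-time Fourier transform. This is a toy sketch, not BabbleLabs’ pipeline (real speech systems typically add mel scaling, log compression, and so on), but the result really is a 2D array you could feed to an image-style DNN:

```python
import numpy as np

# A simple magnitude spectrogram: the 2-D "image" a speech DNN can learn from.
# Toy sketch; production systems add mel filterbanks, log power, etc.
def spectrogram(signal, frame_len=256, hop=128):
    # Slice the signal into overlapping windowed frames...
    frames = [signal[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(signal) - frame_len + 1, hop)]
    # ...and take the magnitude spectrum of each frame.
    # Transpose so rows are frequency bins and columns are time frames.
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T

fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)   # a 1-second 440 Hz tone as stand-in audio
spec = spectrogram(tone)
print(spec.shape)   # (frequency bins, time frames): a 2-D image
```

A pure tone shows up as a single bright horizontal stripe in this image; speech shows up as the richer patterns (formants, harmonics) that a vision-style DNN can be trained to separate from noise.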

See for yourself (and watch to the end before making a hasty judgment):

 

The training wheels are coming off.

 
