
Intel Achieves AI Nervana

All Apologies to GPUs and FPGAs

At CES 2019 in Las Vegas this week, Navin Shenoy, Intel Data Center Group executive vice president, announced the Intel Nervana Neural Network Processor for Inference, which will go into production this year. Back in 2016, Intel acquired Nervana, a 48-person AI SaaS startup from San Diego, for a reported $408 million. Nervana was a software company at the time, providing a full-stack software-as-a-service platform called Nervana Cloud, based on an open-source framework called Neon (a rival to Caffe, TensorFlow, and others) that enabled the development of custom deep learning applications.

Nervana was also reportedly working on a custom chip for neural network processing at the time, which they claimed would outperform GPUs as AI accelerators by a factor of at least ten. Of course, developing a custom processor is a tall order for a small software team, but that ambition became dramatically more realistic with the acquisition by Intel. Now, Intel is announcing delivery of the first part of that vision – the Intel Nervana Neural Network Processor for Inference, or NNP-I. The company also announced that a Neural Network Processor for Training, codenamed “Spring Crest,” will be available later this year. The Nervana Engine was originally being developed on 28nm technology, with plans to move to 14nm before launch. Intel hasn’t said at this point, but we infer that the devices delivered this year will be built on Intel’s 14nm FinFET technology, probably moving to 10nm sometime in the future.

Intel says Nervana is being developed in conjunction with Facebook, which is an interesting note because Facebook is the “super seven” data center company whose acceleration strategy has been the most opaque. Google has developed its own processor, and Microsoft, Amazon/AWS, and others have invested heavily in FPGA-based acceleration. Having Facebook as a development partner should give Nervana solid end-to-end credentials when it begins shipping broadly later this year.

Neural network training and inference are extremely compute-intensive, dominated by tensor matrix multiplication and convolution. For years, graphics processing units (GPUs) have been the go-to solution for AI training acceleration, while FPGAs have worked hard to carve out a competitive niche in inferencing. As off-the-shelf chips go, GPUs are well suited to AI tasks, thanks to their highly parallel vector and linear-algebra capabilities. But, because GPUs aren’t designed specifically for AI tasks, they still leave a lot on the table when it comes to architectural optimization for AI and deep learning.
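
To make the point concrete, here is a minimal NumPy sketch (our illustration, not Intel or Nervana code) of a single dense layer: the work reduces to one large matrix multiply, exactly the kind of parallel linear algebra that GPUs – and purpose-built AI silicon – are built to chew through.

```python
# Minimal sketch of why neural-network compute is dominated by matrix math.
# Illustration only; sizes are arbitrary, not tied to any real network.
import numpy as np

batch, in_features, out_features = 64, 1024, 4096

x = np.random.randn(batch, in_features).astype(np.float32)          # activations
w = np.random.randn(in_features, out_features).astype(np.float32)   # weights
b = np.zeros(out_features, dtype=np.float32)                        # bias

y = x @ w + b              # one dense layer: a (64 x 1024) * (1024 x 4096) matmul
y = np.maximum(y, 0.0)     # ReLU nonlinearity

# Each layer costs roughly 2 * batch * in * out floating-point operations.
flops = 2 * batch * in_features * out_features
print(f"~{flops / 1e9:.2f} GFLOPs for a single layer's forward pass")
```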

Similarly, FPGAs can deliver incredible parallelism and performance on a miserly power budget for inferencing tasks, which (unlike training) can be accomplished with reduced-precision fixed-point computations. Large data center and cloud installations have begun to take advantage of clusters of FPGAs for accelerating inferencing, with remarkable results in throughput, latency, and computational power efficiency. But, like GPUs, FPGAs were not designed specifically for AI: a lot of the hardware on a typical FPGA is not involved in AI operations, and a number of architectural assumptions make FPGAs great as general-purpose devices but suboptimal as AI processors.
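
As a rough illustration of what reduced-precision inferencing means, the sketch below shows generic symmetric 8-bit quantization – a hedged example, not any vendor’s actual scheme – where a floating-point matrix multiply is replaced by an integer one plus a rescale.

```python
# Generic reduced-precision inference sketch: quantize weights and activations
# to signed 8-bit fixed point, multiply-accumulate in integers, rescale at the end.
import numpy as np

def quantize(x, bits=8):
    """Symmetric linear quantization to signed integers."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

x = np.random.randn(64, 256).astype(np.float32)   # activations
w = np.random.randn(256, 128).astype(np.float32)  # weights

qx, sx = quantize(x)
qw, sw = quantize(w)

y_int = qx @ qw                              # integer matmul (what the accelerator runs)
y_deq = y_int.astype(np.float32) * sx * sw   # rescale back to real-valued outputs

ref = x @ w
err = np.abs(y_deq - ref).max() / np.abs(ref).max()
print(f"max relative error vs. float32: {err:.3%}")
```

For many trained networks, errors of this magnitude cost little or no accuracy, which is why inference (unlike training) tolerates such aggressive precision reduction.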

Nervana came at the problem from their perspective as developers of GPU kernels for deep learning, which gave them tremendous insight into the limitations of GPUs for AI tasks. The company says the Nervana Engine was designed from a clean slate, discarding the GPU architecture and starting fresh. They analyzed a number of deep neural networks and came up with what they believed to be the best architecture for their key operations. They also came up with a new numerical format – dubbed Flexpoint – that tries to maximize the precision that can be stored within 16 bits.
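
Intel has published the details of Flexpoint elsewhere; purely as a hypothetical illustration of the shared-exponent idea behind such formats, the sketch below stores a 16-bit integer mantissa per element while the whole tensor shares a single exponent, so storage costs roughly the same as int16 but dynamic range adapts tensor by tensor.

```python
# Rough illustration of a shared-exponent block format - not Intel's actual
# Flexpoint implementation. Every element keeps a 16-bit mantissa; the tensor
# as a whole keeps one exponent.
import numpy as np

def to_block16(x):
    """Encode a tensor as int16 mantissas plus one shared exponent."""
    # Pick the exponent so the largest magnitude fits in 16 signed bits.
    exp = int(np.ceil(np.log2(np.abs(x).max() + 1e-30))) - 15
    mant = np.clip(np.round(x / 2.0 ** exp), -32768, 32767).astype(np.int16)
    return mant, exp

def from_block16(mant, exp):
    return mant.astype(np.float32) * 2.0 ** exp

t = np.random.randn(1024).astype(np.float32) * 3.0
mant, exp = to_block16(t)
t_hat = from_block16(mant, exp)
print("shared exponent:", exp, " max abs error:", np.abs(t - t_hat).max())
```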

Because AI computations can be extremely memory intensive, Nervana needed to be able to move a lot of data quickly. The Nervana device includes 32GB of in-package High Bandwidth Memory (HBM), delivering both huge capacity and huge speed – the company claims 8 terabits per second of memory access bandwidth. HBM memories achieve high capacity by die-stacking: a single HBM stack can store 8GB of data using eight individual 1GB memory dies, and the Nervana Engine includes four such stacks, providing 32GB of in-package storage. Intel’s multi-die packaging technology connects the HBM to the array of processing cores. Again, Intel hasn’t said, but we assume this is done with Intel’s 2.5D Embedded Multi-Die Interconnect Bridge (EMIB) technology (rather than the newly announced FOVEROS 3D packaging).
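
The arithmetic behind those figures is straightforward, using only the numbers quoted above:

```python
# Back-of-the-envelope arithmetic for the HBM figures quoted above:
# four stacks of eight 1GB dies each, and a claimed 8 Tb/s of bandwidth.
stacks = 4
dies_per_stack = 8
gb_per_die = 1

capacity_gb = stacks * dies_per_stack * gb_per_die   # 32 GB in-package
bandwidth_tbps = 8                                    # terabits per second (company claim)
bandwidth_gbs = bandwidth_tbps * 1000 / 8             # = 1000 GB/s, i.e. ~1 TB/s

print(capacity_gb, "GB of HBM,", bandwidth_gbs, "GB/s of claimed bandwidth")
```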

The Nervana Engine is composed of an array of “Tensor Processing Cores” surrounded by HBM chiplets, memory interfaces, and high-speed IOs, which are designed to allow many Nervana devices to be combined into very large-scale network implementations. Intel hasn’t given specific performance or power consumption figures for the new devices except to say that power consumption will be in the “hundreds of watts” – which puts Nervana clearly in the data center (as opposed to edge-targeted AI devices such as the company’s Movidius and Mobileye offerings).

The device includes six bi-directional high-bandwidth links, which the company says enable chips to be “interconnected within or between chassis in a seamless fashion.” The company says this “enables users to get linear speedup on their current models by simply assigning more compute to the task, or to expand their models to unprecedented sizes without any decrease in speed.” Multiple devices connected together can act as one large processor.
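
The sketch below is a conceptual illustration of that scaling claim (ours, not Intel’s software stack): a batch of work is split across N devices, each computes its slice independently, and the partial results are merged, so per-device work shrinks roughly linearly with the number of devices.

```python
# Conceptual sketch of data-parallel scaling across multiple accelerators.
# The "devices" here are simulated by plain function calls.
import numpy as np

def forward(x, w):
    return np.maximum(x @ w, 0.0)             # one dense layer + ReLU

def forward_on_devices(x, w, num_devices):
    # Split the batch into num_devices chunks, run each on its own device
    # (in parallel on real hardware), then concatenate the results.
    chunks = np.array_split(x, num_devices, axis=0)
    outputs = [forward(chunk, w) for chunk in chunks]
    return np.concatenate(outputs, axis=0)

x = np.random.randn(256, 512).astype(np.float32)
w = np.random.randn(512, 512).astype(np.float32)

ref = forward(x, w)
out = forward_on_devices(x, w, num_devices=4)
print("results identical:", np.allclose(ref, out))
```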

Nervana seems to be aimed at GPUs’ and FPGAs’ increasing foothold as AI accelerators in the data center. Since Intel has some of the best FPGA technology in the world in their PSG division (formerly Altera), it would appear that the company thinks Nervana brings significant advantages over FPGAs in inferencing, and over GPUs in training. NVIDIA, in particular, has dominated the data center acceleration game for AI training and is obviously directly in Nervana’s crosshairs. It will be interesting to watch what happens as more purpose-built AI devices come on the market to challenge the current crop of general-purpose accelerators filling the gap in AI processing demand.

2 thoughts on “Intel Achieves AI Nervana”

  1. Davies made some critical reflections on “Deep Learning,” slamming LeCun … an interesting read is https://www.zdnet.com/article/intels-neuro-guru-slams-deep-learning-its-not-actually-learning/

    “Backpropagation doesn’t correlate to the brain,” insists Mike Davies, head of Intel’s neuromorphic computing unit, dismissing one of the key tools of the species of A.I. in vogue today, deep learning. For that reason, “it’s really an optimization procedure, it’s not actually learning.”

