feature article
Subscribe Now

Intel Achieves AI Nervana

All Apologies to GPUs and FPGAs

At CES 2019 in Las Vegas this week, Navin Shenoy – Intel Data Center Group executive vice president, announced the Intel Nervana Neural Network Processor for Inference, which will go into production this year. Back in 2016, Intel acquired Nervana, a 48-person AI SAAS startup from San Diego, for (reportedly) something like $408 million. Nervana was a software company at the time, providing a full-stack software-as-a-service platform called Nervana Cloud, based on an open-source framework called Neon (that rivaled Caffe, Tensorflow, and others), enabling the development of custom deep learning applications.

Nervana was also reportedly working on the development of a custom chip for neural network processing at the time, which they claimed would outperform GPUs as AI accelerators by a factor of at least ten. Of course, developing a custom processor is a tall order for a small software team, but that ambition was made dramatically more realistic with their acquisition by Intel. Now, Intel is announcing the delivery of the first part of that vision – the Intel Nervana Neural Network Processor for Inference, or NNP-I. The company also announced that they will have a Neural Network Processor for Training, codenamed “Spring Crest,” available later this year. Nervana Engine was originally being developed on 28nm technology, with plans to move to 14nm before launch. Intel hasn’t said at this point, but we infer that the devices delivered this year will be on Intel’s 14nm FinFET technology, probably moving to 10nm sometime in the future.

Intel says Nervana is being developed in conjunction with Facebook, which is an interesting note because Facebook is the “super seven” data center company whose acceleration strategy has been most opaque. Google has developed their own processor, and Microsoft, Amazon/AWS and others have invested heavily in FPGA-based acceleration. Having Facebook as a development partner should give Nervana solid end-to-end credentials when it begins shipping broadly later this year.

Neural network training and inference are extremely compute-intensive, involving matrix multiplication of tensors and convolution. For years, graphics processing units (GPUs) have been the go-to solution for AI training acceleration, and FPGAs have worked hard to carve out a competitive niche in the inferencing game. As off-the-shelf chips go, GPUs are well suited to AI tasks, taking advantage of their highly parallel vector and linear algebra capabilities. But, because GPUs aren’t designed specifically for AI tasks, they still leave a lot on the table when it comes to architectural optimization for AI and deep learning.

Similarly, FPGAs can deliver incredible parallelism and performance on a miserly power budget for inferencing tasks which (unlike training) can be accomplished with reduced-precision fixed-point computations. Large data center and cloud installations have begun to take advantage of clusters of FPGAs for accelerating inferencing tasks, with remarkable results in terms of throughput, latency, and computational power efficiency. However, similar to GPUs, FPGAs were not designed specifically for AI, and there is a lot of hardware on a typical FPGA that is not involved in AI operations, and a number of architectural assumptions that make FPGAs great as general-purpose devices but suboptimal as AI processors.

Nervana came at the problem from their perspective as developers of GPU kernels for deep learning, which gave them tremendous insight into the limitations of GPUs for AI tasks. The company says that the Nervana engine was designed from a clean slate, discarding the GPU architecture and starting fresh. They analyzed a number of deep neural networks and came up with what they believed to be the best architecture for their key operations. They also came up with a new numerical format – dubbed FlexPoint, that tries to maximize the precision that can be stored within 16 bits.

Because AI computations can be extremely memory intensive, Nervana needed to be able to move a lot of data quickly. The Nervana device includes 32GB of in-package High Bandwidth Memory (HBM) that delivers incredibly high-capacity speed. The company claims 8 terabits per second of memory access bandwidth. HBM memories achieve high capacity by die-stacking. A single HBM chip stack can store 8GB of data with a stack of eight individual 1GB memory dies. The Nervana Engine includes four HBM stacks, providing 32GB in-package storage.  Intel’s multi-die packaging technology connects the HBM to the array of processing cores. Again, Intel hasn’t said, but we assume this to be done with Intel’s 2.5D Embedded Multi-Die Interconnect Bridge (EMIB) technology (rather than the newly announced FOVEROS 3D packaging.)

The Nervana Engine is composed of an array of “Tensor Processing Cores”  surrounded by HBM chiplets, memory interfaces, and high-speed IOs, which are designed to allow many Nervana devices to be combined to provide very large scale network implementations. Intel hasn’t given specific performance or power consumption figures for the new devices except to say that power consumption will be in the “hundreds of watts” – which puts Nervana clearly in the data center (compared with edge-targeted AI devices such as the company’s Movidius and Mobileye offerings).

The device includes six bi-directional high-bandwidth links, which the company says enables chips to be “interconnected within or between chassis in a seamless fashion.” The company says this “enables users to get linear speedup on their current models by simply assigning more compute to the task, or to expand their models to unprecedented sizes without any decrease in speed.” Multiple devices connected together can act as one large processor.

Nervana seems to be aimed at GPUs’ and FPGAs’ increasing foothold as AI accelerators in the data center. Since Intel has some of the best FPGA technology in the world in their PSG division (formerly Altera), it would appear that the company thinks Nervana brings significant advantages over FPGAs in inferencing, and over GPUs in training. NVIDIA, in particular, has dominated the data center acceleration game for AI training and is obviously directly in Nervana’s crosshairs. It will be interesting to watch what happens as more purpose-build AI devices come on the market to challenge the current crop of general-purpose accelerators filling the gap in AI processing demand.

2 thoughts on “Intel Achieves AI Nervana”

  1. Davies made some critical reflections on “Deep Learning” slaming leCun … an interesting read is https://www.zdnet.com/article/intels-neuro-guru-slams-deep-learning-its-not-actually-learning/

    “Backpropogation doesn’t correlate to the brain,” insists Mike Davies, head of Intel’s neuromorphic computing unit, dismissing one of the key tools of the species of A.I. In vogue today, deep learning. “For that reason, “it’s really an optimizations procedure, it’s not actually learning.”

Leave a Reply

featured blogs
Jun 22, 2021
Have you ever been in a situation where the run has started and you realize that you needed to add two more workers, or drop a couple of them? In such cases, you wait for the run to complete, make... [[ Click on the title to access the full blog on the Cadence Community site...
Jun 21, 2021
By James Paris Last Saturday was my son's birthday and we had many things to… The post Time is money'¦so why waste it on bad data? appeared first on Design with Calibre....
Jun 17, 2021
Learn how cloud-based SoC design and functional verification systems such as ZeBu Cloud accelerate networking SoC readiness across both hardware & software. The post The Quest for the Most Advanced Networking SoC: Achieving Breakthrough Verification Efficiency with Clou...
Jun 17, 2021
In today’s blog episode, we would like to introduce our newest White Paper: “System and Component qualifications of VPX solutions, Create a novel, low-cost, easy to build, high reliability test platform for VPX modules“. Over the past year, Samtec has worked...

featured video

Reduce Analog and Mixed-Signal Design Risk with a Unified Design and Simulation Solution

Sponsored by Cadence Design Systems

Learn how you can reduce your cost and risk with the Virtuoso and Spectre unified analog and mixed-signal design and simulation solution, offering accuracy, capacity, and high performance.

Click here for more information about Spectre FX Simulator

featured paper

What is a Hall-effect sensor?

Sponsored by Texas Instruments

Are you considering a Hall-effect sensor for your next design? Read this technical article to learn how Hall-effect sensors work to accurately measure position, distance and movement. In this article, you’ll gain insight into Hall-effect sensing theory, topologies, common use cases and the different types of Hall-effect sensors available today: Hall-effect switches, latches and linear sensors.

Click to read more

featured chalk talk

Traveo II Microcontrollers for Automotive Solutions

Sponsored by Mouser Electronics and Infineon

Today’s automotive designs are more complicated than ever, with a slew of safety requirements, internal memory considerations, and complicated power issues to consider. In this episode of Chalk Talk, Amelia Dalton chats with Marcelo Williams Silva from Infineon about the Traveo™ II Microcontrollers that deal with all of these automotive-related challenges with ease. Amelia and Marcelo take a closer look at how the power efficiency, smart IO signal paths, and over the air firmware updates included with this new MCU family will make all the time-saving difference in your next automotive design.

Click here for more information about Cypress Semiconductor Traveo™ II 32-bit Arm Automotive MCUs