
Intel Achieves AI Nervana

All Apologies to GPUs and FPGAs

At CES 2019 in Las Vegas this week, Navin Shenoy, Intel Data Center Group executive vice president, announced the Intel Nervana Neural Network Processor for Inference, which will go into production this year. Back in 2016, Intel acquired Nervana, a 48-person AI SaaS startup from San Diego, for (reportedly) something like $408 million. Nervana was a software company at the time, providing a full-stack software-as-a-service platform called Nervana Cloud, based on an open-source framework called Neon (which rivaled Caffe, TensorFlow, and others), enabling the development of custom deep learning applications.

Nervana was also reportedly working on a custom chip for neural network processing at the time, which they claimed would outperform GPUs as AI accelerators by a factor of at least ten. Of course, developing a custom processor is a tall order for a small software team, but that ambition became dramatically more realistic with the acquisition by Intel. Now, Intel is announcing the delivery of the first part of that vision – the Intel Nervana Neural Network Processor for Inference, or NNP-I. The company also announced that they will have a Neural Network Processor for Training, codenamed “Spring Crest,” available later this year. The Nervana Engine was originally being developed on 28nm technology, with plans to move to 14nm before launch. Intel hasn’t said at this point, but we infer that the devices delivered this year will be on Intel’s 14nm FinFET technology, probably moving to 10nm sometime in the future.

Intel says Nervana is being developed in conjunction with Facebook, which is an interesting note because, of the “super seven” data center companies, Facebook is the one whose acceleration strategy has been most opaque. Google has developed their own processor, and Microsoft, Amazon/AWS, and others have invested heavily in FPGA-based acceleration. Having Facebook as a development partner should give Nervana solid end-to-end credentials when it begins shipping broadly later this year.

Neural network training and inference are extremely compute-intensive, involving matrix multiplication of tensors and convolution. For years, graphics processing units (GPUs) have been the go-to solution for AI training acceleration, and FPGAs have worked hard to carve out a competitive niche in the inferencing game. As off-the-shelf chips go, GPUs are well suited to AI tasks, taking advantage of their highly parallel vector and linear algebra capabilities. But, because GPUs aren’t designed specifically for AI tasks, they still leave a lot on the table when it comes to architectural optimization for AI and deep learning.
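
To make that workload concrete, here is a minimal NumPy sketch of the two operations that dominate deep learning compute: a fully connected layer expressed as a matrix multiplication, and a naive 2D convolution. The sizes are arbitrary placeholders chosen for illustration, not figures from Intel or Nervana.

    import numpy as np

    # Fully connected layer: activations (batch x in) times weights (in x out)
    batch, n_in, n_out = 32, 1024, 4096          # arbitrary example sizes
    x = np.random.randn(batch, n_in).astype(np.float32)
    w = np.random.randn(n_in, n_out).astype(np.float32)
    y = x @ w                                    # one matmul: batch*n_in*n_out multiply-accumulates

    # Naive 2D convolution of a single feature map with a 3x3 kernel
    img = np.random.randn(64, 64).astype(np.float32)
    kernel = np.random.randn(3, 3).astype(np.float32)
    out = np.zeros((62, 62), dtype=np.float32)
    for i in range(62):
        for j in range(62):
            out[i, j] = np.sum(img[i:i+3, j:j+3] * kernel)   # multiply-accumulate window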

Similarly, FPGAs can deliver incredible parallelism and performance on a miserly power budget for inferencing tasks, which (unlike training) can be accomplished with reduced-precision, fixed-point computations. Large data center and cloud installations have begun to take advantage of clusters of FPGAs for accelerating inferencing tasks, with remarkable results in terms of throughput, latency, and computational power efficiency. However, like GPUs, FPGAs were not designed specifically for AI: a lot of the hardware on a typical FPGA is not involved in AI operations, and a number of architectural assumptions that make FPGAs great as general-purpose devices render them suboptimal as AI processors.
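
As a rough illustration of the reduced-precision, fixed-point arithmetic that makes FPGA (and ASIC) inference so efficient, the sketch below quantizes float32 weights to 8-bit integers with a single per-tensor scale factor. This is a generic quantization scheme shown for illustration, not a description of any particular FPGA toolchain.

    import numpy as np

    # Symmetric per-tensor quantization of float32 weights to int8
    w = np.random.randn(1024).astype(np.float32)
    scale = np.abs(w).max() / 127.0                    # map the largest magnitude to +/-127
    w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

    # Inference can then run on cheap integer MACs; results are rescaled afterward
    w_restored = w_q.astype(np.float32) * scale
    print("max quantization error:", np.abs(w - w_restored).max())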

Nervana came at the problem from their perspective as developers of GPU kernels for deep learning, which gave them tremendous insight into the limitations of GPUs for AI tasks. The company says that the Nervana Engine was designed from a clean slate, discarding the GPU architecture and starting fresh. They analyzed a number of deep neural networks and came up with what they believed to be the best architecture for their key operations. They also came up with a new numerical format, dubbed Flexpoint, that tries to maximize the precision that can be stored within 16 bits.
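
Intel hasn’t detailed the format here, but Flexpoint has been publicly described (in Nervana’s technical publications) as a block floating-point-style scheme: every element of a tensor shares one exponent, and each element stores only a 16-bit integer mantissa. The sketch below illustrates that shared-exponent idea in simplified form; the real format’s exponent management is more sophisticated.

    import numpy as np

    # Simplified sketch of a shared-exponent (block floating point) tensor format.
    # The actual Flexpoint scheme also manages the shared exponent dynamically during training.
    def to_shared_exponent(t, mantissa_bits=16):
        max_mag = float(np.abs(t).max())
        # pick one exponent so the largest element fits in the signed mantissa range
        exp = int(np.ceil(np.log2(max_mag))) - (mantissa_bits - 1) if max_mag > 0 else 0
        lo, hi = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
        mantissas = np.clip(np.round(t / 2.0 ** exp), lo, hi).astype(np.int32)
        return mantissas, exp

    def from_shared_exponent(mantissas, exp):
        return mantissas.astype(np.float32) * 2.0 ** exp

    t = np.random.randn(4, 4).astype(np.float32)
    m, e = to_shared_exponent(t)
    print("reconstruction error:", np.abs(t - from_shared_exponent(m, e)).max())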

Because AI computations can be extremely memory intensive, Nervana needed to be able to move a lot of data quickly. The Nervana device includes 32GB of in-package High Bandwidth Memory (HBM) that delivers both high capacity and extremely high bandwidth – the company claims 8 terabits per second of memory access bandwidth. HBM achieves its high capacity by die-stacking: a single HBM stack stores 8GB of data across eight individual 1GB memory dies, and the Nervana Engine includes four such stacks, providing 32GB of in-package storage. Intel’s multi-die packaging technology connects the HBM to the array of processing cores. Again, Intel hasn’t said, but we assume this is done with Intel’s 2.5D Embedded Multi-Die Interconnect Bridge (EMIB) technology (rather than the newly announced Foveros 3D packaging).
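
The capacity and bandwidth figures add up straightforwardly; the lines below simply restate them (the per-stack split is our own back-of-the-envelope division, not a number Intel has provided).

    stacks, dies_per_stack, gb_per_die = 4, 8, 1
    capacity_gb = stacks * dies_per_stack * gb_per_die     # 4 * 8 * 1 = 32 GB in-package

    claimed_tbps = 8                                       # claimed aggregate bandwidth, terabits/s
    gb_per_s = claimed_tbps * 1000 / 8                     # = 1,000 GB/s aggregate
    per_stack_gb_per_s = gb_per_s / stacks                 # ~250 GB/s per stack (our division)
    print(capacity_gb, gb_per_s, per_stack_gb_per_s)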

The Nervana Engine is composed of an array of “Tensor Processing Cores” surrounded by HBM chiplets, memory interfaces, and high-speed IOs, which are designed to allow many Nervana devices to be combined into very large-scale network implementations. Intel hasn’t given specific performance or power consumption figures for the new devices, except to say that power consumption will be in the “hundreds of watts” – which puts Nervana clearly in the data center (as opposed to edge-targeted AI devices such as the company’s Movidius and Mobileye offerings).

The device includes six bi-directional high-bandwidth links, which the company says enable chips to be “interconnected within or between chassis in a seamless fashion.” The company says this “enables users to get linear speedup on their current models by simply assigning more compute to the task, or to expand their models to unprecedented sizes without any decrease in speed.” Multiple devices connected together can act as one large processor.
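
As a conceptual illustration only (this is not Nervana’s interconnect API or software stack), the sketch below shows the scaling model that claim implies: a batch is split across N devices, each device processes its shard, and the results are gathered. With ideal chip-to-chip links, the per-device work – and therefore the runtime – shrinks roughly as 1/N.

    import numpy as np

    def device_forward(shard, weights):
        # stand-in for one accelerator's forward pass on its slice of the batch
        return shard @ weights

    def data_parallel_forward(batch, weights, n_devices):
        shards = np.array_split(batch, n_devices)                # carve the batch across devices
        outputs = [device_forward(s, weights) for s in shards]   # conceptually runs concurrently
        return np.concatenate(outputs)                           # gather results over the links

    batch = np.random.randn(256, 1024).astype(np.float32)
    weights = np.random.randn(1024, 512).astype(np.float32)
    print(data_parallel_forward(batch, weights, n_devices=4).shape)   # (256, 512)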

Nervana seems to be aimed at GPUs’ and FPGAs’ increasing foothold as AI accelerators in the data center. Since Intel has some of the best FPGA technology in the world in its PSG division (formerly Altera), it would appear that the company believes Nervana brings significant advantages over FPGAs in inferencing and over GPUs in training. NVIDIA, in particular, has dominated the data center acceleration game for AI training and is obviously directly in Nervana’s crosshairs. It will be interesting to watch what happens as more purpose-built AI devices come on the market to challenge the current crop of general-purpose accelerators filling the gap in AI processing demand.

2 thoughts on “Intel Achieves AI Nervana”

  1. Davies made some critical reflections on “Deep Learning,” slamming LeCun … an interesting read is https://www.zdnet.com/article/intels-neuro-guru-slams-deep-learning-its-not-actually-learning/

    “Backpropagation doesn’t correlate to the brain,” insists Mike Davies, head of Intel’s neuromorphic computing unit, dismissing one of the key tools of the species of A.I. in vogue today, deep learning. “For that reason, it’s really an optimization procedure, it’s not actually learning.”

