Intel Achieves AI Nervana

All Apologies to GPUs and FPGAs

At CES 2019 in Las Vegas this week, Navin Shenoy, Intel Data Center Group executive vice president, announced the Intel Nervana Neural Network Processor for Inference, which will go into production this year. Back in 2016, Intel acquired Nervana, a 48-person AI SaaS startup from San Diego, for (reportedly) something like $408 million. Nervana was a software company at the time, providing a full-stack software-as-a-service platform called Nervana Cloud, based on an open-source framework called Neon (which rivaled Caffe, TensorFlow, and others), enabling the development of custom deep learning applications.

Nervana was also reportedly working on a custom chip for neural network processing at the time, which they claimed would outperform GPUs as AI accelerators by a factor of at least ten. Of course, developing a custom processor is a tall order for a small software team, but that ambition was made dramatically more realistic with their acquisition by Intel. Now, Intel is announcing the delivery of the first part of that vision – the Intel Nervana Neural Network Processor for Inference, or NNP-I. The company also announced that they will have a Neural Network Processor for Training, codenamed “Spring Crest,” available later this year. The Nervana Engine was originally being developed on 28nm technology, with plans to move to 14nm before launch. Intel hasn’t said at this point, but we infer that the devices delivered this year will be on Intel’s 14nm FinFET technology, probably moving to 10nm sometime in the future.

Intel says Nervana is being developed in conjunction with Facebook, which is an interesting note because Facebook is the “super seven” data center company whose acceleration strategy has been most opaque. Google has developed their own processor, and Microsoft, Amazon/AWS and others have invested heavily in FPGA-based acceleration. Having Facebook as a development partner should give Nervana solid end-to-end credentials when it begins shipping broadly later this year.

Neural network training and inference are extremely compute-intensive, involving matrix multiplication of tensors and convolution. For years, graphics processing units (GPUs) have been the go-to solution for AI training acceleration, and FPGAs have worked hard to carve out a competitive niche in the inferencing game. As off-the-shelf chips go, GPUs are well suited to AI tasks, taking advantage of their highly parallel vector and linear algebra capabilities. But, because GPUs aren’t designed specifically for AI tasks, they still leave a lot on the table when it comes to architectural optimization for AI and deep learning.

Similarly, FPGAs can deliver incredible parallelism and performance on a miserly power budget for inferencing tasks, which (unlike training) can be accomplished with reduced-precision fixed-point computations. Large data center and cloud installations have begun to take advantage of clusters of FPGAs for accelerating inferencing tasks, with remarkable results in terms of throughput, latency, and computational power efficiency. However, similar to GPUs, FPGAs were not designed specifically for AI. A typical FPGA carries a lot of hardware that is not involved in AI operations, along with a number of architectural assumptions that make FPGAs great as general-purpose devices but suboptimal as AI processors.
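
To make “reduced-precision fixed-point” concrete, here is a minimal Python sketch of an inference-style dot product carried out with 8-bit integers. It is an illustration only, not any vendor's method: the function names `quantize` and `fixed_point_dot`, the 8-bit width, the single per-vector scale factor, and the sample numbers are all assumptions made for the example.

```python
def quantize(values, bits=8):
    """Map floats to signed integers that share one scale factor (hypothetical helper)."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0       # float value of one integer step
    return [round(v / scale) for v in values], scale


def fixed_point_dot(weights, activations, bits=8):
    """Dot product using integer multiply-accumulate, rescaled once at the end."""
    qw, scale_w = quantize(weights, bits)
    qa, scale_a = quantize(activations, bits)
    acc = sum(w * a for w, a in zip(qw, qa))   # integer-only inner loop
    return acc * scale_w * scale_a


# Made-up sample data, chosen only to exercise the code.
weights = [0.5, -1.0, 0.25, 0.75]
activations = [1.0, 0.5, -2.0, 0.25]

exact = sum(w * a for w, a in zip(weights, activations))
approx = fixed_point_dot(weights, activations)
print(exact, approx)
```

The inner loop touches only small integers, which is the kind of arithmetic that maps cheaply onto FPGA logic. The price is a small rounding error relative to the floating-point result.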

Nervana came at the problem from their perspective as developers of GPU kernels for deep learning, which gave them tremendous insight into the limitations of GPUs for AI tasks. The company says that the Nervana Engine was designed from a clean slate, discarding the GPU architecture and starting fresh. They analyzed a number of deep neural networks and came up with what they believed to be the best architecture for their key operations. They also came up with a new numerical format, dubbed FlexPoint, that tries to maximize the precision that can be stored within 16 bits.
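
The article does not describe how FlexPoint works internally, so the following Python sketch should not be read as its specification. It illustrates one general way to get more out of 16 bits: store every value in a block as a 16-bit signed integer and keep a single exponent for the whole block. The function names, the power-of-two exponent, and the sample values are assumptions for illustration.

```python
import math


def encode_shared_exponent(values, mantissa_bits=16):
    """Encode floats as signed integer mantissas plus ONE exponent for the block."""
    limit = 2 ** (mantissa_bits - 1) - 1       # 32767 for 16 bits
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0] * len(values), 0
    # Smallest power-of-two step that still lets the largest value fit.
    exponent = math.ceil(math.log2(peak / limit))
    step = 2.0 ** exponent
    mantissas = [max(-limit, min(limit, round(v / step))) for v in values]
    return mantissas, exponent


def decode_shared_exponent(mantissas, exponent):
    """Recover approximate floats from the mantissas and the shared exponent."""
    return [m * 2.0 ** exponent for m in mantissas]


# Made-up block of values spanning several orders of magnitude.
block = [0.75, -0.001, 3.5, 0.0002]
mantissas, exponent = encode_shared_exponent(block)
restored = decode_shared_exponent(mantissas, exponent)
print(mantissas, exponent)
print(restored)
```

In this sketch, all 16 bits of each stored value go to precision, because the exponent is paid for once per block rather than once per number. The tradeoff is visible in the output: values much smaller than the block's largest entry are rounded coarsely.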

Because AI computations can be extremely memory intensive, Nervana needed to be able to move a lot of data quickly. The Nervana device includes 32GB of in-package High Bandwidth Memory (HBM), which combines high capacity with very high bandwidth. The company claims 8 terabits per second of memory access bandwidth. HBM memories achieve high capacity by die-stacking. A single HBM chip stack can store 8GB of data with a stack of eight individual 1GB memory dies. The Nervana Engine includes four HBM stacks, providing 32GB of in-package storage. Intel’s multi-die packaging technology connects the HBM to the array of processing cores. Again, Intel hasn’t said, but we assume this to be done with Intel’s 2.5D Embedded Multi-Die Interconnect Bridge (EMIB) technology (rather than the newly announced Foveros 3D packaging).
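
The memory figures above can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers quoted in the article. The decimal unit conversion (1 terabit = 1000 gigabits) and the derived “time to read the whole memory once” are assumptions added for illustration, not figures from Intel.

```python
# Figures quoted in the article.
HBM_STACKS = 4            # stacks in the package
DIES_PER_STACK = 8        # memory dies per stack
GB_PER_DIE = 1            # capacity of each die, in gigabytes
BANDWIDTH_TBIT_S = 8      # claimed bandwidth, in terabits per second

# Capacity: stacks x dies per stack x capacity per die.
capacity_gb = HBM_STACKS * DIES_PER_STACK * GB_PER_DIE        # 32

# Bandwidth in gigabytes per second, assuming decimal units and 8 bits per byte.
bandwidth_gbyte_s = BANDWIDTH_TBIT_S * 1000 / 8               # 1000.0

# Derived illustration: time to stream the full 32GB once at the claimed rate.
full_sweep_seconds = capacity_gb / bandwidth_gbyte_s          # 0.032

print(capacity_gb, bandwidth_gbyte_s, full_sweep_seconds)
```

Under those assumptions, the claimed 8 terabits per second works out to roughly one terabyte per second, enough to sweep the entire in-package memory in about 32 milliseconds.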

The Nervana Engine is composed of an array of “Tensor Processing Cores” surrounded by HBM chiplets, memory interfaces, and high-speed IOs, which are designed to allow many Nervana devices to be combined into very-large-scale network implementations. Intel hasn’t given specific performance or power consumption figures for the new devices except to say that power consumption will be in the “hundreds of watts” – which puts Nervana clearly in the data center (compared with edge-targeted AI devices such as the company’s Movidius and Mobileye offerings).

The device includes six bi-directional high-bandwidth links, which the company says enable chips to be “interconnected within or between chassis in a seamless fashion.” The company says this “enables users to get linear speedup on their current models by simply assigning more compute to the task, or to expand their models to unprecedented sizes without any decrease in speed.” Multiple devices connected together can act as one large processor.

Nervana seems to be aimed at GPUs’ and FPGAs’ increasing foothold as AI accelerators in the data center. Since Intel has some of the best FPGA technology in the world in their PSG division (formerly Altera), it would appear that the company thinks Nervana brings significant advantages over FPGAs in inferencing, and over GPUs in training. NVIDIA, in particular, has dominated the data center acceleration game for AI training and is obviously directly in Nervana’s crosshairs. It will be interesting to watch what happens as more purpose-built AI devices come on the market to challenge the current crop of general-purpose accelerators filling the gap in AI processing demand.

2 thoughts on “Intel Achieves AI Nervana”

  1. Davies made some critical reflections on “Deep Learning,” slamming LeCun … an interesting read is https://www.zdnet.com/article/intels-neuro-guru-slams-deep-learning-its-not-actually-learning/

    “Backpropagation doesn’t correlate to the brain,” insists Mike Davies, head of Intel’s neuromorphic computing unit, dismissing one of the key tools of the species of A.I. in vogue today, deep learning. “For that reason, it’s really an optimizations procedure, it’s not actually learning.”
