feature article
Subscribe Now

Intel Announces Stratix 10 NX

AI-specific FPGAs Target Inference

Intel has announced what they call their “First Intel AI-Optimized FPGA,” the Stratix 10 NX family. The company says these FPGAs “will offer customers customizable, reconfigurable and scalable AI acceleration for compute-demanding applications such as natural language processing and fraud detection.” Intel has bet on all the horses in the AI race, adding “Deep Learning Boost (DL Boost) to their flagship Xeon processors to dramatically accelerate AI inference in conventional data center processors, but also investing heavily in acceleration strategies such as FPGAs, acquisition of Habana Labs, Nervana (whose technology has now been reportedly dropped in favor of Habana Labs technology), and Movidius (focused on low-end AI inference).

Intel’s strategy is clearly morphing toward heterogeneous computing in the data center, where variations in workloads can have a dramatic impact on the optimal hardware configuration. For many (if not most) data center applications, the performance of Xeon with DL Boost will be adequate for the AI tasks that come along. However, if there is a heavy load of AI demand combined with low latency requirements, it makes more sense to add workload-specific acceleration. This is where Stratix 10 NX is likely to come into play.

This new Stratix 10 NX family is a continuation of the Stratix 10 line introduced back in 2015 when Intel had just announced “plans” to buy Altera. Stratix 10 is fabricated on Intel’s 14nm 3D tri-gate (FinFET) technology, and it makes extensive use of Intel’s proprietary embedded multi-die interconnect bridge (EMIB) technology to allow the company to deploy a wide variety of devices and device families for specialized application domains by mixing and matching chiplets in a single package. EMIB is Intel’s alternative to a silicon interposer for  in-package high density interconnect of heterogeneous chips.

During the wait for the deployment of the next-generation Agilex family (which is based on Intel’s delayed 10nm process and began shipping to early-access customers last August), Intel has rolled out several new variants of Stratix 10 by taking advantage of EMIB’s ability to combine chiplets into domain-focused solutions. Stratix 10 now consists of six different variants, GX – which are general-purpose FPGAs, SX – which is Intel’s SoC FPGA that includes hard processor subsystems with 64-bit quad-core ARM Cortex-A53s, TX – which is the transceiver-heavy variant with tons of PAM4 57.8 Gbps transceivers, MX – which includes in-package HBM2, DX – which supports Intel Ultra Path Interconnect (Intel UPI) for direct coherent connection to future select Intel Xeon processors, and now NX – which emphasizes low-precision computation.

In the case of the new Stratix 10 NX, the company is going after the AI inference market primarily via new AI-optimized arithmetic blocks called AI Tensor Blocks. These blocks would have previously been called “DSP” blocks, but the new versions contain dense arrays of lower-precision multipliers typically used for AI model arithmetic. Intel’s previous devices focused on higher-precision multiplication and even floating point, which would be useful in targeting AI training, but inference acceleration is all about low-precision, and Intel has answered with 15x more INT8 performance than the standard Stratix 10 DSP Block.

Intel gets this INT8 boost by swapping the usual 2-multiplier 2-accumulator architecture of the Stratix 10 MX block for a 30-multiplier, 30-accumulator block that can handle INT4, INT8, BLOCK FP12, and BLOCK FP16. Presumably, the “15x” factor is due to going from 2 MACs to 30 MACs in the same block for lower precision operations. Stratix 10 NX also boasts up to 16GB in-package stacked HBM (like the MX line) and PCIe Gen3x16 plus PCIe Gen4x16 support.

Intel says Stratix 10 NX is up to 2.3X faster than Nvidia V100 GPUs for BERT batch processing, 9.5X faster in LSTM batch processing, and 3.8X faster in ResNet50 batch processing. The targeting of NVidia in their marketing materials clearly illuminates Intel’s strategy for Stratix 10 NX. The company wants to stop NVidia’s incursion into the data center at any cost.

Of course, as we have discussed many times, the big challenge with taking advantage of FPGA performance and power efficiency in compute acceleration is the programming model. Creating an optimized accelerator using FPGAs traditionally requires a team with significant FPGA expertise and experience, and a lot of time. Intel and rival FPGA supplier Xilinx have worked hard over the past decade or so to improve that situation, and Intel seems to be hanging their hat on their ambitious “oneAPI” which is a standards-based, unified programming model that aims to facilitate integration of heterogeneous Xeon-based platforms with various accelerators such as FPGAs. Intel’s approach makes sense, given the breadth of their offering, and there is insufficient industry experience so far to fairly assess how oneAPI compares or competes with Xilinx’s VITIS, or with Nvidia’s CUDA environment. While the other solutions aim specifically at acceleration, Intel appears to be attacking the problem one level of abstraction higher, which could be a spectacular success, or could be a bridge too far.

Stratix 10 NX was announced as part of a larger Intel data center announcement, which included the debut of 3rd Gen Xeon processors with built-in AI acceleration through the integration of bfloat16 support. Bfloat16 has half the bits of FP32 for AI inference with comparable model accuracy. Also announced were the New Intel Optane persistent memory series and new 3D NAND SSDs. Taken together, the announcements show a steady drum beat of progress across the spectrum of data center AI workload optimization.

Intel is defending their data center dominance on multiple fronts these days, with strong pressure coming from acceleration providers such as Nvidia and Xilinx, alternative processors and architectures such as AMD and ARM, and a proliferation of standards-based approaches that minimize the company’s ability to hold off competition with a breadth-first and integration strategy. It will be interesting to watch the next few years unfold as the increasingly lucrative data center market attracts even more, and better-funded, attackers.

2 thoughts on “Intel Announces Stratix 10 NX”

  1. BLOCK FP12, and BLOCK FP16 … I haven’t seen Intel support block FP before, other than in their early Nervana designs. Is this something novel for FPGA FP ai processing?

Leave a Reply

featured blogs
Aug 12, 2020
What if I told you I knew someone who could improve your regression efficiency: make fewer runs, spend less runtime on the runs you do make, and have the same coverage at the end? You'd say that... [[ Click on the title to access the full blog on the Cadence Community s...
Aug 12, 2020
Samtec has been selling its products online since the early 2000s, the very early days of eCommerce. We’ve been through a couple of shopping cart iterations since then. Before this recent upgrade, Samtec.com had been running on a cart system that was built in 2011. It w...
Aug 11, 2020
Making a person appear to say or do something they did not actually say or do has the potential to take the war of disinformation to a whole new level....
Aug 7, 2020
[From the last episode: We looked at activation and what they'€™re for.] We'€™ve talked about the structure of machine-learning (ML) models and much of the hardware and math needed to do ML work. But there are some practical considerations that mean we may not directly us...

Featured Video

Product Update: New DesignWare USB4 IP Solution

Sponsored by Synopsys

Are you ready for USB4? Join Gervais Fong and Eric Huang to learn more about this new 40Gbps standard and Synopsys DesignWare IP that helps bring your USB4-enabled SoC to market faster.

Click here for more information about DesignWare USB4 IP

Featured Paper

Improving Performance in High-Voltage Systems With Zero-Drift Hall-Effect Current Sensing

Sponsored by Texas Instruments

Learn how major industry trends are driving demands for isolated current sensing, and how new zero-drift Hall-effect current sensors can improve isolation and measurement drift while simplifying the design process.

Click here for more information

Featured Chalk Talk

Cloud Computing for Electronic Design (Are We There Yet?)

Sponsored by Cadence Design Systems

When your project is at crunch time, a shortage of server capacity can bring your schedule to a crawl. But, the rest of the year, having a bunch of extra servers sitting around idle can be extremely expensive. Cloud-based EDA lets you have exactly the compute resources you need, when you need them. In this episode of Chalk Talk, Amelia Dalton chats with Craig Johnson of Cadence Design Systems about Cadence’s cloud-based EDA solutions.

More information about the Cadence Cloud Portfolio