Intel Announces Stratix 10 NX

AI-specific FPGAs Target Inference

Intel has announced what they call their “First Intel AI-Optimized FPGA,” the Stratix 10 NX family. The company says these FPGAs “will offer customers customizable, reconfigurable and scalable AI acceleration for compute-demanding applications such as natural language processing and fraud detection.” Intel has bet on all the horses in the AI race, adding Deep Learning Boost (DL Boost) to their flagship Xeon processors to dramatically accelerate AI inference in conventional data center processors, while also investing heavily in acceleration strategies such as FPGAs and acquiring Habana Labs, Nervana (whose technology has reportedly now been dropped in favor of Habana Labs technology), and Movidius (focused on low-end AI inference).

Intel’s strategy is clearly morphing toward heterogeneous computing in the data center, where variations in workloads can have a dramatic impact on the optimal hardware configuration. For many (if not most) data center applications, the performance of Xeon with DL Boost will be adequate for the AI tasks that come along. However, if there is a heavy load of AI demand combined with low latency requirements, it makes more sense to add workload-specific acceleration. This is where Stratix 10 NX is likely to come into play.

This new Stratix 10 NX family is a continuation of the Stratix 10 line introduced back in 2015 when Intel had just announced “plans” to buy Altera. Stratix 10 is fabricated on Intel’s 14nm 3D tri-gate (FinFET) technology, and it makes extensive use of Intel’s proprietary embedded multi-die interconnect bridge (EMIB) technology to allow the company to deploy a wide variety of devices and device families for specialized application domains by mixing and matching chiplets in a single package. EMIB is Intel’s alternative to a silicon interposer for in-package, high-density interconnect of heterogeneous chips.

During the wait for the deployment of the next-generation Agilex family (which is based on Intel’s delayed 10nm process and began shipping to early-access customers last August), Intel has rolled out several new variants of Stratix 10 by taking advantage of EMIB’s ability to combine chiplets into domain-focused solutions. Stratix 10 now consists of six variants: GX, the general-purpose FPGAs; SX, Intel’s SoC FPGA, which includes a hard processor subsystem with a 64-bit quad-core ARM Cortex-A53; TX, the transceiver-heavy variant with tons of 57.8 Gbps PAM4 transceivers; MX, which includes in-package HBM2; DX, which supports Intel Ultra Path Interconnect (Intel UPI) for direct coherent connection to select future Intel Xeon processors; and now NX, which emphasizes low-precision computation.

In the case of the new Stratix 10 NX, the company is going after the AI inference market primarily via new AI-optimized arithmetic blocks called AI Tensor Blocks. These blocks would have previously been called “DSP” blocks, but the new versions contain dense arrays of the lower-precision multipliers typically used for AI model arithmetic. Intel’s previous devices focused on higher-precision multiplication and even floating point, which would be useful in targeting AI training, but inference acceleration is all about low precision, and Intel has answered with 15x more INT8 performance than the standard Stratix 10 DSP Block.
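To make the low-precision point concrete, here is a minimal Python sketch of the kind of INT8 multiply-accumulate arithmetic that inference typically relies on. It is only an illustration: the function names, the symmetric per-tensor scales, and the sample values are assumptions made for this example, not anything taken from Intel’s tools or from the AI Tensor Block itself.

```python
def quantize_int8(values, scale):
    """Map real values to signed 8-bit integers using a symmetric scale (assumed scheme)."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def int8_dot(a_q, b_q):
    """Multiply INT8 operands and sum the products in a wide integer accumulator."""
    acc = 0
    for a, b in zip(a_q, b_q):
        acc += a * b  # one multiply-accumulate (MAC) per element pair
    return acc


# Hypothetical weights and activations, chosen only for illustration.
weights = [0.5, -0.25, 1.0, 0.75]
activations = [1.0, 2.0, -1.0, 0.5]

# Scales chosen so the largest magnitude in each vector maps to 127.
w_scale = 1.0 / 127
a_scale = 2.0 / 127

w_q = quantize_int8(weights, w_scale)
a_q = quantize_int8(activations, a_scale)

# Rescale the integer result back to real units and compare with full precision.
approx = int8_dot(w_q, a_q) * w_scale * a_scale
exact = sum(w * a for w, a in zip(weights, activations))
print(f"exact={exact:.4f} int8_approx={approx:.4f}")
```

The integer result lands close to the full-precision dot product, and every multiply in the loop needs only an 8-bit by 8-bit multiplier, which is why many more of them can fit in the silicon area of one higher-precision multiplier.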

Intel gets this INT8 boost by swapping the usual 2-multiplier, 2-accumulator architecture of the Stratix 10 MX DSP block for a 30-multiplier, 30-accumulator block that can handle INT4, INT8, BLOCK FP12, and BLOCK FP16. Presumably, the “15x” factor is due to going from 2 MACs to 30 MACs in the same block for lower-precision operations. Stratix 10 NX also boasts up to 16GB in-package stacked HBM (like the MX line) and PCIe Gen3x16 plus PCIe Gen4x16 support.
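For readers unfamiliar with block floating point, the general idea is that a group of values shares a single exponent while each value keeps only a small integer mantissa, so the multipliers can stay integer-sized. The Python sketch below shows that generic technique under stated assumptions: the 8-bit mantissa width, the function names, and the sample values are illustrative, and the sketch does not claim to reproduce the bit layout of Intel’s BLOCK FP12 or BLOCK FP16 formats, which the announcement does not spell out here.

```python
import math


def block_fp_encode(values, mantissa_bits=8):
    """Encode a block as signed integer mantissas plus one shared exponent."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 0
    # frexp gives max_abs = m * 2**exp with 0.5 <= m < 1.
    _, exp = math.frexp(max_abs)
    # Pick the shared exponent so the largest value fills the mantissa range.
    shared_exp = exp - (mantissa_bits - 1)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = [
        max(-limit, min(limit, round(v / 2.0 ** shared_exp))) for v in values
    ]
    return mantissas, shared_exp


def block_fp_decode(mantissas, shared_exp):
    """Rebuild real values from integer mantissas and the shared exponent."""
    return [m * 2.0 ** shared_exp for m in mantissas]


# Hypothetical block of values, chosen so they encode without rounding error.
block = [0.5, -0.125, 0.03125, 0.75]
mantissas, shared_exp = block_fp_encode(block)
decoded = block_fp_decode(mantissas, shared_exp)
print(mantissas, shared_exp, decoded)
```

Because the exponent is handled once per block, a dot product between two such blocks reduces to integer multiply-accumulates followed by a single exponent adjustment, which fits naturally with a dense array of small multipliers.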

Intel says Stratix 10 NX is up to 2.3X faster than Nvidia V100 GPUs for BERT batch processing, 9.5X faster in LSTM batch processing, and 3.8X faster in ResNet50 batch processing. The targeting of Nvidia in their marketing materials clearly illuminates Intel’s strategy for Stratix 10 NX. The company wants to stop Nvidia’s incursion into the data center at any cost.

Of course, as we have discussed many times, the big challenge with taking advantage of FPGA performance and power efficiency in compute acceleration is the programming model. Creating an optimized accelerator using FPGAs traditionally requires a team with significant FPGA expertise and experience, and a lot of time. Intel and rival FPGA supplier Xilinx have worked hard over the past decade or so to improve that situation, and Intel seems to be hanging their hat on their ambitious “oneAPI,” a standards-based, unified programming model that aims to facilitate integration of heterogeneous Xeon-based platforms with various accelerators such as FPGAs. Intel’s approach makes sense, given the breadth of their offering, but there is insufficient industry experience so far to fairly assess how oneAPI compares or competes with Xilinx’s Vitis, or with Nvidia’s CUDA environment. While the other solutions aim specifically at acceleration, Intel appears to be attacking the problem one level of abstraction higher, which could be a spectacular success, or could be a bridge too far.

Stratix 10 NX was announced as part of a larger Intel data center announcement, which included the debut of 3rd Gen Xeon processors with built-in AI acceleration through the integration of bfloat16 support. Bfloat16 uses half as many bits as FP32 (16 rather than 32) while keeping FP32’s 8-bit exponent, so it preserves dynamic range and can deliver comparable model accuracy for AI inference. Also announced were the new Intel Optane persistent memory series and new 3D NAND SSDs. Taken together, the announcements show a steady drumbeat of progress across the spectrum of data center AI workload optimization.
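The relationship between bfloat16 and FP32 is simple enough to show in a few lines of Python. The sketch below keeps only the upper 16 bits of an FP32 value (1 sign bit, 8 exponent bits, 7 mantissa bits). Plain truncation is a simplification assumed here for clarity; real conversions commonly round to nearest instead. The function names are hypothetical.

```python
import struct


def fp32_bits(x):
    """Return the 32-bit IEEE 754 single-precision pattern of x as an int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def to_bfloat16_bits(x):
    """Keep the top 16 bits of the FP32 pattern (truncation, no rounding)."""
    return fp32_bits(x) >> 16


def from_bfloat16_bits(bits):
    """Expand a bfloat16 pattern back to a float by zero-filling the low 16 bits."""
    return struct.unpack(">f", struct.pack(">I", bits << 16))[0]


for value in (1.0, 3.14159265, 1.0e30):
    b = to_bfloat16_bits(value)
    print(f"{value!r} -> 0x{b:04X} -> {from_bfloat16_bits(b)!r}")
```

Large magnitudes such as 1.0e30 survive the conversion because the exponent field is untouched; what is lost is mantissa precision, which is the trade bfloat16 makes.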

Intel is defending their data center dominance on multiple fronts these days, with strong pressure coming from acceleration providers such as Nvidia and Xilinx, alternative processors and architectures such as AMD and ARM, and a proliferation of standards-based approaches that minimize the company’s ability to hold off competition with a breadth-first and integration strategy. It will be interesting to watch the next few years unfold as the increasingly lucrative data center market attracts even more, and better-funded, attackers.

2 thoughts on “Intel Announces Stratix 10 NX”

  1. BLOCK FP12, and BLOCK FP16 … I haven’t seen Intel support block FP before, other than in their early Nervana designs. Is this something novel for FPGA FP ai processing?