
Xilinx Launches Versal HBM

Busting the Memory Bottleneck

It’s no secret that we are drowning in data. Today’s applications and algorithms consume almost incomprehensible amounts of it, and bandwidth requirements are exploding faster than networking and memory technologies can keep up with. Even with the most advanced accelerators we can build with our FPGAs, we can still be choked trying to get data on and off the chip and finding places to store it while we process it.

Even though memory bandwidth has been increasing rapidly, demand is growing faster. Moving zettabytes of information around the world has stressed current technologies to the breaking point, and offloading performance-critical tasks to FPGAs doesn’t help if the system is starved for memory bandwidth.

At the same time, more and more of that data needs to be secured, and every time data is moved across an interface, it becomes vulnerable.

What we need is to move the memory closer to the processing.

Xilinx has taken a big step toward memory localization with their new Versal HBM series of “ACAP” devices (we think of them as FPGAs). HBM (high-bandwidth memory) is designed to sit in the same package as the processing elements it serves, and Xilinx connects it using their stacked-silicon interconnect (SSI) advanced packaging technology. Keeping the memory in-package makes much higher-bandwidth connections possible, and avoiding off-chip memory interfaces significantly reduces power consumption and interface latency.

This is far from Xilinx’s first rodeo with SSI. The company pioneered silicon interposers for FPGAs years ago, and this new device is built on fourth-generation SSI. Early on, SSI was used primarily to increase effective yield by packing several smaller FPGA chiplets into a single package to build a larger FPGA. But today, SSI is also used to make Xilinx’s silicon more scalable and versatile. To build Versal HBM, for example, they essentially took their Versal Premium device and swapped out one “Super Logic Region” (SLR) chiplet for an HBM2e stack. (OK, it’s a little bit more complicated than that, but you get the idea.)

Compared with external DDR5, in-package HBM offers 8x the bandwidth at 63% lower power. And that’s a big deal. Parking an HBM stack inside your FPGA gives you a memory bandwidth bonanza, while saving your power budget for processing. 

This is not the first time Xilinx has popped HBM into one of their devices. One version of their previous-generation Virtex UltraScale+ FPGAs featured in-package HBM. The new Versal HBM outperforms that one on every axis, however, with 1.8x the memory bandwidth (from 460GB/s to 820GB/s) at 15% lower power and 2x the HBM memory capacity (32GB vs. 16GB).
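As a quick sanity check on the generational claims, the ratios can be recomputed from the figures quoted above. Both bandwidth numbers share the same unit, so the ratio itself is unit-free. The dictionary and function names below are purely illustrative.

```python
# Figures quoted in the article for the previous-generation Virtex UltraScale+
# HBM device and for Versal HBM. Both bandwidth values use the same unit,
# so their ratio does not depend on which unit that is.
VIRTEX_USP_HBM = {"bandwidth": 460, "capacity_gb": 16}
VERSAL_HBM = {"bandwidth": 820, "capacity_gb": 32}


def gain(new, old, key):
    """How many times larger new[key] is than old[key]."""
    return new[key] / old[key]


bandwidth_gain = gain(VERSAL_HBM, VIRTEX_USP_HBM, "bandwidth")
capacity_gain = gain(VERSAL_HBM, VIRTEX_USP_HBM, "capacity_gb")
print(f"bandwidth: {bandwidth_gain:.2f}x, capacity: {capacity_gain:.1f}x")
# prints "bandwidth: 1.78x, capacity: 2.0x"
```

The 1.78x bandwidth ratio rounds to the 1.8x Xilinx quotes.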

Versal HBM offers a lot more than extra memory bandwidth, though. Xilinx has also significantly widened the SerDes pipes for getting data on and off the device, doubling total SerDes bandwidth to a mind-bending 5.6Tb/s. The SerDes is scalable for maximum application flexibility, with 32Gbps NRZ for power-optimized 100G interfaces, 58Gbps PAM4 for the current 400G ramp and deployment, and super-sporty 112Gbps PAM4 for future 800G networks built on 100G-per-lane optics.
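To put the three SerDes modes in perspective: PAM4 signaling carries two bits per symbol versus NRZ’s one, so a PAM4 lane needs only half the symbol (baud) rate of an NRZ lane at the same bit rate. The sketch below computes symbol rates for the three quoted modes plus a simple lanes-per-port count. The per-lane payload rates for 100G and 400G ports (25G and 50G) are our assumption about common Ethernet lane configurations, not figures from Xilinx; the quoted numbers are maximum per-lane line rates, and a lane in practice carries somewhat less payload once encoding and FEC overhead are included. Function names are hypothetical.

```python
import math

# (label, per-lane line rate in Gb/s, bits per symbol)
# NRZ uses 2 signal levels (1 bit/symbol); PAM4 uses 4 levels (2 bits/symbol).
SERDES_MODES = [
    ("32G NRZ", 32, 1),
    ("58G PAM4", 58, 2),
    ("112G PAM4", 112, 2),
]


def symbol_rate_gbaud(line_rate_gbps, bits_per_symbol):
    """Symbol (baud) rate in GBd a lane must signal at for a given bit rate."""
    return line_rate_gbps / bits_per_symbol


def lanes_needed(port_gbps, payload_per_lane_gbps):
    """Lanes needed for a port, assuming payload splits evenly across lanes."""
    return math.ceil(port_gbps / payload_per_lane_gbps)


for label, rate, bits in SERDES_MODES:
    print(f"{label}: {symbol_rate_gbaud(rate, bits):g} GBd")

# Assumed per-lane payloads (illustrative, not from the article except 100G/lane
# for 800G): 100G over 25G NRZ lanes, 400G over 50G PAM4, 800G over 100G PAM4.
for port, per_lane in [(100, 25), (400, 50), (800, 100)]:
    print(f"{port}G port: {lanes_needed(port, per_lane)} lanes of {per_lane}G")
```

This is why the 112Gbps PAM4 mode signals at 56 GBd rather than 112: the extra signal levels buy bandwidth without doubling the symbol rate.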

Many standard interfaces are pre-built and hardened for you, including 2.4Tb/s of scalable Ethernet bandwidth that is multi-rate (400/200/100/50/40/25/10G with FEC) and multi-standard (FlexE, Flex-O, eCPRI, FCoE, and OTN). On the security side, 1.2Tb/s of line-rate encryption throughput is delivered by bulk Crypto AES-GCM-256/128, MACsec, and IPsec, which Xilinx claims is the “World’s only hardened 400G Crypto Engine on an adaptable platform.”

If PCIe is your jam, Versal HBM packs 1.5Tb/s of aggregated PCIe link bandwidth via PCIe Gen5 with DMA, CCIX, and CXL (yep, playing for either team now). The PCIe interface has dedicated connectivity over the programmable network-on-chip (NoC) to memory.
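The article doesn’t say how the 1.5Tb/s aggregate is divided among PCIe links. For scale, here is the standard arithmetic for what a single Gen5 link contributes: 32 GT/s per lane with 128b/130b line encoding. The function name is ours, and the result ignores packet-level (TLP/DLLP) protocol overhead, so usable payload throughput is lower.

```python
# PCIe Gen5 signals at 32 GT/s per lane and uses 128b/130b line encoding.
GEN5_GT_PER_LANE = 32.0
ENCODING_EFFICIENCY = 128 / 130


def pcie_gen5_gbps(lanes):
    """Per-direction link throughput in Gb/s after 128b/130b encoding.

    Ignores TLP/DLLP packet overhead, so real payload throughput is lower.
    """
    return GEN5_GT_PER_LANE * lanes * ENCODING_EFFICIENCY


x16 = pcie_gen5_gbps(16)
print(f"Gen5 x16: {x16:.1f} Gb/s per direction, {2 * x16:.1f} Gb/s both directions")
# prints "Gen5 x16: 504.1 Gb/s per direction, 1008.2 Gb/s both directions"
```

So a single x16 Gen5 link moves roughly 1Tb/s counting both directions; how Xilinx’s 1.5Tb/s figure maps onto specific links and directions is not stated here.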

So, Versal HBM can obviously do a super job getting data on and off the chip and parking it in memory while it’s there. But what about doing actual work?

The new device has a triple-header of capabilities to execute and accelerate a wide variety of workloads. Xilinx now refers to these as “engines,” and Versal HBM (like their other ACAP devices) includes “Scalar,” “Adaptable,” and “DSP” engines. In more conventional terms, the “Scalar” engines are Arm-based processing systems consisting of dual-core Arm Cortex-A72 application processors and dual-core Arm Cortex-R5F real-time processors. The “Adaptable” engines are primarily what we’d think of as FPGA LUT fabric (3.8M or 5.6M logic cells’ worth), and the “DSP” engines consist of 7.4K or 10.9K DSP slices. Taken together, that’s an impressive amount of compute to tackle tough problems in networking, data center, test and measurement, and aerospace and defense, the target markets for Versal HBM.

Xilinx provided a couple of benchmarks. The first comes from healthcare: a real-time recommendation engine that uses a cosine-similarity algorithm for clinical outcome predictions. Here, Xilinx claims Versal HBM can handle 2x the patient database size of the previous-generation Virtex UltraScale+ and 4x that of a 3rd-gen Intel x86 Xeon Gold/Platinum Scalable processor. Speed-wise, they claim 100x the speed of the Virtex and 200x that of the x86.
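Xilinx doesn’t describe how the benchmark is implemented. As a rough sketch of what cosine-similarity recommendation involves, the Python below scores every record in a database against a query vector and returns the closest matches. The patient IDs, the three-element feature vectors, and the function names are hypothetical. In this brute-force form, every query touches every record in the database, which hints at why database size and memory bandwidth matter so much for this kind of workload.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_similar(query, database, k):
    """Return the k (record_id, score) pairs most similar to query, best first."""
    scored = [(rid, cosine_similarity(query, vec)) for rid, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical patient feature vectors, for illustration only.
patients = {
    "patient_a": [1.0, 0.0, 2.0],
    "patient_b": [0.9, 0.1, 2.1],
    "patient_c": [0.0, 3.0, 0.0],
}
for pid, score in top_k_similar([1.0, 0.0, 2.0], patients, k=2):
    print(pid, round(score, 3))
```

With this query, patient_a (an exact match) ranks first and patient_b second, while patient_c, whose vector is orthogonal to the query, scores zero.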

The second benchmark is real-time fraud detection, using the Louvain modularity algorithm to detect anomalies in behavior or transactions. (You know, when the credit card company calls and asks if you just bought a Ferrari on Easter Island.) In this example, they claim the same 2x and 4x capacity advantage (in number of graph vertices), and a more modest 10x and 20x speed advantage over the Virtex and x86, respectively.
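Louvain works by repeatedly moving nodes between communities, then merging communities, whenever doing so raises a score called modularity. The sketch below computes only that score for a given partition of an undirected, unweighted graph; it is not the full Louvain search, and it says nothing about how Xilinx’s benchmark actually flags fraud. The graph, node names, and function name are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each edge listed once, no self-loops.
    community: dict mapping every node to a community label.
    Uses Q = sum over communities c of [L_c / m - (d_c / (2m))**2], where
    L_c is the number of edges inside c and d_c is the total degree of c's nodes.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge (hypothetical example graph).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}
print(modularity(edges, two_groups))  # 5/14, about 0.357
print(modularity(edges, one_group))   # 0.0
```

Splitting the graph at the bridge scores higher than lumping everything together, which is exactly the kind of improvement Louvain searches for, one node move at a time.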

If piles of chips are more your benchmark style, Xilinx says Versal HBM packs the equivalent of 14 Virtex UltraScale devices, with 32 DDR5 chips’ worth of HBM.

Versal HBM will come in two basic sizes, but with three different helpings of HBM: 8, 16, or 32GB. You can get started on your design now with the Versal Premium series (which is basically the same as Versal HBM, but without the HBM). Documentation is available now, tools arrive in the second half of 2021, and devices begin sampling in the second half of 2022.
