
Xilinx Launches Versal HBM

Busting the Memory Bottleneck

It’s no secret that we are drowning in data. Today’s applications and algorithms require almost incomprehensible amounts of data, and that means the bandwidth requirements are exploding faster than networking and memory technologies can handle. Even with the most advanced accelerators we can build with our FPGAs, we can be choked trying to get data on and off the chip and finding places to store information as we are processing. 

Even though memory bandwidth has been increasing rapidly, the demand is growing faster. Pushing around zettabytes of information worldwide has stressed current technologies to the breaking point. Pushing performance-critical tasks off to FPGAs doesn’t help if the system is starved for memory bandwidth.

At the same time, more and more of that data needs to be secured, and every time data is moved across an interface, it becomes vulnerable.

What we need is to move the memory closer to the processing.

Xilinx has taken a big step toward memory localization with their new Versal HBM series of “ACAP” devices (we think of them as FPGAs). HBM (or high-bandwidth memory) is designed to sit in the same package with other processing elements, communicating via stacked-silicon interconnect (SSI) advanced packaging technology. By keeping the memory in-package, much higher-bandwidth connections are possible, and avoiding off-chip memory interfaces significantly reduces power consumption and interface latency.

This is far from Xilinx’s first rodeo with SSI. The company pioneered the use of silicon interposers in FPGAs years ago, and this new device is built on fourth-generation SSI. Early on, SSI was used primarily to increase effective yield by packing several smaller FPGA chiplets into a single package to build a larger FPGA. But today, SSI is also used to make Xilinx’s silicon more scalable and versatile. To build Versal HBM, for example, they just took their Versal Premium device and swapped out one “Super Logic Region” (SLR) chiplet for an HBM2e stack. (OK, it’s a little bit more complicated than that, but you get the idea.)

Compared with external DDR5, in-package HBM offers 8x the bandwidth at 63% lower power. And that’s a big deal. Parking an HBM stack inside your FPGA gives you a memory bandwidth bonanza, while saving your power budget for processing. 

This is not the first time Xilinx has popped HBM into one of their devices. One version of their previous-generation Virtex UltraScale+ FPGAs featured in-package HBM. The new Versal HBM outperforms that one on every axis, however, with 1.8x the memory bandwidth (from 460GB/s to 820GB/s) at 15% lower power and 2x the HBM memory capacity (32GB vs 16GB).
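To put those generation-over-generation numbers in perspective, here is a quick back-of-the-envelope check. The ratios hold whatever the unit; the "full pass" time is an idealized figure that assumes the bandwidth numbers are gigabytes per second and that peak bandwidth is sustained, which real workloads rarely manage.

```python
# Figures quoted in the article for the two HBM-equipped generations.
old_bandwidth, new_bandwidth = 460, 820   # Virtex UltraScale+ HBM vs Versal HBM
old_capacity_gb, new_capacity_gb = 16, 32

bandwidth_ratio = new_bandwidth / old_bandwidth
capacity_ratio = new_capacity_gb / old_capacity_gb

# Assumption: bandwidth is in GB/s and the peak rate is sustained.
# Time to read the entire 32GB of HBM once, in milliseconds.
full_pass_ms = new_capacity_gb / new_bandwidth * 1000

print(f"bandwidth: {bandwidth_ratio:.2f}x, capacity: {capacity_ratio:.0f}x")
print(f"one full pass over {new_capacity_gb}GB: about {full_pass_ms:.0f} ms")
```

The quoted 1.8x is the rounded form of roughly 1.78x.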

Versal HBM has a lot more than just more memory bandwidth, though. They’ve also significantly increased the size of the SerDes pipes for getting data on and off the device, doubling the total bandwidth to a mind-bending 5.6Tb/s. The SerDes is scalable for maximum application flexibility, with 32Gbps NRZ for power-optimized 100G interfaces, 58Gbps PAM4 for the current 400G ramp and deployment, and super-sporty 112Gbps PAM4 for future 800 gig network development on 100G per lane optics. 
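As a rough illustration of how those lane rates map onto port speeds, the sketch below counts the lanes needed per port. The nominal payload rates per lane (25, 50, and 100Gb/s) are an assumption based on common Ethernet lane conventions, not figures from Xilinx; the higher line rates quoted above leave headroom for encoding and FEC overhead.

```python
import math

# Assumed nominal payload per lane in Gb/s for each SerDes mode.
# These values are illustrative and not taken from Xilinx documentation.
PAYLOAD_PER_LANE_GBPS = {
    "32G NRZ": 25,
    "58G PAM4": 50,
    "112G PAM4": 100,
}

def lanes_needed(port_gbps: int, mode: str) -> int:
    """Return the number of SerDes lanes required to carry one port."""
    return math.ceil(port_gbps / PAYLOAD_PER_LANE_GBPS[mode])

for port_gbps, mode in [(100, "32G NRZ"), (400, "58G PAM4"), (800, "112G PAM4")]:
    print(f"{port_gbps}G over {mode}: {lanes_needed(port_gbps, mode)} lanes")
```

Under these assumptions, a 400G port needs eight 58G PAM4 lanes, and the same eight lanes at 112G PAM4 carry 800G.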

Many standard interfaces are pre-built and hardened for you, including 2.4Tb/s of scalable Ethernet bandwidth that is multi-rate (400/200/100/50/40/25/10G with FEC) and multi-standard (FlexE, Flex-O, eCPRI, FCoE, and OTN). Security can be done quickly with 1.2Tb/s of line-rate encryption throughput delivered by bulk crypto supporting AES-GCM-256/128, MACsec, and IPsec, which Xilinx claims is the “World’s only hardened 400G Crypto Engine on an adaptable platform.”

If PCIe is your jam, Versal HBM packs 1.5Tb/s of aggregated PCIe link bandwidth via PCIe Gen5 with DMA, CCIX, and CXL (yep, playing for either team now). The PCIe interface has dedicated connectivity over the programmable network-on-chip (NoC) to memory.

So, Versal HBM can obviously do a super job getting data onto and off the chip and parking it in memory while it’s there. But, what about the ability to do actual work?

The new device has a triple-header of capabilities to execute and accelerate a wide variety of workloads. Xilinx now refers to these as “engines,” and Versal HBM (like their other ACAP devices) includes “Scalar,” “Adaptable,” and “DSP” engines. In more conventional terms, the “Scalar” engines are Arm-based processing systems consisting of dual-core Arm Cortex-A72 application processors and dual-core Arm Cortex-R5F real-time processors. The “Adaptable” engines are primarily what we’d think of as FPGA LUT fabric (3.8M or 5.6M logic cells’ worth), and the “DSP” engines consist of 7.4K or 10.9K DSP slices. Taken together, that’s an impressive amount of compute resources to tackle the tough problems in networking, data center, test and measurement, and aerospace and defense – the target markets for Versal HBM.

Xilinx provided a couple of benchmarks. The first is in the healthcare arena: a real-time recommendation engine that uses a cosine similarity algorithm to make clinical outcome predictions. Here, they claim Versal HBM can handle 2x the patient database size of the previous-generation Virtex UltraScale+ and 4x the size of a 3rd-gen Intel x86 Xeon Gold/Platinum scalable processor. Speed-wise, they claim 100x the speed of the Virtex and 200x that of the x86.
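For readers who haven’t met cosine similarity, here is a minimal sketch of the idea in plain Python. It is not Xilinx’s benchmark code; the patient names and feature vectors are made up for illustration. The algorithm scores two records by the angle between their feature vectors, and a recommendation engine then looks for the stored records that score closest to a query. Every query sweeps the whole database, which is why memory capacity and bandwidth matter so much here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def most_similar(query, records):
    """Return the key of the stored record most similar to the query."""
    return max(records, key=lambda name: cosine_similarity(query, records[name]))

# Hypothetical patient feature vectors (illustrative values only).
patients = {
    "patient_a": [1.0, 0.0, 2.0],
    "patient_b": [0.0, 3.0, 0.5],
    "patient_c": [2.0, 0.1, 4.1],
}
print(most_similar([1.0, 0.0, 2.1], patients))
```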

The second benchmark is in real-time fraud detection – Louvain modularity algorithm – to detect anomalies in behavior/transactions. (You know, when the credit card company calls and asks if you just bought a Ferrari on Easter Island.) In this example, they claim the same 2x and 4x capacity advantage (number of vertices), and a more modest 10x and 20x speed advantage over Virtex and x86 respectively.
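The Louvain method works by repeatedly regrouping vertices to maximize a score called modularity, which measures how much denser the connections inside communities are than chance would predict. The sketch below computes only that score for a given grouping, not the full Louvain search, and it assumes an undirected, unweighted graph with at least one edge and no self-loops. The toy graph is made up for illustration.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each edge listed once.
    community: dict mapping every vertex to its community label.
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both ends in the community
    degree = defaultdict(int)    # sum of vertex degrees in the community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
lumped = {v: "A" for v in range(6)}

print(modularity(edges, split))   # two communities score higher...
print(modularity(edges, lumped))  # ...than one big community
```

In a fraud-detection setting, transactions or accounts that land outside the communities the algorithm expects are the ones that get flagged for a closer look.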

If piles of chips are more your benchmark style, Xilinx says Versal HBM packs the equivalent of 14 Virtex UltraScale devices, with the equivalent of 32 DDR5 chips-worth of HBM. 

Versal HBM will come in 2 basic sizes, but with 3 different helpings of HBM – 8, 16, or 32GB. You can get started on your design now with the Versal Premium series (which is basically the same as the Versal HBM, but without the HBM). Documentation is available now, tools are due in the second half of 2021, and devices begin sampling in the second half of 2022.
