
Xilinx Launches Versal HBM

Busting the Memory Bottleneck

It’s no secret that we are drowning in data. Today’s applications and algorithms require almost incomprehensible amounts of data, and that means the bandwidth requirements are exploding faster than networking and memory technologies can handle. Even with the most advanced accelerators we can build with our FPGAs, we can be choked trying to get data on and off the chip and finding places to store information as we are processing. 

Even though memory bandwidth has been increasing rapidly, the demand is growing faster. Pushing around zettabytes of information worldwide has stressed current technologies to the breaking point. Pushing performance-critical tasks off to FPGAs doesn’t help if the system is starved for memory bandwidth.

At the same time, more and more of that data needs to be secured, and every time data is moved across an interface, it becomes vulnerable.

What we need is to move the memory closer to the processing.

Xilinx has taken a big step toward memory localization with their new Versal HBM series of “ACAP” devices (we think of them as FPGAs). HBM (or high-bandwidth memory) is designed to sit in the same package with other processing elements, communicating via stacked-silicon interconnect (SSI) advanced packaging technology. By keeping the memory in-package, much higher-bandwidth connections are possible, and avoiding off-chip memory interfaces significantly reduces power consumption and interface latency.

This is far from Xilinx’s first rodeo with SSI. The company pioneered the use of silicon interposers in FPGAs years ago, and this new device is built on fourth-generation SSI. Early on, SSI was used primarily to increase effective yield by packing several smaller FPGA chiplets into a single package to build a larger FPGA. But today, SSI is also used to make Xilinx’s silicon more scalable and versatile. To build Versal HBM, for example, they just took their Versal Premium device and swapped out one “Super Logic Region” (SLR) chiplet for an HBM2e stack. (OK, it’s a little bit more complicated than that, but you get the idea.)

Compared with external DDR5, in-package HBM offers 8x the bandwidth at 63% lower power. And that’s a big deal. Parking an HBM stack inside your FPGA gives you a memory bandwidth bonanza, while saving your power budget for processing. 

This is not the first time Xilinx has popped HBM into one of their devices. One version of their previous-generation Virtex UltraScale+ FPGAs featured in-package HBM. The new Versal HBM outperforms that one on every axis, however, with 1.8x the memory bandwidth (from 460GB/s to 820GB/s) at 15% lower power and 2x the HBM memory capacity (32GB vs 16GB).
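As a quick sanity check on the generation-over-generation claims above, the quoted multipliers can be recomputed from the raw figures in the article. This is only a back-of-the-envelope sketch: the function name `improvement` is made up for illustration, and because a ratio is independent of the unit, the bandwidth numbers are used as bare values.

```python
def improvement(new_value: float, old_value: float) -> float:
    """Return how many times larger new_value is than old_value."""
    if old_value <= 0:
        raise ValueError("old_value must be positive")
    return new_value / old_value


# Figures as quoted in the article (Versal HBM vs. Virtex UltraScale+ HBM).
bandwidth_ratio = improvement(820, 460)  # memory bandwidth, same unit on both sides
capacity_ratio = improvement(32, 16)     # HBM capacity in GB

print(f"bandwidth: {bandwidth_ratio:.2f}x, capacity: {capacity_ratio:.0f}x")
```

The bandwidth ratio works out to a little under 1.8, so the "1.8x" headline number is a rounded figure, while the capacity claim is an exact doubling.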

Versal HBM has a lot more than just more memory bandwidth, though. They’ve also significantly increased the size of the SerDes pipes for getting data on and off the device, doubling the total bandwidth to a mind-bending 5.6Tb/s. The SerDes is scalable for maximum application flexibility, with 32Gbps NRZ for power-optimized 100G interfaces, 58Gbps PAM4 for the current 400G ramp and deployment, and super-sporty 112Gbps PAM4 for future 800 gig network development on 100G per lane optics. 

Many standard interfaces are pre-built and hardened for you, including 2.4Tb/s of scalable Ethernet bandwidth that is both multi-rate (400/200/100/50/40/25/10G with FEC) and multi-standard (FlexE, Flex-O, eCPRI, FCoE, and OTN). Security can be done quickly with 1.2Tb/s of line-rate encryption throughput delivered by bulk crypto engines supporting AES-GCM-256/128, MACsec, and IPsec, which Xilinx claims is the “World’s only hardened 400G Crypto Engine on an adaptable platform.”

If PCIe is your jam, Versal HBM packs 1.5Tb/s of aggregated PCIe link bandwidth via PCIe Gen5 with DMA, CCIX, and CXL (yep, playing for either team now). The PCIe interface has dedicated connectivity over the programmable network-on-chip (NoC) to memory.

So, Versal HBM can obviously do a super job getting data onto and off the chip and parking it in memory while it’s there. But, what about the ability to do actual work?

The new device has a triple-header of capabilities to execute and accelerate a wide variety of workloads. Xilinx now refers to these as “engines,” and Versal HBM (like their other ACAP devices) includes “Scalar,” “Adaptable,” and “DSP” engines. In more conventional terms, the “Scalar” engines are Arm-based processing systems consisting of dual-core Arm Cortex-A72 application processors and dual-core Arm Cortex-R5F real-time processors. The “Adaptable” engines are primarily what we’d think of as FPGA LUT fabric (3.8M or 5.6M logic cells’ worth), and the “DSP” engines consist of 7.4K or 10.9K DSP slices. Taken together, that’s an impressive amount of compute resources to tackle the tough problems in networking, data center, test and measurement, and aerospace and defense – the target markets for Versal HBM.

Xilinx provided a couple of benchmarks. The first, from the healthcare arena, is a real-time recommendation engine that uses the cosine similarity algorithm for clinical outcome predictions. Here, they claim Versal HBM can handle 2x the patient database size of the previous-generation Virtex UltraScale+ and 4x the size of a 3rd-gen Intel Xeon Gold/Platinum Scalable x86 processor. Speed-wise, they claim 100x the speed of the Virtex and 200x that of the x86.
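For readers who haven’t met cosine similarity, here is a minimal software sketch of the idea behind that first benchmark. It is not Xilinx’s implementation: the feature vectors, the patient names, and the `most_similar` helper are all hypothetical, and a real recommendation engine would stream a far larger database through memory, which is exactly where the bandwidth matters.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def most_similar(query, records, k=2):
    """Return the names of the k records most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in records.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical patient feature vectors, purely for illustration.
records = {
    "patient_a": [1, 0, 1],
    "patient_b": [0, 1, 0],
    "patient_c": [1, 1, 0],
}
top_matches = most_similar([1, 0, 1], records, k=2)
print(top_matches)
```

Every query touches every record in the database, so the work is dominated by reading vectors out of memory rather than by the arithmetic itself.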

The second benchmark is real-time fraud detection, which uses the Louvain modularity algorithm to detect anomalies in behavior and transactions. (You know, when the credit card company calls and asks if you just bought a Ferrari on Easter Island.) In this example, they claim the same 2x and 4x capacity advantage (measured in number of graph vertices), and a more modest 10x and 20x speed advantage over Virtex and x86, respectively.
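The Louvain method works by repeatedly regrouping graph vertices into communities so as to increase a score called modularity. The sketch below computes only that score for a given grouping of an undirected, unweighted graph; it is not the full Louvain algorithm and not Xilinx’s benchmark code. The tiny example graph and the function name `modularity` are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each undirected edge listed once
    community: dict mapping every vertex to a community label
    Uses Q = sum over communities of  L_c / m - (d_c / (2m))^2,
    where m is the edge count, L_c the number of edges inside community c,
    and d_c the total degree of the vertices in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph must have at least one edge")
    intra_edges = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra_edges[community[u]] += 1
    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {v: "A" for v in range(6)}

q_split = modularity(edges, two_groups)
q_merged = modularity(edges, one_group)
print(q_split, q_merged)
```

Splitting the graph along the bridge scores higher than lumping everything together, which is the kind of signal Louvain follows when it looks for tightly knit clusters of accounts or transactions.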

If piles of chips are more your benchmark style, Xilinx says Versal HBM packs the equivalent of 14 Virtex UltraScale devices, with the equivalent of 32 DDR5 chips-worth of HBM. 

Versal HBM will come in two basic sizes, but with three different helpings of HBM – 8, 16, or 32GB. You can get started on your design now with the Versal Premium series (which is basically the same as the Versal HBM, but without the HBM). Documentation is available now, tools are due in the second half of 2021, and devices begin sampling in the second half of 2022.
