feature article
Xilinx Launches Versal HBM

Busting the Memory Bottleneck

It’s no secret that we are drowning in data. Today’s applications and algorithms require almost incomprehensible amounts of data, and that means the bandwidth requirements are exploding faster than networking and memory technologies can handle. Even with the most advanced accelerators we can build with our FPGAs, we can be choked trying to get data on and off the chip and finding places to store information as we are processing. 

Even though memory bandwidth has been increasing rapidly, the demand is growing faster. Pushing around zettabytes of information worldwide has stressed current technologies to the breaking point. Pushing performance-critical tasks off to FPGAs doesn’t help if the system is starved for memory bandwidth.

At the same time, more and more of that data needs to be secured, and every time data is moved across an interface, it becomes vulnerable.

What we need is to move the memory closer to the processing.

Xilinx has taken a big step toward memory localization with their new Versal HBM series of “ACAP” devices (we think of them as FPGAs). HBM (or high-bandwidth memory) is designed to sit in the same package with other processing elements, communicating via stacked-silicon interconnect (SSI) advanced packaging technology. By keeping the memory in-package, much higher-bandwidth connections are possible, and avoiding off-chip memory interfaces significantly reduces power consumption and interface latency.

This is far from Xilinx’s first rodeo with SSI. The company pioneered the use of silicon interposers in FPGAs years ago, and this new device is built on fourth-generation SSI. Early on, SSI was used primarily to increase effective yield by packing several smaller FPGA chiplets into a single package to build a larger FPGA. But today, SSI is also used to make Xilinx’s silicon more scalable and versatile. To build Versal HBM, for example, they just took their Versal Premium device and swapped out one “Super Logic Region” (SLR) chiplet for an HBM2e stack. (OK, it’s a little bit more complicated than that, but you get the idea.)

Compared with external DDR5, in-package HBM offers 8x the bandwidth at 63% lower power. And that’s a big deal. Parking an HBM stack inside your FPGA gives you a memory bandwidth bonanza, while saving your power budget for processing. 

This is not the first time Xilinx has popped HBM into one of their devices. One version of their previous-generation Virtex UltraScale+ FPGAs featured in-package HBM. The new Versal HBM outperforms that one on every axis, however, with 1.8x the memory bandwidth (from 460GB/s to 820GB/s) at 15% lower power and 2x the HBM memory capacity (32GB vs 16GB).
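For anyone who likes to check the multipliers, here is a minimal Python sketch that recomputes them from the figures quoted above. The variable names are purely illustrative, and the ratios do not depend on the bandwidth unit as long as both figures use the same one.

```python
# Figures quoted in the article; both bandwidth numbers share the same unit,
# so the ratio is unit-independent.
ULTRASCALE_PLUS_HBM_BANDWIDTH = 460
VERSAL_HBM_BANDWIDTH = 820
ULTRASCALE_PLUS_HBM_CAPACITY_GB = 16
VERSAL_HBM_CAPACITY_GB = 32


def improvement(new, old):
    """Return how many times larger `new` is than `old`."""
    return new / old


bandwidth_gain = improvement(VERSAL_HBM_BANDWIDTH, ULTRASCALE_PLUS_HBM_BANDWIDTH)
capacity_gain = improvement(VERSAL_HBM_CAPACITY_GB, ULTRASCALE_PLUS_HBM_CAPACITY_GB)

print(f"bandwidth gain: {bandwidth_gain:.2f}x")
print(f"capacity gain:  {capacity_gain:.2f}x")
```

The bandwidth ratio works out to just under 1.8, which is where the rounded “1.8x” comes from.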

Versal HBM has a lot more than just more memory bandwidth, though. They’ve also significantly increased the size of the SerDes pipes for getting data on and off the device, doubling the total bandwidth to a mind-bending 5.6Tb/s. The SerDes is scalable for maximum application flexibility, with 32Gbps NRZ for power-optimized 100G interfaces, 58Gbps PAM4 for the current 400G ramp and deployment, and super-sporty 112Gbps PAM4 for future 800 gig network development on 100G per lane optics. 
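To see how those lane rates map onto port speeds, here is a rough Python sketch. The payload-per-lane values are an assumption based on common Ethernet practice (a 32G-class NRZ lane carrying 25G, a 58G PAM4 lane carrying 50G, a 112G PAM4 lane carrying 100G); only the 100G-per-lane figure comes from the text above, and `lanes_needed` is a hypothetical helper, not anything from Xilinx’s tools.

```python
import math

# ASSUMPTION: nominal Ethernet payload per lane (Gb/s) for each SerDes mode.
# The line rate is higher than the payload because of encoding and FEC overhead.
PAYLOAD_PER_LANE_GBPS = {
    "32G NRZ": 25,
    "58G PAM4": 50,
    "112G PAM4": 100,
}


def lanes_needed(port_gbps, serdes_mode):
    """Return the number of SerDes lanes needed to carry one port."""
    return math.ceil(port_gbps / PAYLOAD_PER_LANE_GBPS[serdes_mode])


for port_gbps, mode in [(100, "32G NRZ"), (400, "58G PAM4"), (800, "112G PAM4")]:
    print(f"{port_gbps}G port over {mode}: {lanes_needed(port_gbps, mode)} lanes")
```

Under these assumptions, a 400G port needs half as many 112G PAM4 lanes as 58G PAM4 lanes, which is the appeal of the faster lane rate.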

Many standard interfaces are pre-built and hardened for you, including 2.4Tb/s of scalable Ethernet bandwidth that is multi-rate (400/200/100/50/40/25/10G with FEC) and multi-standard (FlexE, Flex-O, eCPRI, FCoE, and OTN). Security gets the same hardened treatment, with 1.2Tb/s of line-rate encryption throughput delivered by bulk crypto (AES-GCM-256/128, MACsec, and IPsec), which Xilinx claims is the “World’s only hardened 400G Crypto Engine on an adaptable platform.”

If PCIe is your jam, Versal HBM packs 1.5Tb/s of aggregated PCIe link bandwidth via PCIe Gen5 with DMA, CCIX, and CXL (yep, playing for either team now). The PCIe interface has dedicated connectivity over the programmable network-on-chip (NoC) to memory.

So, Versal HBM can obviously do a super job getting data onto and off the chip and parking it in memory while it’s there. But, what about the ability to do actual work?

The new device has a triple-header of capabilities to execute and accelerate a wide variety of workloads. Xilinx now refers to these as “engines,” and Versal HBM (like their other ACAP devices) includes “Scalar,” “Adaptable,” and “DSP” engines. In more conventional terms, the “Scalar” engines are Arm-based processing systems consisting of dual-core Arm Cortex-A72 application processors and dual-core Arm Cortex-R5F real-time processors. The “Adaptable” engines are primarily what we’d think of as FPGA LUT fabric (3.8M or 5.6M logic cells’ worth), and the “DSP” engines consist of 7.4K or 10.9K DSP slices. Taken together, that’s an impressive amount of compute resources to tackle the tough problems in networking, data center, test and measurement, and aerospace and defense, the target markets for Versal HBM.

Xilinx provided a couple of benchmarks. The first, from the healthcare arena, is a real-time recommendation engine that uses a cosine similarity algorithm for clinical outcome predictions. Here, they claim Versal HBM can handle 2x the patient database size of the previous-generation Virtex UltraScale+ and 4x that of a 3rd-gen Intel Xeon Gold/Platinum Scalable (x86) processor. Speed-wise, they claim 100x the speed of the Virtex and 200x that of the x86.
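For readers who haven’t met cosine similarity, here is a minimal Python sketch of the idea. It is not Xilinx’s benchmark code; the patient names and feature vectors are made up for illustration, and a real system would run this over millions of records, which is where memory capacity and bandwidth start to matter.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def most_similar(query, database):
    """Return the key of the database record closest to `query`."""
    return max(database, key=lambda name: cosine_similarity(query, database[name]))


# Hypothetical records: each vector is a made-up set of patient features.
patients = {
    "patient_a": [1.0, 0.0, 2.0],
    "patient_b": [0.0, 3.0, 0.0],
}
new_patient = [2.0, 0.0, 4.0]

print(most_similar(new_patient, patients))
```

Every query touches every stored vector, so the workload is dominated by streaming the database past the arithmetic, exactly the kind of job that benefits from in-package memory.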

The second benchmark is in real-time fraud detection – Louvain modularity algorithm – to detect anomalies in behavior/transactions. (You know, when the credit card company calls and asks if you just bought a Ferrari on Easter Island.) In this example, they claim the same 2x and 4x capacity advantage (number of vertices), and a more modest 10x and 20x speed advantage over Virtex and x86 respectively.
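The Louvain method works by repeatedly regrouping vertices to maximize a score called modularity. The Python sketch below computes only that score for a given grouping; it is not the full Louvain algorithm and not Xilinx’s implementation, and the toy graph (two triangles joined by one edge) is invented for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of an undirected, unweighted graph for a given grouping.

    Q = sum over communities c of (L_c / m) - (d_c / (2 * m)) ** 2,
    where m is the total edge count, L_c the edges inside c, and
    d_c the sum of the degrees of the vertices in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal_edges = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal_edges[community[u]] += 1
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {vertex: "a" for vertex in range(6)}

print(modularity(edges, two_groups))
print(modularity(edges, one_group))
```

Splitting the graph into its two triangles scores higher than lumping everything together, which is the signal Louvain chases. In a fraud-detection setting, a transaction that lands outside its expected community is the anomaly worth a phone call.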

If piles of chips are more your benchmark style, Xilinx says Versal HBM packs the equivalent of 14 Virtex UltraScale devices, with the equivalent of 32 DDR5 chips-worth of HBM. 

Versal HBM will come in 2 basic sizes, but with 3 different helpings of HBM: 8, 16, or 32GB. You can get started on your design now with the Versal Premium series (which is basically the same as the Versal HBM, but without the HBM). Documentation is available now, tools are expected in the second half of 2021, and devices begin sampling in the second half of 2022.
