Xilinx Boosts the Bandwidth

New Devices and Boards Raise the Ante

A few years ago, we all thought we were on the cusp of “enough” bandwidth. If everybody with a smartphone could stream HD video simultaneously, what else could we even want? Surely an infrastructure that could handle that task would be ready for whatever else we wanted to throw at it.

Oh, how wrong we were. 

Video still is the biggest bandwidth hog, but now we’re talking 4K and 8K at terrifying bit-rates. And, instead of just one smartphone per person, we’ve got billions of IoT devices all trying to stream massive amounts of data from each of those “things” to all the rest, on that internet of theirs. And, since a lot of the computing for all those edge devices can’t yet be done at the edge, lots of that data is shipped back and forth to cloud data centers, creating even more of a network crunch. Combine that IoT explosion with the transition to 5G and the network that backs it up, and you’ve got yourself a serious global bandwidth crisis.

Which, of course, is exactly what Xilinx was hoping for. 

The history of FPGAs is the history of bandwidth explosion. For the past several decades, FPGAs have been the go-to technology for networking companies trying to hit the next big bandwidth target first. FPGAs powered some of the first one-gig appliances, and now they’re poised to enable the move to 800 gig (gulp). Xilinx just made a pair of announcements that put them front-and-center in that equation. The appetizer is a new Alveo U-25 SmartNIC Platform, and the main course is a crazy-powerful new family of FPGAs, er, ACAPs (shhh, they’re really still FPGAs, but don’t tell). These two announcements push Xilinx’s product portfolio further into the realm of extreme bandwidth enablement.

To fully appreciate the strategic importance of the Alveo card, we need to first consider Intel’s stranglehold on the data center. While Intel’s corporate lawyers probably forbid their marketing from using the term “dominance” (out of fear of garnering the attention of anti-trust regulators), Intel does, in fact, dominate the business of data center hardware. When it comes to technology like FPGAs, Intel has their own via their acquisition of Altera, and they likely aren’t too keen to welcome competitors to the party. Intel can do all kinds of mean tricks, like shipping servers to OEMs with FPGAs already built in. Hey, why buy an FPGA from someone else if your server already came with one, right?

If you’re a competitor like Xilinx, who claims to now be “data center first” – that makes things kinda tough. What you need is a Trojan horse. A nice gift you can roll up to the gate of the data center, saying “Hey, let me in. I’ll make things better in there.” And, what better offering to bring than a smart network interface card? In cloud data centers, an estimated 30% of compute resources are sucked up by networking I/O processing. As the number of CPU cores continues to increase, the overhead continues to grow – which results in further increased demands on networking. Smart network interface cards allow you to migrate acceleration closer to the network interface, freeing up compute resources to do more computing.

Xilinx’s new Alveo U-25 SmartNIC is a “bump-in-the-wire.” It can be inserted into the data flow without bothering anything over in Intel-land, and it just makes life easier for all those hard-working and expensive Xeons. Ah, but here’s the rub. Xilinx says the SmartNIC brings a “True convergence of network, storage, and compute acceleration functions on a single platform.” Uh, oh. That means that, once you’ve got a bunch of these suckers in your data center, why not give them some more work to do, like storage and compute acceleration? Your Xeons will love you! But Intel – maybe not so much. Xilinx will have succeeded in getting their silicon across the formidable Intel moat, and it’ll be sitting there taunting those Intel components at close range.

Before we dive into more details of the Alveo U-25 SmartNIC, let’s look at the second announcement – the Versal Premium Series. Sounds a little like a new luxury crossover vehicle, doesn’t it? Xilinx says Versal Premium is actually a new family of what they call ACAPs (Adaptable Compute Acceleration Platforms) — or what we call FPGAs. This brings up an issue for those accustomed to the historical programmable logic vernacular. Namely, Xilinx is trying to change the name of EVERYTHING lately. 

Here’s our helpful EE Journal decoder ring, which may come in handy when interpreting newer-generation Xilinx marketing materials:


Platform = FPGA

Series = FPGA Family

Adaptable Hardware = FPGA LUT fabric

Adaptable Engines = FPGA LUTs

Intelligent Engines = DSP Slices

Scalar Engines = ARM processor subsystems

Dynamic Function Exchange = Partial Reconfiguration

LUTs (OK, this is a tricky one) = 6-input LUTs

Buckle up, this one gets complicated. Most of the industry gives “LUT” numbers based on an estimated number of equivalent 4-input LUTs (LUT4s), even though their devices have wider LUTs. To their credit, Xilinx now gives LUT counts based on their 6-input LUTs, which (gasp) correspond to something actually on the chip. Xilinx’s basic programmable logic block is called a CLB. CLBs are made up of slices. Slices are made up of several 6-input LUTs, each of which has 2 flip flops. We can look at the data sheet numbers for “CLB flip flops,” and we see that it is exactly double the number given for LUTs.

Way to go Xilinx! An FPGA datasheet that shows what’s actually on the chip. Novel concept.

Oops, except now they have introduced a marketing metric called System Logic Cells, and it gets top-line billing on the product table. What is a System Logic Cell? Apparently, it is approximately 2.187 LUTs. You may ask how they came up with 2.187? Simple. They calculated what multiplier was required to make their devices sound bigger than Intel/Altera’s. Ah, and they were doing so well.
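For anyone who wants to translate between the marketing number and what is actually on the chip, here is a minimal sketch of the arithmetic described above. It assumes only the two ratios quoted in this article (roughly 2.187 System Logic Cells per LUT, and two flip flops per 6-input LUT); the function names and the one-million-LUT input are made up for illustration and do not come from any Xilinx tool or data sheet.

```python
# Ratios as quoted in the article; both are approximations, not
# figures taken from a Xilinx data sheet.
SYSTEM_LOGIC_CELLS_PER_LUT = 2.187
FLIP_FLOPS_PER_LUT = 2


def datasheet_view(lut6_count: int) -> dict:
    """Derive the other headline numbers from a 6-input LUT count."""
    return {
        "lut6": lut6_count,
        "clb_flip_flops": lut6_count * FLIP_FLOPS_PER_LUT,
        "system_logic_cells": round(lut6_count * SYSTEM_LOGIC_CELLS_PER_LUT),
    }


def luts_from_system_logic_cells(system_logic_cells: int) -> int:
    """Go the other way: estimate real LUT6s behind a marketing number."""
    return round(system_logic_cells / SYSTEM_LOGIC_CELLS_PER_LUT)


if __name__ == "__main__":
    # Hypothetical device with one million LUT6s, chosen only as an example.
    print(datasheet_view(1_000_000))
    print(luts_from_system_logic_cells(2_187_000))
```

Dividing the top-line System Logic Cell figure by about 2.187 is the quick way back to a LUT count you could actually find in silicon.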

Anyway, back to Versal Premium ACAPs. These are truly amazing devices. The Versal line has two branches: an “AI” branch with families that include AI engines, and a more general-purpose branch without AI engines. Versal Premium is part of the general-purpose branch, and it comes packed with resources the likes of which we’ve never seen. Beginning with the bandwidth side, Premium packs 9Tb/s of serial bandwidth via up to 68 32Gbps NRZ transceivers, which are perfect for implementing power-optimized mainstream 100G interfaces, and up to 140 58Gbps PAM4 transceivers for applications migrating to 400G. Those 140 transceivers can also be doubled up to give 70 lanes of 112Gbps for those of you flirting with the idea of 800 gig gear.
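To put those transceiver counts in perspective, the sketch below multiplies out the raw per-bank line rates and estimates how many ports each bank could feed. The lanes-per-port figures (four NRZ lanes for 100G, eight PAM4 lanes for 400G and for 800G) are our assumptions about common port configurations, not something Xilinx specified, and the sketch ignores encoding overhead and any transceivers reserved for other duties. It also makes no attempt to reproduce the 9Tb/s headline, since we don't know how that total is counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransceiverBank:
    name: str
    lanes: int            # maximum lane count quoted in the article
    gbps_per_lane: int    # raw line rate per lane
    lanes_per_port: int   # ASSUMPTION: typical lanes per port at this rate

    @property
    def raw_gbps(self) -> int:
        """Raw one-direction bandwidth if every lane runs at full rate."""
        return self.lanes * self.gbps_per_lane

    @property
    def max_ports(self) -> int:
        """Whole ports that fit, ignoring overhead and other lane uses."""
        return self.lanes // self.lanes_per_port


BANKS = [
    TransceiverBank("32G NRZ -> 100G ports", 68, 32, 4),
    TransceiverBank("58G PAM4 -> 400G ports", 140, 58, 8),
    # The 58G lanes "doubled up": half as many lanes at 112G each.
    TransceiverBank("112G PAM4 -> 800G ports", 140 // 2, 112, 8),
]

if __name__ == "__main__":
    for bank in BANKS:
        print(f"{bank.name}: {bank.raw_gbps} Gb/s raw, "
              f"up to {bank.max_ports} ports")
```

Note that the 58G and 112G rows describe the same physical transceivers used two different ways, so their totals should not be added together.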

On the programmable logic “adaptable hardware” front, Versal Premium brings from 700K to almost 3.4M of Xilinx’s LUT6 cells. Even with conversion factors for estimating the equivalent number of LUT4s, that makes these some enormous programmable logic chips. And, all that logic fabric is just part of the story. These devices have incredible amounts of functionality already hardened that would have taken up massive amounts of LUT space in the past. Xilinx estimates that the hardened cores included in one Versal Premium device would fill up the LUT fabric of 22 Virtex UltraScale+ devices. That means these devices bring an astronomical jump in functionality, not to mention the big boost in performance and reduction in power consumption.

Versal Premium has hardened numerous functions that would have previously been implemented or partially implemented in LUTs, including 400G high-speed crypto engines, 600G Interlaken cores, 600G Ethernet cores, multi-rate Ethernet cores, 112G PAM4 Transceivers, PCIe® Gen5 w/DMA & CCIX, and CXL. The CXL in particular is interesting in that it gets Xilinx at least on par with Intel when it comes to cache-coherent communication with the ubiquitous Xeons (whenever Xeon itself eventually gets CXL support). The largest Premium device includes 14,352 of the new (larger) DSP blocks, so for signal processing, convolution, and other tasks that require massive parallel arithmetic, Premium brings the goods.

The largest device has a total of 994Mb of on-chip SRAM, and all that storage close to the computation elements is likely to accelerate applications considerably, as well as to reduce power by cutting off-chip memory accesses. And, the embedded Arm processing subsystem includes both application processors (dual-core Arm Cortex-A72, 48KB/32KB L1 Cache w/parity & ECC; 1MB L2 Cache w/ECC), and real-time processors (dual-core Arm Cortex-R5F, 32KB/32KB L1 Cache, and 256KB TCM w/ECC).

In conventional FPGAs, all that hardware would make routing and timing closure a near-impossibility, which is why the Versal devices include a dedicated, full-featured network-on-chip (NoC) that dramatically reduces the burden of routing and timing closure. This should substantially reduce the work required in the Vivado tool suite. And, speaking of tools, Versal Premium will take advantage of Xilinx’s new “Vitis” multi-entry-point tool architecture, which fronts the Vivado back-end traditionally used by HDL designers with environments designed specifically for software engineers, and for AI engineers.

Versal Premium has documentation available now, with design tool support scheduled for the second half of 2020 and first silicon shipments in early 2021. Pin migration allows prototyping of designs to begin now with the existing Versal Prime devices, with a smooth transition to Versal Premium chips when they become available.
