
Xilinx Boosts the Bandwidth

New Devices and Boards Raise the Ante

A few years ago, we all thought we were on the cusp of “enough” bandwidth. If everybody with a smartphone could stream HD video simultaneously, what else could we even want? Surely an infrastructure that could handle that task would be ready for whatever else we wanted to throw at it.

Oh, how wrong we were. 

Video is still the biggest bandwidth hog, but now we’re talking 4K and 8K at terrifying bit-rates. And, instead of just one smartphone per person, we’ve got billions of IoT devices all trying to stream massive amounts of data from each of those “things” to all the rest, on that internet of theirs. And, since a lot of the computing for all those edge devices can’t yet be done at the edge, lots of that data is shipped back and forth to cloud data centers, creating even more of a network crunch. Combine that IoT explosion with the transition to 5G and the network that backs it up, and you’ve got yourself a serious global bandwidth crisis.

Which, of course, is exactly what Xilinx was hoping for. 

The history of FPGAs is the history of bandwidth explosion. For the past several decades, FPGAs have been the go-to technology for networking companies trying to hit the next big bandwidth target first. FPGAs powered some of the first one-gig appliances, and now they’re poised to enable the move to 800 gig (gulp). Xilinx just made a pair of announcements that put them front and center in that equation. The appetizer is a new Alveo U25 SmartNIC Platform, and the main course is a crazy-powerful new family of FPGAs, er, ACAPs (shhh, they’re really still FPGAs, but don’t tell). These two announcements push Xilinx’s product portfolio further into the realm of extreme bandwidth enablement.

To fully appreciate the strategic importance of the Alveo card, we need to first consider Intel’s stranglehold on the data center. While Intel’s corporate lawyers probably forbid their marketing from using the term “dominance” (out of fear of garnering the attention of anti-trust regulators), Intel does, in fact, dominate the business of data center hardware. When it comes to technology like FPGAs, Intel has their own via their acquisition of Altera, and they likely aren’t too keen to welcome competitors to the party. Intel can do all kinds of mean tricks, like shipping servers to OEMs with FPGAs already built in. Hey, why buy an FPGA from someone else if your server already came with one, right?

If you’re a competitor like Xilinx, which now claims to be “data center first,” that makes things kinda tough. What you need is a Trojan horse: a nice gift you can roll up to the gate of the data center, saying “Hey, let me in. I’ll make things better in there.” And what better offering to bring than a smart network interface card? In cloud data centers, an estimated 30% of compute resources are sucked up by networking I/O processing. As CPU core counts continue to climb, so do the demands on the network, and so does the overhead of servicing them. Smart network interface cards let you move that processing onto the network interface itself, freeing up CPU resources to do more actual computing.
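To put the 30% figure in perspective, here is a back-of-the-envelope sketch in Python. The only number taken from the text is the estimated 30% overhead; the server core counts and the `offload_fraction` parameter (how much of that networking work a SmartNIC actually absorbs) are hypothetical illustration values, not Xilinx or server-vendor data.

```python
# Back-of-the-envelope sketch: CPU cores consumed by networking I/O, and how
# many a SmartNIC could hand back. The 30% figure is the estimate quoted in
# the article; core counts and offload_fraction are hypothetical inputs.

NETWORK_IO_FRACTION = 0.30  # estimated share of compute spent on network I/O


def cores_on_network_io(total_cores: int, io_fraction: float = NETWORK_IO_FRACTION) -> float:
    """Cores' worth of CPU time spent on network I/O processing."""
    if total_cores < 0 or not 0.0 <= io_fraction <= 1.0:
        raise ValueError("need total_cores >= 0 and 0 <= io_fraction <= 1")
    return total_cores * io_fraction


def cores_freed(total_cores: int, offload_fraction: float,
                io_fraction: float = NETWORK_IO_FRACTION) -> float:
    """Cores returned to applications if a SmartNIC absorbs
    offload_fraction (0..1) of the network I/O work."""
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    return cores_on_network_io(total_cores, io_fraction) * offload_fraction


if __name__ == "__main__":
    # Hypothetical servers: the overhead grows in absolute terms with core count.
    for cores in (32, 64, 128):
        spent = cores_on_network_io(cores)
        freed = cores_freed(cores, offload_fraction=0.8)
        print(f"{cores:3d} cores: {spent:5.1f} on network I/O, "
              f"{freed:5.1f} freed at 80% offload")
```

Treating the overhead as a fixed fraction is a simplification; it is what makes the absolute cost climb linearly with core count, and real workloads will not scale quite that neatly.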

Xilinx’s new Alveo U25 SmartNIC is a “bump-in-the-wire.” It can be inserted into the data flow without bothering anything over in Intel-land, and it just makes life easier for all those hard-working and expensive Xeons. Ah, but here’s the rub. Xilinx says the SmartNIC brings a “True convergence of network, storage, and compute acceleration functions on a single platform.” Uh, oh. That means that, once you’ve got a bunch of these suckers in your data center, you might as well give them more work to do, like storage and compute acceleration. Your Xeons will love you! But Intel? Maybe not so much. Xilinx will have succeeded in getting their silicon across the formidable Intel moat, and it’ll be sitting there taunting those Intel components at close range.

Before we dive into more details of the Alveo U25 SmartNIC, let’s look at the second announcement: the Versal Premium Series. Sounds a little like a new luxury crossover vehicle, doesn’t it? Xilinx says Versal Premium is actually a new family of what they call ACAPs (Adaptive Compute Acceleration Platforms), or what we call FPGAs. This brings up an issue for those accustomed to the historical programmable logic vernacular. Namely, Xilinx is trying to change the name of EVERYTHING lately.

Here’s our helpful EE Journal decoder ring, which may come in handy when interpreting newer-generation Xilinx marketing materials:


Platform = FPGA

Series = FPGA Family

Adaptable Hardware = FPGA LUT fabric

Adaptable Engines = FPGA LUTs

Intelligent Engines = DSP Slices

Scalar Engines = ARM processor subsystems

Dynamic Function Exchange = Partial Reconfiguration

LUTs (OK, this is a tricky one) = 6-input LUTs

Buckle up, this one gets complicated. Most of the industry gives “LUT” numbers based on an estimated number of equivalent 4-input LUTs (LUT4s), even though their devices have wider LUTs. To their credit, Xilinx now gives LUT counts based on their 6-input LUTs, which (gasp) correspond to something actually on the chip. Xilinx’s basic programmable logic block is called a CLB. CLBs are made up of slices. Slices are made up of several 6-input LUTs, each of which has 2 flip-flops. Sure enough, the data sheet’s “CLB flip-flops” figure is exactly double the number given for LUTs.
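That two-flip-flops-per-LUT structure makes for an easy sanity check on any product table. Here is a minimal Python sketch, assuming only the relationship stated above; the `FabricCounts` class and the device figures in the example are hypothetical, not taken from a real Xilinx datasheet.

```python
from dataclasses import dataclass

# Each 6-input LUT is paired with two flip-flops (per the article).
FLIP_FLOPS_PER_LUT = 2


@dataclass(frozen=True)
class FabricCounts:
    """Two fabric rows as they might appear in a product table (hypothetical)."""
    luts: int            # 6-input LUTs, counted as physical LUT6s
    clb_flip_flops: int  # the "CLB flip-flops" row

    def expected_flip_flops(self) -> int:
        return FLIP_FLOPS_PER_LUT * self.luts

    def is_consistent(self) -> bool:
        """True when the flip-flop row is exactly double the LUT row."""
        return self.clb_flip_flops == self.expected_flip_flops()


if __name__ == "__main__":
    device = FabricCounts(luts=1_000_000, clb_flip_flops=2_000_000)  # made-up figures
    print(device.expected_flip_flops(), device.is_consistent())  # 2000000 True
```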

Way to go, Xilinx! An FPGA datasheet that shows what’s actually on the chip. Novel concept.

Oops, except now they have introduced a marketing metric called System Logic Cells, and it gets top-line billing on the product table. What is a System Logic Cell? Apparently, it is approximately 2.187 LUTs. How did they come up with 2.187, you may ask? Simple. They calculated what multiplier was required to make their devices sound bigger than Intel/Altera’s. Ah, and they were doing so well.
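Since a System Logic Cell is just a LUT times a constant, converting between the two is one line of arithmetic in either direction. The Python sketch below uses the approximate 2.187 factor quoted above; the function names and the example LUT count are illustrative, and Xilinx’s exact factor and rounding may differ.

```python
# System Logic Cells (SLC) vs. physical 6-input LUTs, using the approximate
# ratio quoted in the article. Results are estimates: the factor is
# approximate and vendor rounding may differ.
SLC_PER_LUT = 2.187


def luts_to_system_logic_cells(luts: int) -> int:
    """Marketing-table figure implied by a physical LUT6 count."""
    return round(luts * SLC_PER_LUT)


def system_logic_cells_to_luts(slcs: int) -> int:
    """Approximate physical LUT6 count behind a System Logic Cell figure."""
    return round(slcs / SLC_PER_LUT)


if __name__ == "__main__":
    luts = 1_000_000  # hypothetical device
    slcs = luts_to_system_logic_cells(luts)
    print(f"{luts:,} LUTs -> {slcs:,} System Logic Cells")
    print(f"round trip: {system_logic_cells_to_luts(slcs):,} LUTs")
```

Dividing by the factor is the more useful direction: it turns a headline System Logic Cell number back into something that corresponds to silicon.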

Anyway, back to Versal Premium ACAPs. These are truly amazing devices. The Versal line has two branches: an “AI” branch with families that include AI engines, and a more general-purpose branch without AI engines. Versal Premium is part of the general-purpose branch, and it comes packed with resources the likes of which we’ve never seen. Beginning with the bandwidth side, Premium packs 9Tb/s of serial bandwidth via up to 68 NRZ transceivers at 32Gbps, which are well suited to power-optimized mainstream 100G interfaces, and up to 140 PAM4 transceivers at 58Gbps for applications migrating to 400G. Those 140 transceivers can also be paired up to give 70 lanes of 112Gbps for those of you flirting with the idea of 800 gig gear.
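The lane counts and per-lane rates above multiply out as follows. This Python sketch computes raw aggregate line rate only, before encoding and FEC overhead, so usable payload bandwidth is lower; it does not attempt to reconstruct how Xilinx arrived at its 9Tb/s headline figure. Grouping eight 112G lanes into one 800G port is an assumption made for illustration.

```python
# Raw aggregate serial line rate for the transceiver configurations quoted
# in the article. These are signaling rates, not usable payload bandwidth.

# name: (lane count, Gb/s per lane), as given in the article
TRANSCEIVER_CONFIGS = {
    "32G NRZ": (68, 32),
    "58G PAM4": (140, 58),
    "112G PAM4 (paired)": (70, 112),
}


def aggregate_tbps(lanes: int, gbps_per_lane: float) -> float:
    """Total raw line rate in Tb/s."""
    return lanes * gbps_per_lane / 1000.0


def whole_ports(lanes: int, lanes_per_port: int) -> int:
    """How many complete ports of a given lane width fit in the available lanes."""
    if lanes_per_port <= 0:
        raise ValueError("lanes_per_port must be positive")
    return lanes // lanes_per_port


if __name__ == "__main__":
    for name, (lanes, rate) in TRANSCEIVER_CONFIGS.items():
        print(f"{name:>20}: {lanes:3d} x {rate:3d} Gb/s = "
              f"{aggregate_tbps(lanes, rate):.2f} Tb/s raw")
    # Assumption for illustration: one 800G port built from 8 lanes of 112G.
    print("800G ports (8 x 112G lanes each):", whole_ports(70, 8))
```

Pairing lanes trades lane count for per-lane speed: 140 × 58 and 70 × 112 land in the same ballpark (8.12 versus 7.84 Tb/s raw), so doubling up is about reaching faster ports rather than adding bandwidth.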

On the programmable logic “adaptable hardware” front, Versal Premium brings from 700K to almost 3.4M of Xilinx’s LUT6 cells. Convert those to LUT4 equivalents and the numbers only get bigger; either way, these are some enormous programmable logic chips. And all that logic fabric is just part of the story. These devices include an incredible amount of hardened functionality that would have consumed massive amounts of LUT space in the past. Xilinx estimates that the hardened cores in one Versal Premium device would fill the LUT fabric of 22 Virtex UltraScale+ devices. That means these devices bring an astronomical jump in functionality, not to mention a big boost in performance and a reduction in power consumption.

Versal Premium has hardened numerous functions that would previously have been implemented, in whole or in part, in LUTs, including 400G high-speed crypto engines, 600G Interlaken cores, 600G Ethernet cores, multi-rate Ethernet cores, 112G PAM4 transceivers, PCIe® Gen5 w/DMA & CCIX, and CXL. CXL in particular is interesting because it puts Xilinx at least on par with Intel when it comes to cache-coherent communication with the ubiquitous Xeons (whenever Xeon itself eventually gets CXL support). The largest Premium device includes 14,352 of the new (larger) DSP blocks, so for signal processing, convolution, and other tasks that require massively parallel arithmetic, Premium brings the goods.

The largest device has a total of 994Mb of on-chip SRAM, and all that storage close to the computation elements is likely to accelerate applications considerably, as well as reduce power by cutting down on off-chip memory accesses. And the embedded Arm processing subsystem includes both application processors (dual-core Arm Cortex-A72, 48KB/32KB L1 Cache w/parity & ECC; 1MB L2 Cache w/ECC) and real-time processors (dual-core Arm Cortex-R5F, 32KB/32KB L1 Cache, and 256KB TCM w/ECC).
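Memory figures quoted in megabits (Mb) are easy to misread as megabytes. A trivial Python conversion, assuming 8 bits per byte and the same prefix on both sides, shows the 994Mb figure works out to a little over 124 MB; the helper name is hypothetical.

```python
BITS_PER_BYTE = 8


def megabits_to_megabytes(megabits: float) -> float:
    """Convert a megabit (Mb) figure to megabytes (MB), same prefix on both sides."""
    return megabits / BITS_PER_BYTE


if __name__ == "__main__":
    onchip_mb = 994  # largest Versal Premium device, per the article
    print(f"{onchip_mb} Mb = {megabits_to_megabytes(onchip_mb):.2f} MB of on-chip SRAM")
```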

In conventional FPGAs, all that hardware would make routing and timing closure a near-impossibility, which is why the Versal devices include a dedicated, full-featured network-on-chip (NoC) that dramatically reduces the routing and timing-closure burden. That should substantially reduce the implementation work in the Vivado tool suite. And, speaking of tools, Versal Premium will take advantage of Xilinx’s new “Vitis” multi-entry-point tool architecture, which puts environments designed specifically for software engineers and for AI engineers in front of the Vivado back-end traditionally used by HDL designers.

Versal Premium has documentation available now, with design tool support scheduled for the second half of 2020 and first silicon shipments in early 2021. Pin migration allows prototyping of designs to begin now with the existing Versal Prime devices, with a smooth transition to Versal Premium chips when they become available.
