Accelerating Video for the Masses

It’s not news that streaming video is taking over the world. And, in the process, it has become by far the leading consumer of internet bandwidth. With the COVID-19 crisis, it’s a good bet that around 100% of the people reading this article have used video conferencing services within the last few weeks, and more likely within the last few days or even hours. In fact, you may be reading this article while pretending to pay attention in a Zoom meeting. It’s OK. We won’t tell. Carry on, and just be sure to make at least brief eye contact with the camera frequently. It’s more convincing.

Now, it may seem that the challenges of streaming something like Netflix would be the same as – or at least very similar to – those of streaming YouTube, Zoom, or Twitch. It’s all about transcoding, compressing, decompressing – shouldn’t be that different, right?

Wrong-o.

For example, live video needs to be transcoded in real time, whereas platforms like Netflix can preprocess a lot of the content into ready-to-stream formats. There are numerous other differences as well, and it turns out that handling the demands of live video combined with a very large number of streams is a whole ‘nuther problem. And, the audience for gaming platforms like Twitch is actually much larger than even that of entertainment services like Netflix and HBO.

Handling all those live, real-time streams with minimal bandwidth utilization is therefore very big business. And, any time you’ve got a complex payload with widely varied demands, you can bet that FPGAs have a lot to offer in the solution space. Xilinx estimates that a 30% reduction in bandwidth could result in an annual total cost savings of around $21M for a provider averaging 100K streams. That’s enough to buy quite a few FPGAs.
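To put that Xilinx estimate in per-stream terms, here is a quick back-of-the-envelope sketch. The only inputs taken from the estimate are the $21M, the 100K streams, and the 30% reduction. The "implied baseline" assumes that the savings scale linearly with bandwidth, which is our simplification and not something Xilinx states. The function and key names are made up for illustration.

```python
def bandwidth_savings(total_savings_usd, streams, reduction_pct):
    """Break an annual savings figure down per stream.

    Assumes (our simplification) that cost scales linearly with bandwidth,
    so the savings equal reduction_pct percent of some baseline cost.
    """
    per_stream = total_savings_usd / streams
    # Baseline annual cost that a reduction_pct cut would have to come from.
    implied_baseline = total_savings_usd * 100 / reduction_pct
    return {
        "savings_per_stream_usd": per_stream,
        "implied_baseline_usd": implied_baseline,
        "implied_baseline_per_stream_usd": implied_baseline / streams,
    }


# Figures quoted from the Xilinx estimate: $21M, 100K streams, 30% reduction.
estimate = bandwidth_savings(21_000_000, 100_000, 30)

for name, value in estimate.items():
    print(f"{name}: {value:,.2f}")
```

Under those assumptions, the savings work out to about $210 per stream per year, against an implied bandwidth-related cost of roughly $700 per stream per year.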

FPGAs are involved in just about every link of the video delivery chain. Besides transcoding, you’ll find them in the network (of course), in storage, and in a host of other roles. If you’re watching a video, there is pretty much a 100% chance that one or more FPGAs are involved. But moving transcoding from software processors to FPGA accelerators walks service providers directly into the Achilles’ heel of FPGA technology – the expertise required to do FPGA-based system development. The Venn diagram of companies that supply streaming video services and companies with in-house FPGA design expertise looks very much like John Lennon’s glasses.

Xilinx is attacking that problem with a pair of new ready-to-roll server solutions with all the FPGA design work baked in. Zero lines of HDL will be harmed in the transition to FPGA-powered bliss. Companies stacking these gizmos into their racks may not even know they’re using FPGAs, or it may be like when consumers buy a car with “turbo” or “four valves per cylinder” – they have at best a vague understanding of the technical merits of the technology they’re taking advantage of, but the logo looks good on the fender.

We, on the other hand, do want to understand what’s under the hood, so here we go. Xilinx has announced the Alveo U50 and U30 real-time computing video appliances based on what the company is calling the new “Xilinx Real-Time Server reference architecture.” Xilinx is aiming these new solutions at service providers in areas like eSports, game streaming, social and video conferencing, live distance learning, telemedicine, and live broadcast video, with the goal of maximizing video quality and bitrate while minimizing cost per channel. Xilinx has taken their existing Alveo data center accelerator cards and done all the heavy lifting on the FPGA part (OK, actually we need to say again here that Xilinx does not call Zynq SoC devices “FPGAs,” but bear with us).

The two appliances are the “High Channel Density Video Appliance,” which integrates up to eight Alveo U30 data center acceleration cards, and the “Ultra-Low Bitrate Optimized Video Appliance,” which integrates up to eight Alveo U50 cards. The U30 card is based on Zynq UltraScale+, supports both the H.264 and HEVC (H.265) codecs, and is capable of streaming up to sixteen 1080p30 channels per card. The U50 card is based on what the company calls an XCU50 FPGA, which appears to be a variant of the impressive VU35P – a Virtex device packaged with (among other goodies) 8GB of HBM2, 100GbE networking, and a PCI Express 4.0 interface. The U50 raises the performance bar considerably, supporting up to seven full-HD 1080p60 channels along with eight full ABR ladders (all at x265 medium preset).

The appliances are being built by various OEMs in both 1U and 2U form factors, and Xilinx figures a single 1U RT server can deliver 64x ABR @ 1080p30 on 35W per card. The company says that achieving that same throughput would require four HPE ProLiant DL380 servers with 32 Nvidia T4 accelerators consuming 58W per card – costing six times as much and consuming five times the power. Clearly a big win for the FPGA-based strategy.
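The per-card numbers multiply out neatly at the appliance level, as the rough sketch below shows. It assumes a fully populated box of eight cards (the appliances hold "up to eight") and that the 64x ABR server is such an eight-card box, which is our reading rather than something Xilinx spells out. It also counts card power only and ignores the host server. All names are ours.

```python
CARDS_PER_APPLIANCE = 8          # "up to eight" cards; assumed fully populated
U30_1080P30_PER_CARD = 16        # quoted U30 per-card channel count
U50_ABR_LADDERS_PER_CARD = 8     # quoted U50 per-card ABR ladder count
CARD_POWER_W = 35                # per-card power quoted for the 1U server
SERVER_ABR_LADDERS = 64          # "64x ABR" quoted for the 1U server


def appliance_total(per_card, cards=CARDS_PER_APPLIANCE):
    """Scale a per-card figure to a whole appliance."""
    return per_card * cards


u30_channels = appliance_total(U30_1080P30_PER_CARD)
u50_ladders = appliance_total(U50_ABR_LADDERS_PER_CARD)
card_power_w = appliance_total(CARD_POWER_W)
# Card power per ABR ladder, host power excluded.
watts_per_ladder = card_power_w / SERVER_ABR_LADDERS

print(f"U30 appliance: {u30_channels} x 1080p30 channels")
print(f"U50 appliance: {u50_ladders} ABR ladders")
print(f"Card power: {card_power_w} W, {watts_per_ladder} W per ladder")
```

Eight ladders per card times eight cards lands on the same 64 that Xilinx quotes for the 1U server, and the accelerator cards in that box would draw about 280W in total under these assumptions.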

Both appliances are built on the FFmpeg framework, which should simplify the use of existing software and GPGPU-based transcoder infrastructure via a common API. Xilinx says the HEVC codec is “rebuilt from the ground up in a componentized manner to allow for better control of the codec down to the frame level,” allowing system integrators to adjust rate control and fine-tune other parameters to optimize video quality and bitrate to suit the particular end application. Along with an ecosystem of partners, Xilinx provides a full software stack for the new boxes, building a firewall to protect customers from having to venture into the land of LUTs and HDLs.
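For readers who haven’t lived in FFmpeg land, here is a minimal sketch of what driving an ABR ladder through FFmpeg’s command line can look like. It is not the Xilinx stack: the encoder below is `libx265`, FFmpeg’s stock software HEVC encoder, standing in for whatever hardware encoder name the appliances expose (we don’t know it). The ladder resolutions and bitrates, the file names, and the helper names are all illustrative assumptions. The script only builds and prints the commands; it doesn’t run them.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rung:
    """One rendition in an adaptive bitrate (ABR) ladder."""
    height: int         # output height in pixels
    bitrate_kbps: int   # target video bitrate


# Illustrative ladder only; these values are assumptions, not Xilinx's.
LADDER = [Rung(1080, 5000), Rung(720, 3000), Rung(480, 1500), Rung(360, 800)]


def ffmpeg_args(src: str, rung: Rung, encoder: str = "libx265",
                preset: str = "medium") -> List[str]:
    """Build an ffmpeg argument list for a single rung.

    The encoder is a parameter so that a hardware encoder could be
    swapped in for the software stand-in.
    """
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=-2:{rung.height}",   # keep aspect ratio, even width
        "-c:v", encoder, "-preset", preset,
        "-b:v", f"{rung.bitrate_kbps}k",
        "-c:a", "copy",                     # pass audio through untouched
        f"out_{rung.height}p.mp4",
    ]


def ladder_commands(src: str) -> List[List[str]]:
    """One ffmpeg command per rung of the ladder."""
    return [ffmpeg_args(src, rung) for rung in LADDER]


for cmd in ladder_commands("input.mp4"):
    print(shlex.join(cmd))
```

The point of a common framework is that swapping the `encoder` argument is, in principle, most of what changes when the work moves from a CPU or GPU to an accelerator card. A production live transcoder would more likely decode once and fan out to all rungs in a single process rather than launch one command per rung.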

With the COVID crisis dramatically ramping up the demand for video streaming, we believe there will be a huge demand for appliances such as these, and the turnkey approach Xilinx is taking should dramatically reduce the friction for companies adopting the technology. This is also a harbinger of what we might expect to see much more frequently in the near-term programmable logic market, as big-volume segments transition from FPGA vendors selling to teams that design in FPGA technology to OEMs marketing turnkey solutions with FPGAs tucked in quietly under the hood. It will be interesting to watch.
