Unless you’ve been hiding under a rock, you know that FPGA-based compute acceleration is suddenly a hot topic. And, even from under the rock, you probably got the memo that Intel paid over $16B to acquire Altera a couple years back – mostly to capitalize on this “new” emerging killer app for FPGA technology. These days there is an enormous battle brewing for control of the data center, as Moore’s Law slows to a crawl and engineers look for alternative ways to crunch more data faster with less power.
As we’ve discussed at length in these pages, FPGAs are outstanding platforms for accelerating many types of compute workloads, particularly those where datapaths lend themselves to massively parallel arithmetic operations. FPGAs can crush conventional processors by implementing important chunks of computationally intense algorithms in hardware, with dramatic reduction in latency and (often more important) power consumption.
The big downside to FPGA-based acceleration is the programming model. In order to get optimal performance from a heterogeneous computing system with FPGAs and conventional processors working together, you need a way to partition the problem, turn conventional code into appropriate FPGA architectures, and realize that whole thing in a well-conceived hardware configuration. This requires, among other things, a good deal of expertise in FPGA design, as well as an overall strategy that accounts for getting data into and out of those FPGA accelerators, and a memory and storage architecture that’s up to the task. Getting it right is no small feat, and there are countless ways to go wrong along the way and end up with very little gain from your FPGA investment.
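To put rough numbers on how an FPGA investment can end up yielding "very little gain," here is a minimal back-of-the-envelope sketch based on Amdahl's law with an added data-movement term. The function name, its parameters, and the example figures are illustrative assumptions, not measurements from any real system.

```python
def offload_speedup(accel_fraction, kernel_speedup, transfer_fraction=0.0):
    """Estimate end-to-end speedup from offloading part of a workload.

    All fractions are relative to the original, CPU-only runtime:
      accel_fraction    - share of runtime moved onto the accelerator (0..1)
      kernel_speedup    - how much faster the accelerator runs that share
      transfer_fraction - extra time spent moving data to and from the card
    These names are hypothetical; the model is Amdahl's law plus overhead.
    """
    if not 0.0 <= accel_fraction <= 1.0:
        raise ValueError("accel_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    if transfer_fraction < 0.0:
        raise ValueError("transfer_fraction must not be negative")

    # New runtime = untouched CPU work + accelerated work + data movement.
    new_time = (
        (1.0 - accel_fraction)
        + accel_fraction / kernel_speedup
        + transfer_fraction
    )
    return 1.0 / new_time


# Hypothetical case: 60% of the runtime offloaded, a 50x faster kernel,
# and data movement costing 10% of the original runtime.
print(f"{offload_speedup(0.6, 50.0, 0.1):.2f}x")  # roughly 1.95x overall
```

In this made-up case, a kernel that runs 50 times faster in hardware delivers less than a 2x improvement for the whole job, because the code left on the processor and the cost of moving data dominate. That is why partitioning and data movement matter as much as the accelerator itself.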
At this week’s Supercomputing conference in Dallas, Bittware (acquired by Molex earlier this year) announced they were “joining forces” with Nallatech (acquired by Molex last year as part of the Molex acquisition of Interconnect Systems, Inc.). In the FPGA acceleration world, this is a big deal. While FPGA-based acceleration may be a new and hot topic for most of us, and it may seem like a brand-new world is opening up for engineering exploration, these folks have been at it for a LONG time.
How long? Allan Cantle founded Nallatech to focus on FPGA-based acceleration in 1993.
Let that date sink in for a minute or two. Yep, Nallatech has been doing FPGA acceleration for a quarter of a century. Dial that into your decoder ring and figure out what kind of FPGAs were involved way back then. We first began covering Nallatech in 2004, and by that time the company already had more than a decade of experience in dealing with the challenges of what was (at the time) known as “reconfigurable computing.” Bittware has a similarly storied history in acceleration, beginning with DSP-based boards for the ISA bus back in 1991, and moving into FPGAs in partnership with Altera in 2004. It is likely that these two companies together have more experience and more design successes with FPGA-based acceleration than anyone else on the planet.
It’s interesting that these two long-time veterans in this technology would be united under the Molex banner. With the likes of Intel, NVIDIA, Xilinx, AMD, Arm, and others throwing crazy resources into the battle – acquiring talent and technology wherever they can to bolster their campaigns – there’s a bit of irony in a connector company cornering two of the most experienced names in the business. But that seems to be exactly what has happened.
According to the announcement, the combined Bittware/Nallatech teams (operating under the Bittware flag) will offer both Intel- and Xilinx-based FPGA acceleration solutions. By betting on both horses, Bittware takes the platform issue off the table. Bittware says they are targeting applications such as machine learning inference, real-time data analytics, high-frequency trading, real-time network monitoring, and video broadcast (among others). None of these come as any surprise, and it appears that Bittware is taking advantage of the combined resources of Bittware, Nallatech, and Molex to land the type of large enterprise customers who would have been leery of the more niche nature of Bittware or Nallatech alone.
Bittware breaks their solutions down into “Compute,” “Network,” and “Storage” – showing their experience right off the bat. While accelerating computation with FPGAs is the glamour play, every part of the computing system needs to be designed for the task or you’ll end up leaving performance on the table. And FPGAs can play a key role in accelerating each of these tasks. FPGAs cut their teeth in the networking business decades ago and have only more recently proven themselves in storage and compute.
On the compute front, Bittware offers HBM2-enabled FPGA devices from both Intel and Xilinx. HBM integration is new for both Intel and Xilinx and is a game-changing innovation that allows acceleration of applications that would otherwise be limited by the bandwidth of conventional discrete memory implementations. For example, Bittware’s 520N-MX is a full-height, double-width PCI Express card that packs an Intel Stratix 10 MX FPGA, up to 8 GB of integrated HBM2 at 512 GB/s, four QSFP28 cages supporting up to 100G per port, two DIMMs supporting DDR4 SDRAM, QDR-II+ SRAM or Intel Optane 3D XPoint, two OCuLink ports for direct expansion to NVMe SSD arrays, and a “Board Management Controller” (BMC) for Intelligent Platform Management. If you’re doing 100G line rate network packet processing or compute-intensive data center applications that demand high memory bandwidths, that packs a lot of punch.
Looking at the Xilinx side of the aisle, the XUPVVH is a 3/4-length PCIe board with a Xilinx Virtex UltraScale+ VU35P/VU37P with 8 GB of integrated HBM2 at 460 GB/s, a PCIe x16 interface supporting Gen1, Gen2, or Gen3, four QSFP cages for 4x 40/100GbE or 16x 10/25GbE, and up to 256 GB of DDR4. Because the Virtex’s 2.8 million logic elements can crank up a lot of heat, Bittware relies on what they call their “Viper” platform, which uses computer flow simulation to drive the physical board design in a “thermals first” approach, including “the use of heat pipes, airflow channels, and arranging components to maximize the limited available airflow in a server.” Viper boards are passive by default, with active cooling as an option.
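For a sense of scale, the quoted port and memory figures can be compared with a little arithmetic. The sketch below is only a rough illustration: it treats the 460 GB/s HBM2 figure and the Ethernet port speeds as peak rates, ignores protocol overhead, and the function names are made up for the example.

```python
def gbits_to_gbytes(gbits_per_s):
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbits_per_s / 8.0


def aggregate_line_rate_gbytes(ports, gbits_per_port):
    """Peak one-directional network throughput across all ports, in GB/s."""
    return gbits_to_gbytes(ports * gbits_per_port)


def memory_headroom(mem_gbytes_per_s, line_gbytes_per_s):
    """How many times peak memory bandwidth exceeds peak network ingest."""
    return mem_gbytes_per_s / line_gbytes_per_s


# Figures quoted for the XUPVVH, taken as peak values (an assumption).
four_by_100g = aggregate_line_rate_gbytes(4, 100)     # 4 x 100GbE
sixteen_by_25g = aggregate_line_rate_gbytes(16, 25)   # 16 x 25GbE
headroom = memory_headroom(460.0, four_by_100g)

print(f"4 x 100GbE : {four_by_100g:.1f} GB/s")
print(f"16 x 25GbE : {sixteen_by_25g:.1f} GB/s")
print(f"HBM2 headroom over line rate: {headroom:.1f}x")
```

Both port configurations top out at the same 50 GB/s of ingest, so on paper the HBM2 could read or write every incoming byte about nine times over. Real designs will see less than these peak numbers, but the ratio suggests why on-package memory opens up applications that buffer, look up, or reorder data at line rate.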
Beyond the capabilities and specs of the hardware, the team at Bittware has an incredible amount of experience getting real-world performance out of FPGA-based systems. Partitioning workloads, understanding memory and network bandwidth requirements, and managing thermals in complex data center installations are no small feat, and the expertise this team brings could be the difference between success and failure, or between an optimal system and an unbalanced compromise. With Bittware and Nallatech enjoying the larger footprint and resources of Molex, it will be interesting to see what kind of engagements they attract.