We’ve talked a lot in these pages about the battle for the future of the data center. FPGAs, we say, represent the path to enlightenment, the power panacea, the key to breaking the energy-hungry tyranny of the von Neumann architecture. Apparently, we are not alone in this line of thinking. Intel, for example, plunked down about sixteen billion votes in favor of an FPGA-based future by acquiring Altera. Xilinx joined a cadre of companies looking to crack Intel’s longstanding dominance of the data center by devising standards to facilitate open-architecture attacks on Intel’s proprietary fortress. Someday, the thinking goes, FPGAs will pave the path to Moore-esque improvement in the energy efficiency of computing, despite the demise of Moore’s Law itself.
Achronix, the longest-standing new FPGA company to challenge the dominance of Altera and Xilinx in recent years, is undoubtedly aware of this vast new marketing meadow where programmable logic companies will someday graze on a steady supply of rich, green sockets. It would be impossible to be in the FPGA business and not notice. But Achronix had a different thought: Why wait for the future? Why not find a way to apply FPGA fabric to improve the performance and efficiency of the data center right now, with the server hardware already in the field today?
Rather than sitting around waiting for some fancy 10nm, FinFET-driven, multi-patterned, EUV-exposed, HLS-driven, HBM-having revolution to slowly sweep through the cubic miles of already-working server farms, transforming the landscape from primordial programmable ooze, Achronix came up with a clever strategy to boost today’s servers using today’s FPGAs.
At the top level, the strategy is simple: produce a PCIe board that plugs into current servers and provides FPGA-powered acceleration to the critical packet-passing and networking functions. By improving the performance and power efficiency of those operations, the entire data center can run faster and more efficiently – without ever swapping out a rack or blade.
To fully understand the cleverness of this strategy, it helps to know the story of Achronix. Achronix was originally founded with a plan to exploit a clever new asynchronous architecture for FPGA fabric. The company’s picoPIPE filled the fabric with tiny registers that pipelined the entire design at a very fine granularity, allowing what was, at the time, an extremely high Fmax of over 1GHz. Over time, however, Achronix’s strategy evolved. They put the picoPIPE on the back burner and focused instead on developing purpose-built devices to tackle the most common and highest-value networking applications.
For those who followed the breathless battle between Xilinx/TSMC and Altera/Intel to see who could be the “first” to market with a FinFET FPGA, it might be a bit of a surprise that the actual “first” company to ship a FinFET FPGA was Achronix, with their 22nm, Intel-fabbed Speedster family. The combination of FinFET technology and very strategic hardening of IO interfaces makes Speedster a very competitive device for high-speed networking. Perfect, in fact, if one wanted to accelerate the networking functions in data center applications.
Most of our attention to data center acceleration with FPGAs has been on the acceleration of algorithms. Accelerating legacy applications is obviously the long-range vision for reconfigurable computing in the data center. But reaching that goal is a complex, multi-dimensional, inter-disciplinary endeavor. In order to get there, we need a reasonable way to handle the programming aspect of FPGA-based acceleration and a standard architecture that allows FPGA-accelerated applications to port among machines, to communicate and coordinate with conventional processors, and to access shared memory in standard ways. These are substantial challenges that require re-architecting almost every aspect of server design.
Applying FPGAs to the networking tasks, however, requires much less inverting of the fruit basket. It also happens to fit nicely with the strengths of Achronix FPGAs. Toward that end, the company announced this week that they are producing PCIe accelerator boards that plug right into existing machines. Of course, nothing about these boards precludes accelerating applications as well, but the low-hanging fruit is definitely in the networking arena.
The Achronix Accelerator-6D comes in PCIe form factor and boasts 4 QSFP ports for 4x40G Ethernet. Why not 100G, you might ask? That’s because today’s data center installations, by and large, simply don’t have it yet. And, the point of Accelerator-6D is to accelerate TODAY’s data centers. Accelerator-6D also includes 6 ports of dual-slot DDR3x72, yielding 690 Gbps of aggregate memory bandwidth on up to 192 GB of DDR3 RAM. The board also includes (of course) an Achronix Speedster 22i HD1000 high-density FPGA. The HD1000 has hardened Ethernet MACs that can run at 10/40/100G, 100G Interlaken (10Gx12), PCIe gen 1/2/3 (including DMA) with x1, x4, and x8 capability, and 72-bit banks of 1600 Mbps DDR3 controllers.
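For readers who want to check the memory numbers, the arithmetic is easy to sketch. The snippet below is only a back-of-the-envelope calculation under our own assumptions: it treats the 690 Gbps figure as raw bandwidth across all 72 bits of each port, and it assumes the usual x72 split of 64 data bits plus 8 ECC bits. The constant and function names are ours, purely for illustration.

```python
# Back-of-the-envelope check of the Accelerator-6D memory figures.
# The numbers come from the board description; the names are illustrative.

DDR3_PORTS = 6          # DDR3 ports on the board
PORT_WIDTH_BITS = 72    # "DDR3x72" bus width per port
DATA_WIDTH_BITS = 64    # ASSUMPTION: 64 payload bits, remaining 8 bits are ECC
PIN_RATE_MBPS = 1600    # per-pin rate of the hardened DDR3 controllers
TOTAL_DRAM_GB = 192     # maximum DDR3 capacity
QSFP_PORTS = 4          # 40G Ethernet ports
QSFP_RATE_GBPS = 40


def aggregate_gbps(ports: int, width_bits: int, pin_rate_mbps: int) -> float:
    """Aggregate bandwidth in Gbps for `ports` buses, each `width_bits` wide."""
    return ports * width_bits * pin_rate_mbps / 1000


raw_gbps = aggregate_gbps(DDR3_PORTS, PORT_WIDTH_BITS, PIN_RATE_MBPS)
payload_gbps = aggregate_gbps(DDR3_PORTS, DATA_WIDTH_BITS, PIN_RATE_MBPS)
ethernet_gbps = QSFP_PORTS * QSFP_RATE_GBPS
gb_per_port = TOTAL_DRAM_GB / DDR3_PORTS

print(f"raw memory bandwidth:     {raw_gbps:.1f} Gbps")
print(f"payload-only bandwidth:   {payload_gbps:.1f} Gbps (if 8 bits are ECC)")
print(f"aggregate Ethernet rate:  {ethernet_gbps} Gbps")
print(f"DRAM per DDR3 port:       {gb_per_port:.0f} GB")
```

Under those assumptions, the raw figure works out to 691.2 Gbps, in line with the quoted 690 Gbps, and well above the 160 Gbps that the four 40G ports can deliver combined. The 192 GB ceiling divides evenly into 32 GB per DDR3 port.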
Achronix Accelerator-6D could be used for a wide variety of tasks, from accelerating applications such as search, analytics, image/video processing, and machine learning, to more approachable network functions such as encryption/decryption and network stack offloading. A smart NIC could be used to offload TCP/IP and to perform remote direct memory access (RDMA), including RDMA over Converged Ethernet (RoCE) and iWARP. The company says Accelerator-6D is a perfect platform for hardware acceleration of NFV functionality, which applies virtualization technology to consolidate diverse networking equipment into an industry-standard environment. While today’s NFV is implemented almost completely in software, accelerating some functions in hardware is a natural fit.
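TCP/IP offload covers a lot of ground, but checksum computation is one textbook example of the per-packet work a NIC can take off the host CPU's plate. As a rough illustration only, here is the standard 16-bit Internet checksum (RFC 1071) written out in software. Nothing here describes what any particular Accelerator-6D design implements; the function name and the sample IPv4 header are ours.

```python
# Software version of the 16-bit one's-complement Internet checksum (RFC 1071),
# the kind of per-packet arithmetic that checksum offload moves into hardware.
# Illustrative sketch only; names and sample data are hypothetical.

def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of `data`."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # sum big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# A 20-byte IPv4 header with its checksum field (bytes 10-11) set to zero.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(header)
print(f"header checksum: 0x{checksum:04x}")

# A receiver verifies by checksumming the header with the field filled in;
# a valid header yields zero.
filled = header[:10] + checksum.to_bytes(2, "big") + header[12:]
print("verifies:", internet_checksum(filled) == 0)
```

For this sample header the checksum comes out to 0xb861. In software, that loop runs for every packet sent and received, which is the sort of repetitive, fixed-function work that maps naturally onto FPGA fabric.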
Accelerator-6D can be used to support native NIC functionality with the addition of redundant port interfaces, intelligent traffic redirecting, encryption/decryption, and per-port dedicated memory bandwidth. With each 40G QSFP port allocated its own dedicated 32GB deep buffer, you can guarantee 100% receive-packet buffering. Critical table functions are implemented in the two remaining DDR3 ports, ports 5 and 6, giving a performance boost where it is needed most.
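To put a 32GB per-port buffer in perspective, it helps to ask how long it could soak up traffic arriving at full line rate. The sketch below is a simple estimate under our own assumptions: decimal gigabytes (10^9 bytes), the entire buffer available for packet data, and no framing or memory-controller overhead. The names are illustrative.

```python
# How long can a 32GB buffer absorb a 40G port running flat out?
# ASSUMPTIONS: 1 GB = 10**9 bytes, whole buffer usable, no protocol overhead.

PORT_BUFFER_GB = 32     # dedicated buffer per QSFP port
LINE_RATE_GBPS = 40     # Ethernet line rate per QSFP port


def seconds_of_buffering(buffer_gb: int, line_rate_gbps: int) -> float:
    """Seconds until a buffer of `buffer_gb` fills at `line_rate_gbps`."""
    buffer_bits = buffer_gb * 8 * 10**9
    bits_per_second = line_rate_gbps * 10**9
    return buffer_bits / bits_per_second


fill_time_s = seconds_of_buffering(PORT_BUFFER_GB, LINE_RATE_GBPS)
print(f"time to fill at line rate: {fill_time_s:.1f} s")
```

With those assumptions, a single port could absorb about 6.4 seconds of continuous line-rate traffic before its buffer filled, which is a very long time in networking terms.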
As an NIC, Accelerator-6D leverages the intelligent NIC architecture and provides flexible allocation of logical or physical ports to direct security information to remote or host-resident monitoring applications. It provides a dedicated tunnel encapsulation engine for monitored flows, and it allows flow-table modifications to support multicast extensions. Access to control and encryption is implemented with the security control plane isolated. The FPGA provides visibility and tunnel generation and switching (OVS & NSH) functions.
Accelerator-6D can also be used on the test and measurement side, providing a high-performance host to extend the platform test architecture, and allowing the same hardware to be transitioned for use in normal operation when testing is not underway. On the transmit side, the deep on-board buffering supports the generation of complex packet, flow, and session data streams, and both local and system network timing are supported to enable advanced networking protocol, traffic shaping, and load-balancing prototyping. On the receive end, dedicated port buffering is well suited for guaranteed packet handling, flow, and session state capture. The combination could provide flexible, high-performance logic analyzer capability built right into the system hardware.
While these functions are less glamorous than the high-flying future vision for FPGA-accelerated computing, they are things that could be accomplished today with no significant invention required, using today’s installed data center hardware and, of course, the Accelerator-6D. The tools required to configure and “program” Accelerator-6D are the same as one would use natively for the FPGA: a combination of Synopsys Synplify-Pro for synthesis and Achronix-developed tools that make up the ACE (Achronix CAD Environment). The company also has multiple reference design templates that can get you started on your custom design.
Accelerator-6D begins shipping in July 2016, with single-unit pricing at $7,500 and pricing as low as $4,500 each in volumes of 100 units. The price and performance put the Accelerator-6D in a favorable position compared with other FPGA-based alternatives for data-center network acceleration. And, although they’ve been relatively quiet for the past year or so, it appears that Achronix is preparing to hit us with a series of interesting announcements over the coming months. It will be interesting to watch.