FPGAs Duel in the Data Center

When there are only two competitors in a race, the tactics change dramatically. Winning is no longer necessarily a matter of simply going as fast as possible. In bicycle match sprints, the winning strategy is actually to stay behind, drafting the leading bike until seconds before the finish line, then catapulting past for the win with a burst of speed built in the wind shadow of the unfortunate leader. In yacht match racing, “covering” is the proven way to victory – mimicking the moves of the rival, and only rarely taking the risk of diverging in order to gain the advantage.

The high-end FPGA market has always been a match race between Xilinx and Altera. For decades, the two companies have jockeyed for position, each trying to outsmart and outrun their adversary with both technological prowess and marketing cunning. When Altera gained ground by using TSMC as their fab partner, Xilinx covered and neutralized that advantage by moving to TSMC themselves. Then, Altera took a risk and jumped to Intel’s fabs for their high-end devices. When one company moved to wider LUT structures, the other followed. When one overhauled their tool suite, the other countered. At every level, each novel innovation on either side was fast-followed by an answer or countermeasure by the opponent.

Now, the FPGA bragging rights battle has moved to the data center, which could be by far the largest new market for FPGA technology in decades. FPGAs can improve every aspect of data-center operation. They can outperform every other networking option with their ability to create fast, software-defined structures that send packets to their final destinations with maximum speed and minimum power consumption. They can compress and optimize the mountains of data being crammed into storage facilities. And, perhaps most interesting of all, they can accelerate computation while dramatically reducing power consumption.

Last year, Intel spent over $16B to acquire Altera, and that’s a lot of coin for a company of Altera’s size (around $2B annual sales) and growth rate. In fact, it’s so much coin that we believe (and Intel has said) that there were key strategic reasons for the acquisition that justified the premium. Obviously, (as we’ve said before) one of the biggest strategic reasons would be to protect and future-proof Intel’s dominance in the much larger data-center business.

Sure enough, Intel has announced that it is pursuing a number of tactics to take advantage of Altera’s FPGA prowess combined with Intel’s Xeon-everywhere-ness, creating new solutions for the data center that combine FPGAs with conventional processors to accelerate and reduce power consumption on key high-load applications. But long-time Altera rival Xilinx is not content to just sit around and wait for the Intel cloud to bring doom and gloom to their data center ambitions.

Watching the nascent strategies for both companies, we are starting to see a divergence of paths – fundamental differences in approach and architecture that will likely define the rules of the battle for years to come. Neither Xilinx nor Intel/Altera is either content or properly positioned to merely cover the other’s strategic moves. Each has unique advantages that they must exploit in order to come out on top, and it isn’t clear which company’s course will take them to data-center dominance in the coming decades.

The first major divergence is esoteric, architectural, and subtle, but it has potentially monumental implications. A few years ago, Altera decided to take bold steps toward compute acceleration in the data center. One of the key perceived deficiencies of FPGAs in computation has always been floating point. FPGAs brought enormous fixed-point power to the table via scores of optimized DSP blocks, but floating-point computation was much less efficient. Altera aimed to address that deficiency by adding optimized, hardened floating-point support to their FPGAs through an overhaul of their DSP blocks, adding IEEE 754 single-precision hardened floating-point DSP to the mix.

According to Altera/Intel, this floating-point support is a key advantage, bringing potential TeraFLOPS performance to FPGA-accelerated tasks that is unmatched by Xilinx’s offering. According to Xilinx, Altera’s addition of floating-point support came at the hidden cost of significantly worse performance on narrow fixed-point operations, of the kind required by neural network inference algorithms. Neural-network-based AI is expected to be a key application for data centers in the coming decades, so winning at AI is potentially one of the best routes to the bank.

In deep learning, the first step is “training,” where the system learns its job by analyzing mountains of raw data. Once training is complete, the mode changes to “inference,” where the system applies its knowledge to situations in the real world. According to Xilinx, the big potential market in AI is inference, which must be much more broadly deployed than training. If training requires intense floating-point computation and inferencing requires massive small-bit-width fixed-point computation, Xilinx could gain an advantage with faster fixed point, and Altera’s floating point could be a liability. If Altera/Intel is right and floating point dominates the acceleration-using applications in the data center, Altera’s floating point support is a formidable weapon.
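To make the fixed-point side of that argument concrete, here is a minimal sketch of what "small-bit-width fixed-point" inference arithmetic can look like. It is an illustration under our own assumptions, not either vendor's implementation: the function names, the symmetric 8-bit scheme, and the sample numbers are all hypothetical. Values are scaled to small integers, the multiply-accumulate runs entirely in integer arithmetic, and a single rescale at the end recovers an approximation of the floating-point answer.

```python
# Illustrative only: symmetric 8-bit quantization of a tiny dot product.
# All names and sample values are hypothetical, chosen for this sketch.

def quantize(values, bits=8):
    """Map floats to signed integers of the given width, plus a scale factor."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8 bits
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0    # avoid dividing by zero
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def int_dot(qa, qb):
    """Multiply-accumulate on integers only (the narrow fixed-point work)."""
    return sum(x * y for x, y in zip(qa, qb))


weights = [0.5, -0.25, 0.125, 1.0]
activations = [1.0, 2.0, 3.0, 0.5]

# Reference result in floating point.
float_result = sum(w * a for w, a in zip(weights, activations))

# Same computation with 8-bit operands and one rescale at the end.
q_weights, w_scale = quantize(weights)
q_activations, a_scale = quantize(activations)
fixed_result = int_dot(q_weights, q_activations) * w_scale * a_scale

print("float32-style result:", float_result)
print("int8-style result:   ", fixed_result)
```

The two results differ only by a small quantization error, which is why inference workloads can often tolerate narrow integer operands, and why the speed of narrow fixed-point multiply-accumulate matters so much in this debate.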

Moving our discussion to other strategy differences – before Intel even hinted at acquiring Altera, they announced that they planned to create new processors that combined their ubiquitous Xeon with an FPGA in the same package. Clearly the plan was to offload compute loads at a very fine-grained level, with a single processor having an FPGA “buddy” available via high-bandwidth, low-latency connection to accelerate compute-intensive operations. Because Intel now owns both the FPGA and the processor pieces of this puzzle, they have ultimate flexibility to optimize this processor-plus-FPGA-device architecture, taking advantage of proprietary EMIB packaging technology to build fast, tight bonds between processor, FPGA, and high-bandwidth memory. Clearly, in this fine-grained race, Intel/Altera has a major strategic advantage.

But, fine-grained architecture isn’t the only approach to FPGA-based acceleration. An alternative school of thought is to pool FPGA resources into clusters where multiple servers can share them in a hyperscale fashion. This is the approach used by Amazon, for example, according to a recent announcement that Xilinx devices will be deployed in a new “Amazon FPGA cloud.” By freeing the FPGA resources to float where they are needed most, rather than chaining each to a corresponding processor, Xilinx claims that utilization of FPGA compute capacity will be much higher.
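The utilization claim is easy to illustrate with a toy model. The sketch below is our own idealization, not Xilinx's or Amazon's analysis: the demand table, function names, and the assumption that a pooled FPGA can be handed to any server at no cost are all hypothetical, and it ignores network latency, reconfiguration time, and scheduling overhead.

```python
# Toy model only: compares FPGA utilization when each server owns one FPGA
# versus when servers draw from a shared pool. All numbers are hypothetical.

def dedicated_utilization(demand):
    """demand[s][t] is 1 if server s needs an FPGA in time slot t, else 0.

    Every server owns exactly one FPGA, so capacity is servers * slots.
    """
    servers = len(demand)
    slots = len(demand[0])
    busy = sum(sum(row) for row in demand)
    return busy / (servers * slots)


def pooled_utilization(demand):
    """Size a shared pool to the peak simultaneous demand, then measure use."""
    slots = len(demand[0])
    per_slot = [sum(row[t] for row in demand) for t in range(slots)]
    pool_size = max(per_slot)
    if pool_size == 0:
        return 0, 0.0
    return pool_size, sum(per_slot) / (pool_size * slots)


# Four servers whose acceleration bursts happen to fall in different slots.
demand = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

dedicated = dedicated_utilization(demand)
pool_size, pooled = pooled_utilization(demand)

print(f"dedicated: 4 FPGAs, utilization {dedicated:.0%}")
print(f"pooled:    {pool_size} FPGA(s), utilization {pooled:.0%}")
```

In this deliberately favorable case, four dedicated FPGAs each sit idle three-quarters of the time, while a single pooled device stays busy throughout. Real workloads overlap, and a pooled FPGA pays a latency cost that a tightly coupled one does not, which is exactly the trade-off the two camps are betting on.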

The question of whether the fine-grained or the cluster approach to FPGA acceleration will prevail is far from answered. With the two competitors seeming to head down drastically different paths in this area, it could be a major factor in determining the winner. Of course, Xilinx itself could be bought out any day, and a more intimate relationship with a maker of processors could cause Xilinx to jump to the Intel/Altera fine-grained track.

Clearly, though, the largest single obstacle to wide deployment of FPGA-based acceleration in data centers is software development. FPGAs bring incredible potential to data center computing. We’ve estimated that three orders of magnitude in performance-per-watt is achievable just by optimizing the use of FPGAs as accelerators. The problem has always been programming them. Porting a software application to a heterogeneous reconfigurable computing platform with both FPGAs and conventional processors currently requires a massive investment of engineering expertise and manpower.

Here too, the two companies are taking significantly different approaches. Altera’s first shot was their initiative to allow FPGAs to be programmed with OpenCL, whereas Xilinx has worked for years on advancing high-level synthesis (HLS) technology, allowing sequential algorithms written in C or C++ to be optimized for FPGA-based implementation. Both strategies today are arguably still in their infancy, with Altera laboring to prove to the GPU crowd that they can target their code to FPGAs with superior results, and with Xilinx working to achieve wide-scale deployment of HLS tools for FPGA design.

We’d have to give Xilinx the advantage on the high-level tools race at this point, but Intel/Altera’s already-long history with their OpenCL approach has them ahead on the experience front. Both companies have a long, long way to go before we have a tool environment that facilitates wide-scale general-purpose deployment of FPGA-based acceleration, however. We are still clearly in an early-adopter phase where only the best-funded data-center customers can avail themselves of FPGAs.

Speaking of the best-funded data-center customers, the so-called “Super 7” (which includes Facebook, Google, Microsoft, Amazon, Baidu, Alibaba, and Tencent) constitute an enormous amount of data center opportunity – both in terms of the sheer amount of business they represent, and because of their influence on the rest of the data center industry. Obviously, Xilinx’s recent Amazon FPGA Cloud win is a major victory in terms of the company’s ability to beat out Altera/Intel at acquiring a Super 7 customer. But the win may have even more important implications in terms of the software development battle.

By deploying Xilinx FPGAs in Amazon Cloud, Xilinx’s architecture, tools, and technology become the default for the numerous companies taking advantage of Amazon’s services. If you’re already an Amazon Cloud customer and you want to accelerate your application with FPGAs, you are kinda automatically signed up for Xilinx and the coarse-grained pooled-FPGA approach. This makes the Amazon win have potential for Xilinx that far exceeds simply winning a single Super 7 deal against Intel/Altera. It could promote a ramp-up of application development that favors Xilinx’s architectural approach.

Intel has historically relied on what we call the “x86 moat” to defend their dominance in data center processing. But, with FPGAs poised to become major components of data centers moving forward, the ability to efficiently handle legacy x86-optimized software could take a back seat to the new requirement to take advantage of the energy and performance benefits of FPGAs. This could represent a huge discontinuity in data center design, which could also lead to major market shifts in supplying gear to those data centers.

While Intel made a brilliant strategic move in bringing Altera into the fold, there are still significant battles to be fought in determining who will win the lion’s share of the enormous data-center opportunity over the next two decades. It will be interesting to watch.
