Xilinx scored a major win recently, with Microsoft’s Azure cloud group reportedly making a commitment to use Xilinx devices in something like half of their future Azure deployments. Until now, Azure has been solidly in the Intel PSG (Altera) camp for FPGA-based acceleration. Microsoft says that every Azure server for the past several years has been equipped with FPGAs, and, until now, those FPGAs have come exclusively from Intel/Altera. Microsoft also claims that Azure is the “world’s largest cloud investment in FPGAs”.
Translation: This is a big deal in the new war for dominance in the data center.
Early this year, Xilinx – under the leadership of new CEO Victor Peng – announced that they were shifting strategic gears and becoming “Data Center First.” While Xilinx has been active in the data center business for years, the decision to change the public face of the company after decades of being the “#1 FPGA supplier” represents a radical and risky philosophical change. Deliberately trading the cruise-control comfort of being viewed as the dominant supplier in an important and rapidly growing semiconductor segment (FPGAs), for a distant “challenger” position in a much larger market (data center) that is completely and utterly dominated by the giant company that just bought your longstanding archrival is – well – pick a word: Bold? Ill-advised? Confident? Crazy?
Xilinx’s strategy appears to be getting some traction, as earnings and revenues have been strong, and the company has scored key wins in a number of major data center opportunities. How major? Let’s take a look at the so-called “Super 7” data center companies: Facebook, Google, Microsoft, Amazon, Baidu, Alibaba, and Tencent. These are the companies whose data center needs are so enormous that they can afford to essentially “design their own.” Looking at the three top cloud services suppliers: #1 Amazon is thus far solidly in the Xilinx camp – with large deployments of F1 instances based on Xilinx hardware. #2 Microsoft was (until this development) an Intel/Altera shop, making Xilinx’s new ~50% share a major coup. #3 Google is… behaving like Google, developing their own “Tensor Processing Unit” chips specifically aimed at accelerating neural network inferencing and possibly working with Xilinx and/or Intel on FPGA acceleration as well (but they’re being quiet about it). Among the rest, Baidu announced last year that they were deploying Xilinx FPGAs in their public cloud services, Alibaba chose Xilinx for their F3 cloud service acceleration, and Tencent has gone with Xilinx for their FPGA cloud as well.
Looking at the score from the Super 7, you might be tempted to think that Xilinx has this whole data center thing just about wrapped up. But, let’s look a little deeper.
Facebook, always visibly dancing with Intel/Altera on their data center projects, is rumored to be developing “their own FPGA.” This could be a custom chip using FPGA IP from one of the emerging eFPGA suppliers such as Achronix. And this brings up an important thing to consider about the Super 7 versus all other data center customers on Earth. As we said above, the Super 7 are so big that they can afford to design their own gear – including custom chips, as proven by Google and Facebook, and as evidenced by the strong business that eFPGA companies are doing marketing their IP for data center acceleration. We don’t know exactly who those eFPGA companies are licensing to, but by definition they are companies with the wherewithal to do custom ASIC/SoC design. That’s a pretty exclusive club.
One other company with a strong horse in the race is NVIDIA, of course, which is currently the largest supplier of chips for data center acceleration – with their GPUs being used to accelerate a wide range of applications, but particularly neural network training. But GPUs are extremely power hungry, and other solutions (such as FPGAs) are likely to dominate in the larger market for accelerating inferencing workloads.
The biggest challenge faced by companies like Xilinx and NVIDIA in the data center is, simply, Intel. For example, while Xilinx has scored impressive wins in FPGA-based acceleration in the Super 7 (as noted above), every single one of those companies spends vastly more with Intel on the rest of their data center gear. Intel dominates the data center with something like a 90% market share (depending on how you calculate it). As a full-solution supplier, Intel controls the ecosystem and (by and large) the architecture of most of the world’s data centers. The vast majority of servers in the world are sold by OEMs such as Dell, HPE, and Lenovo, and those companies all ship variants of “whatever Intel says is good.”
Lately, one thing that Intel has been saying is “good” is a system with Intel FPGAs already built in. Dell and others have begun marketing servers that already come with FPGA-based accelerators, creating a steep uphill battle for third parties like Xilinx. It’s difficult to convince a customer to add an aftermarket FPGA accelerator to a server that already has FPGA acceleration built in. For Xilinx to really challenge Intel in the acceleration game, they’ll need to persuade some of the major OEMs to offer their FPGAs instead of Intel’s. Otherwise, if Xilinx remains the choice of the Super 7 but doesn’t crack the mass market, they run the risk of becoming the “Ferrari” or “McLaren” of the acceleration game – revered and respected by the well-heeled elite for their exceptional performance, but completely shut out of the more lucrative broader market.
The silent backdrop for all of this FPGA acceleration talk is tools, and the tool problem may still be the deciding factor long-term in ownership of the data center. We all know that FPGAs are dauntingly difficult to program for the average application developer. And the number of true “FPGA experts” in the world is vanishingly small, leaving a situation where the masses don’t have a reasonable path to develop their own optimized FPGA-accelerated applications. While Xilinx, Intel, and others have invested heavily in creating development environments that could produce reasonable results for the average Joe wanting to accelerate his application with FPGAs, we are still far from anything that remotely resembles the productivity level of a modern software IDE. What we have are cobbled-together flows that start with custom-written accelerators in languages like OpenCL and (eventually) produce suboptimal FPGA accelerators that have to be carefully and skillfully inserted into the context of larger application software.
The workaround – as long as the general server-application-developing public doesn’t have adequate tools for designing their own FPGA-accelerated applications – is pre-designed accelerator IP for the most common high-value tasks that can benefit from FPGA acceleration. In fact, a number of companies such as Accelize are seeking to capitalize on what may be a large market for third-party pre-optimized FPGA acceleration IP. For companies without Super 7 resources, this may represent the most expedient and least risky path to the benefits of acceleration for the next several years.
What, then, does Xilinx’s Microsoft win suggest about the acceleration war? One argument might be that Microsoft is simply doing the prudent thing. By betting on both horses, they guarantee they won’t be on the wrong one. And, in the third-party cloud business, it also would benefit them to have services available for their customers using acceleration IP developed for either Xilinx or Intel FPGAs. Microsoft says they plan to continue using Intel FPGAs as well, so a dual-pronged strategy makes sense for them.
Clearly, though, looking at the Super 7 wins, Xilinx is getting traction with their argument that they have a leg up on Intel when it comes to technology for FPGA acceleration. Xilinx delivered ahead of Intel/Altera in each of the last several process nodes, and that has kept them in a position of leadership in high-end FPGA performance. If you’re a Super 7 company and the only thing you care about is performance, you’re going with Xilinx right now. But, with Moore’s Law slowing to a crawl, the playing field on “who’s on the latest process node” is likely to be leveled. That means that the competition for technical supremacy will have to be fought with architecture and tools.
On the sales and marketing side, Intel enjoys an enormous advantage as a full-solution supplier against Xilinx’s chip-based offering. And, while Xilinx has been posting record revenue numbers, Intel’s FPGA revenues are also way up – with reports that they are actually gaining FPGA market share against Xilinx. It will be interesting to see how this plays out as the deployment of FPGAs in the data center (and not just for acceleration) expands beyond the Super 7 over the next few years.