It’s clear that programmable logic and FPGA technology will capture an increasing share of the value in conventional and cloud data-center deployments. While FPGAs have long been used in connectivity and storage, there is a growing push to have high-end FPGAs take over a crucial role in computation as well. FPGAs pack a potent combination of massive computational throughput, low latency, and power efficiency that no rival technology matches. With data-center demand growing enormously, fueled by IoT, powering the cloud exclusively with conventional processors is simply not feasible. Heterogeneous deployments of conventional processors and FPGAs working together have the potential to boost computational performance many times over and, more importantly, to dramatically cut power consumption.
There is, however, a series of substantial obstacles to widespread use of FPGA technology in computing. The first of these is application development. Programming a heterogeneous processing system with FPGAs is extremely difficult and requires significant hardware design expertise in addition to traditional software skills. Both Xilinx and Intel (as well as a few third parties) are working hard to lower the barrier to developing applications (and porting legacy applications) so that specialized FPGA expertise is not required – or at least matters as little as possible. This has put us in a “battle of the tools” that will be playing out for a long time to come.
In addition to continuously improving the tools, FPGA companies need to jump-start development of accelerated applications. And, most importantly, they want to jump-start that development in a way that favors their architecture over their competitor’s. This month, Intel announced a win with Alibaba, one of the “Super 7” cloud providers, to deploy Intel Arria 10 FPGAs (along with the company’s Xeon processors, of course) in Alibaba Cloud (Aliyun). This will allow Alibaba Cloud customers to take advantage of FPGA-based acceleration in their rent-a-cloud applications.
This announcement parallels Xilinx’s announcement last November that it had won a deal to provide the FPGAs for Amazon’s FPGA Cloud, via so-called “EC2 F1 Instances.” At this point, we know a lot more about Amazon’s (Xilinx) FPGA Cloud than about the new Alibaba (Intel/Altera) Cloud. Amazon’s cloud is already in customer-preview mode, while Alibaba’s has just been announced with few details, other than that there will be a “pilot program.”
It is important to contrast these Amazon/Alibaba-hosted cloud announcements with proprietary, in-house deployments of FPGA-accelerated services such as Microsoft’s Catapult (which uses Intel/Altera FPGAs). Hosted cloud deployments of FPGA-based acceleration will give us exactly the “jump-start” effect that both Intel and Xilinx are after, by lowering the barrier to entry for application development teams creating high-value applications on top of FPGA technology.
Hosted services give application developers and service providers the ability to develop and deploy FPGA-based applications without having to buy (or, more likely, build) FPGA cards for their specific deployment. They also allow developers to scale deployments according to demand, rather than having to build out their own infrastructure to handle peak load. Finally, standardized hosted servers allow software components from different sources to be mixed and matched – combining, for example, video-processing acceleration with neural-network acceleration.
There is an enormous catch to this, unfortunately.
Just as technologies like virtualization seem to be on the cusp of insulating application developers from the vagaries of server and processor architectures, allowing easily portable development of key data center and cloud applications, FPGAs come along and ruin the whole thing. There is no “virtualization layer” equivalent that will allow easy portability of FPGA-accelerated applications. In fact, Xilinx and Intel are each doing everything they can to make porting more difficult. This turns the data center duel into a high-stakes “winner take all” game, where application teams must decide which horse to ride. Do you want to develop your neural network on Amazon or Alibaba? Chances are, it will be almost a complete do-over if you want to support both. Choose the wrong one at your own peril.
Both partnerships will likely begin by wooing developers of important “infrastructure” applications that can be broadly applied. Amazon is specifically looking for genomics research, financial analytics, real-time video processing, big data search and analytics, and security. They are encouraging developers to use their tools (many of which are simply Xilinx tools) to develop accelerated applications and offer them to other customers via the AWS marketplace. Alibaba says they are targeting machine learning, data encryption, and media transcoding.
Numerous differences are already apparent between the Xilinx and Intel approaches to FPGA cloud acceleration. Xilinx/Amazon are targeting Xilinx’s latest and biggest 16nm FinFET Virtex UltraScale+ devices. Intel is at least a year behind Xilinx in delivering FinFET FPGAs, so their platform is the Arria 10 mid-range FPGA, fabricated on a 20nm planar TSMC process. (Yep, that’s right. TSMC is manufacturing BOTH the current Xilinx and Intel FPGAs for data-center applications. Who is the guaranteed winner here?)
Looking at the architecture, Intel seems to be pursuing a fine-grained pairing of processor and FPGA, mating Arria 10 devices with Xeon processors in the same physical package. Xilinx/Amazon, on the other hand, are deploying FPGAs in clusters that are more loosely coupled – presumably to Intel Xeon processors. (There are also initiatives with ARM-based processors in data centers, but those are orthogonal to this discussion.) Obviously it’s to Intel’s advantage to sell a Xeon for every Arria, and they certainly block an ARM incursion with this approach, but this fundamental difference in system architecture profoundly affects application-development strategy. It is too early to know which architecture will perform best for the key data-center applications. Of course, each company makes a case for their approach being superior.
At the chip level, there are key differences between Intel and Xilinx as well. Intel/Altera added hardened floating-point support to their FPGAs a couple of years ago. The idea is that much computation is floating-point, and dedicated floating-point hardware will outperform floating-point built from software or programmable logic by a significant margin. Xilinx, on the other hand, claims that adding floating-point support comes at a cost in fixed-point performance and that, as a result, their devices have a significant advantage in fixed-point workloads. Xilinx claims this difference favors them in applications such as neural-network inference, for example.
Another key capability required for data-center deployments is partial (or rapid) reconfiguration of the FPGA fabric as various accelerators are swapped in and out. In this arena, Xilinx has a long historical lead, as they’ve successfully supported partial reconfiguration for years. Intel/Altera are newer to the partial reconfiguration game, and less is known about the mechanics and efficacy of their solution for data center applications.
The most critical battle, of course, is the slow burn of tool evolution. Intel (Altera) fired the first shot here several years ago, embracing the OpenCL initiative by developing tools that compile OpenCL code (typically used for general-purpose GPU programming and acceleration) onto Altera FPGAs. Xilinx, on the other hand, was leading with their high-level synthesis (HLS) technology, which can create optimized hardware implementations from C/C++ code. Since those early days, each company has both innovated and responded: Xilinx announced OpenCL support, and Intel has now (quietly) announced HLS. Xilinx has released a number of tool/software/IP suites specifically attacking computation, such as their SDAccel development environment, aimed at teams doing FPGA-based compute acceleration, and (more recently) their reVISION “stack,” aimed specifically at vision applications. Thus far, Intel has taken a more generic approach with their tool-suite evolution.
This data-center war is just beginning. For Intel, the stakes are much higher: the discontinuity created by an industry-wide migration to heterogeneous computation with FPGAs has the potential to put their much larger, near-monopoly data-center processor business at risk. In many ways, Xilinx has an early technological lead, and they have chalked up some key victories in the market. Obviously, Intel has the potential to exploit their existing data-center dominance in ways that will benefit them, but so far there is no visible strategy for doing that. It will be interesting to watch.