Intel has just launched a major counter-strike in the battle for the data center of the future. The company announced a new suite of software tools, which (according to a blog post by Barry Davis, General Manager, Accelerated Workloads Group, Intel® Data Center Group) “make FPGA programming accessible to mainstream developers, a major leap forward for customizable silicon solutions that complement the endlessly [SIC] diversity of customer-defined workloads, including 5G network processing, artificial intelligence, data and video analytics, machine learning and more.”
We love to say “Told ya so!”
Back in 2014, around a year before Intel announced their intent to acquire Altera, we ran an article that asked “When Intel Buys Altera – Will FPGAs Take Over the Data Center?” We weren’t just predicting in the dark. At the Gigaom Structure 2014 event, Intel’s Diane Bryant announced that Intel would be “integrating [Intel’s] industry-leading Xeon processor with a coherent FPGA in a single package, socket compatible to [the] standard Xeon E5 processor offerings.”
That got our attention.
Intel wasn’t giving any additional info on the topic – not even the identity of the FPGA they were planning to use. But we wouldn’t be very good journalists if we couldn’t narrow from a possible field of two. The other option, Xilinx, was charging ahead full steam with a strong field of Intel competitors as partners to address the huge opportunity of FPGA-based acceleration. Altera had jumped to an early lead in recognizing the data center opportunity, and it just so happened that Intel was already Altera’s manufacturing partner, AND Intel had already shipped another product – the Atom E600C (better known, perhaps, by its original code name “Stellarton”), which combined an Altera FPGA with an Intel Atom processor to create what turned out to be an epic failure in the embedded market.
If at first you don’t succeed…
But the data center is a whole nother ballgame compared with the embedded market. Intel has struggled (alternative spelling “FAILED”) in their attempts to capture a meaningful presence in embedded, but they OWN the data center – with something like a 6,000% market share. (Our numbers may be a little off there, but you get the idea.) This is a business that has a value to Intel in the double-digit billions per year. It matters. And here sat Altera with what appeared to be a nice lead in the whole concept of capitalizing on the coming data center discontinuity. Plus, “Hey, we’re already building their chips for them!” When you’ve got synergy like that (and an extra $16B or so in cash lying around), why NOT make the move and acquire Altera?
The problem is, buying big companies takes time. Making new data center processors takes more time. Perhaps the most important and difficult part – coming up with a suitable software stack to support programming a whole new kind of data center processor – takes a LOT more time. But market discontinuities don’t just wait around for the previously dominant player to get their act together. Intel’s competitors had already jumped in with both feet. In 2014, Xilinx announced “SDAccel,” which the company bills as a “development environment for OpenCL™, C, and C++ [that] enables up to 25X better performance/watt for data center application acceleration leveraging FPGAs.” Intel’s clock was ticking.
In the time that followed, Intel’s competitors strengthened their attack. With the advent of new critical applications (particularly AI), a data center discontinuity of epic proportions was looming. Data centers would need acceleration to meet the performance, and, particularly, the power requirements, of the future. If a dominant player like Intel was to be dislodged, this was the time to do it.
Now, three years later, Intel is getting their defenses deployed. With Altera firmly in the family, the company has launched the long-promised new line of data center processors. Altera has launched its Stratix 10 and Arria 10 FPGAs, and now they are following with the software stack. The new announcement has three major components: the Acceleration Stack for Intel Xeon CPU with FPGAs, the Open Programmable Acceleration Engine (OPAE) Technology, and the Intel FPGA Software Development Kit (SDK) for OpenCL.
The Acceleration Stack for Intel® Xeon® CPU with FPGAs consists of software, firmware and tools, designed to make it easier to develop and deploy Intel FPGAs for workload optimization in the data center. It includes hardware interfaces and associated software APIs for interfacing Xeon processors to FPGAs.
Intel’s Open Programmable Acceleration Engine (OPAE) is “a software programming layer that provides a consistent API across FPGA product generations and platforms. It is designed for minimal software overhead and latency, while providing an abstraction for hardware specific FPGA resource details.” Intel has open sourced the technology in an attempt to gain broad adoption in the FPGA-based acceleration arena.
The Intel® FPGA SDK for Open Computing Language (OpenCL™) is a kit that abstracts away the RTL-based FPGA development process using a higher-level software development flow. It allows you to emulate your OpenCL C accelerator code on an x86-based host, and it gives a detailed optimization report with specific algorithm pipeline dependency information. This lets you develop your OpenCL code in a tight iteration loop on an x86 machine without having to run through the much longer synthesis and place-and-route FPGA flow until the end. Intel says you can also “leverage prewritten optimized OpenCL or register transfer level (RTL) functions, calling them from the host or directly from within your OpenCL kernels.”
It is impossible to overstate the importance of this development flow in the adoption of FPGA acceleration in data centers. Competitively, Altera had a big head start with OpenCL prior to the Intel acquisition, and Xilinx has played catch-up in that arena. On the other hand, Xilinx was out first with a comprehensive development stack similar to the one Intel just announced. One clear advantage Intel enjoys is the dominance of Xeon/x86 in the data center, and their ability to pair tool flows with that architecture.
In the past, we talked a lot about the importance of the ability to run legacy data center software on an accelerated platform. But because of the rapid uptake of new applications – particularly AI and neural networks – legacy code may end up being much less important than performance, power-efficiency, and portability of AI applications. Rather than wooing the software developers, it may be the data scientists with the most clout in future data center decisions who need to be convinced of the viability of an accelerated data center platform.
Both camps have scored significant early victories in this computing revolution. Xilinx won well-publicized sockets in Baidu and Amazon, Intel/Altera have had a number of high-profile victories at Microsoft, and there are undoubtedly other deals on both sides that have not been publicized. It is interesting to note that there is a bit of an architecture difference emerging between the two camps, with Intel/Altera leaning toward a paired architecture where one FPGA is paired with one Xeon. Xilinx is showing up in pooled architectures where a pool of many FPGAs is deployed as a networked resource. It’s unclear how much impact this difference will have on those developing applications for these two architectures, or what the real-world performance and efficiency differences will be. It will be interesting to watch.
6 thoughts on “Game On for FPGAs in the Data Center”
Just a request. Can you provide source links for your articles? For example, you reference a blog post but don’t provide a link to it for further reading. The only link in this article is to another internal page.
Is not linking to external pages/sources a policy here? It seems common.
Good point! The blog post you are asking about is here:
Good post. So much to unpack. Did you know, for example, that you can compile an FPGA bitstream in seconds? And that FPGAs can be faster than ASICs? http://ieeexplore.ieee.org/document/903398 The technology was buried in the bitstream-paranoid epoch we are still in. OpenCL is great, but did you know their compiler is best at pipelining regular C? They won’t advertise that because it scares them for some reason.
When I was working at Altera (2013..) I tried to get them ahead of the curve and start an FPGA cloud, but “that will never happen”. There is a set of applications that would let FPGAs take over the data center, but they are not sexy so they get no support. I’ve been working on this for over 30 years and it still looks like these people are wandering in the wilderness but won’t listen to anyone except their own voices.
Here are my first 5 patents which include processor and FPGA in the same package and on the same die
Things could be so much better if the FPGA companies would get out of their own way. I always think, “If your compilers are so good, why don’t you use them to compile, place, and route your own devices?” If they ate their own dog food and used their own tool chain, it would be 100x better. I challenge Xilinx and Intel to eat their own dog food.
I also worked at Altera for a while – part of the job was looking at replacing Verilog models with C/C++. As far as I can tell the move off Verilog/VHDL has never happened, but it did inspire me to do my own project – http://parallel.cc
OpenCL sucks for programming FPGAs. It’s designed for GP-GPUs and stream processing, not to mention that APIs suck in general as an approach to parallelizing software. These guys seem more clued – http://bigstream.co/ – and I have my own IP for doing it.
Intel has done very little over the years to make it easier to design hardware with their own hardware, so I’m not expecting a sudden improvement in results. Part of the problem is that x86 is just universally bad, as is ARM, since the architecture dates from the 1980s. However, at least ARM is low power, so you can get a lot of them in a box. I like this one – https://www.attalasystems.com/
Making C code go fast isn’t really the problem, it’s the communication overhead from splitting tasks across processors (FPGA, CPU, or GP-GPU…), and FPGAs are good for random communication structures. Unfortunately the compiler guys rarely look at that piece, and parking your FPGA behind a cache-coherent interface is a good way to lose the benefits.
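The communication-overhead point in the comment above can be put in rough numbers. The following is a minimal sketch, not a model of any real platform: the function name `offload_speedup` and every figure in it (task time, kernel speedup, bytes moved, link bandwidth) are hypothetical values chosen only to show how transfer time can swamp a fast kernel.

```python
def offload_speedup(cpu_seconds, kernel_speedup, bytes_moved,
                    link_bytes_per_s, latency_s=0.0):
    """Effective speedup of offloading one task, including data transfer time.

    All parameters are illustrative assumptions, not measurements.
    """
    compute = cpu_seconds / kernel_speedup               # time on the accelerator
    transfer = latency_s + bytes_moved / link_bytes_per_s  # time on the link
    return cpu_seconds / (compute + transfer)


# Hypothetical task: 10 ms on the CPU, 20x faster on the accelerator,
# reached over a hypothetical 8 GB/s link.
data_heavy = offload_speedup(10e-3, 20.0, bytes_moved=64e6, link_bytes_per_s=8e9)
compute_heavy = offload_speedup(10e-3, 20.0, bytes_moved=1e6, link_bytes_per_s=8e9)

print(f"64 MB moved: {data_heavy:.2f}x effective speedup")
print(f" 1 MB moved: {compute_heavy:.2f}x effective speedup")
```

Under these assumed numbers the same 20x kernel delivers well under 2x when 64 MB has to cross the link, and most of its 20x when only 1 MB does, which is the sense in which data movement rather than raw kernel speed decides the outcome.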
Intel’s PSG OpenCL compiler works best when you don’t use any of the OpenCL GPUish constructs. It’s an efficient compiler when you just use the C99 constructs and you refactor your code to be pipelineable. The irony is that they didn’t support running C/C++ code as a program right out of the gate like Xilinx did. The best thing about OpenCL is that it’s the first time an entire standard has been used to synthesize hardware, so you can take code written for CPUs and GPUs and compile it for FPGAs. Their compiler does 300+ optimizations that most hardware designers don’t even know about. I’d say the compiler is as good as an A-/B+ Verilog engineer.
As far as Bigstream goes, they bark about FPGAs but don’t show any proof that they can do the magic they claim (as far as I can see), but they have a story some VC fell for. They show some TPC benchmark data (but not the full benchmark), with only a 2x-3x speedup, which they say is done with software. They claim to speed up software with FPGAs by 30x but, as you point out, that’s hard to do considering the PCIe choke point and the fact that you need 97+% of a program’s run time running in an FPGA at 200x. Microsoft got 40x on the code of their Bing search engine but only a 2x overall acceleration (which saves them hundreds of millions of dollars).
Having said that, when FPGA manufacturers start to use their own devices for their own designs, they’ll design the devices and tools needed to grab the data center business and keep it.
The above opinions are my own and are just opinions from someone in the reconfigurable computing business for 30+ years.
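The 30x and 2x figures in the comment above follow from Amdahl's law, and the arithmetic is easy to check. This is a small sketch under that assumption; the helper names are hypothetical, and it ignores PCIe transfer time, which would only make the required fraction larger.

```python
def overall_speedup(fraction, kernel_speedup):
    """Amdahl's law: whole-program speedup when `fraction` of the run time
    is accelerated by `kernel_speedup` and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def required_fraction(target_speedup, kernel_speedup):
    """Invert Amdahl's law: the fraction of run time that must be
    accelerated to reach `target_speedup` overall."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / kernel_speedup)


# Fraction that must run at 200x to reach 30x overall (the comment says "97+%").
frac_30x = required_fraction(30, 200)

# Fraction implied by a 40x kernel speedup yielding 2x overall.
frac_2x = required_fraction(2, 40)

print(f"30x overall at 200x needs {frac_30x:.1%} of run time in the FPGA")
print(f"2x overall at 40x implies {frac_2x:.1%} of run time was accelerated")
```

The first result comes out at a little over 97%, matching the comment. The second suggests that, if Amdahl's law is the only effect at work, roughly half of the run time was in the accelerated code.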
If we want FPGAs to make it to the data center and cloud computing world, we need to decouple the FPGA IP core developers from the cloud developers who will use these IPs, through an easy-to-use interface and a common marketplace (e.g. AWS Marketplace). We have been working toward this end since 2014, and we have developed IP cores for accelerating Apache Spark that Spark users can use without any prior knowledge of FPGAs: http://www.inaccel.com