Sunday! Sunday… SUNDAY! Get your tickets now! Fire-breathing System-in-Package FPGAs square off in a duel for the ages! See Xilinx’s FinFET-powered 16nm TSMC-fabbed UltraScale+ Virtex and Zynq take on Intel’s new Altera-powered Stratix and Arria Generation 10 SoC FPGAs and SiPs. Watch Quartus and Vivado spar for design tool supremacy! Feel the heat as distributors such as Avnet, Arrow and others pile on legions of FAEs armed with development kits, reference designs, and IP blocks in winner-take-all battles for dominance in communications, data center, automotive, industrial, and IoT applications. That’s right, we’ll sell you the whole socket, but you’ll need only the edge.
OK, maybe the newest incarnation of the decades-old rivalry in the FPGA and programmable logic business doesn’t exactly pack the adrenaline-driving punch of a monster truck rally, but with Intel’s acquisition of Altera, the incursion of programmable logic devices into new and exciting markets, and the rapidly approaching end of Moore’s Law, the stakes and intrigue have never been higher.
We thought we’d do a play-by-play comparison of the two big competitors’ offerings now that Altera’s branding has been replaced with that of the world’s largest semiconductor company. The FPGA game has become more complicated than ever, and it seems like a good idea to discuss the relative strengths and weaknesses of each team’s stable as we enter a new era of high-stakes competition.
First, let’s start with the easy stuff – like current market share rankings. According to just about every source that monitors such things, Xilinx has maintained a commanding market share lead for the past several years. Market share in programmable logic can be misleading sometimes, however, because, generally, this quarter’s sales depend on how well your company and technology performed two or three years ago – when the customers who are ordering in volume today were swayed by one company or the other’s convincing marketing pitch. To the FPGA companies, this means that if your sales are going well, your two-generations-ago technology was probably a winner.
A better measure of future performance would be sockets won by the latest devices. Engineering teams making the decision to design in the latest Xilinx or Intel/Altera parts today are likely to start making a significant contribution to the revenue streams in 24 months or so, when those new product designs are finished and entering volume production. How do the two companies stack up on the “sockets won” front? There’s no good way to tell. Talk to marketers at either company and they’ll tell you that their latest stuff is winning more new deals than ever. In reality, they might as well have all just left that as the outgoing message on their voicemail for the past decade or so, because no matter what the actual market does, both marketing teams are always winning – just ask them!
One thing that does have a big influence on design-ins, however, is first availability of the latest family on the newest process node. This has been the primary lynchpin for bragging rights for Xilinx and Altera since the big feud began. If your company could actually deliver the newest, best generation of parts first to eager customers, you were likely to win the lion’s share of the new sockets. Once a company launches first on a new process, each passing month means more lost sockets for the trailing competitor. On this front, Xilinx has gone 3-0 over the past three nodes – 28nm, 20nm, and 16nm. The result of all that? Xilinx has been winning sockets and stealing the resulting future market share steadily for over half a decade and is likely to continue for at least as long as the current 16nm victory carries them.
What about this 16nm thing, anyway? It’s a big deal. This latest process node was the first where FinFET transistors would be used. FinFETs give FPGAs almost two process nodes’ worth of value because of their outstanding performance in both static and dynamic power consumption. Since semiconductor leakage current has been creeping up with each new process node, the FPGA world had reached a point where the incremental gains in performance, power, and area (PPA) were decreasing with each new generation – at the same time that the costs to produce each generation were growing exponentially.
You don’t have to plot those two curves very far into the future to see that we were on the threshold of a time when it didn’t make sense to build new FPGAs on a new node at all, because the incremental improvements would not offset the incremental cost. Today, in fact, only the biggest, fastest, most exotic families from each vendor are produced on the latest process node. It’s still more cost-effective to deliver the smaller devices on older process technology.
FinFETs, however, offered a much-needed boost to the race to the 16/14nm node. Both Xilinx and Altera set out hellbent on being the first to market with FinFET FPGAs – Xilinx working with TSMC and Altera with their new (at-the-time) foundry partner, Intel. Ironically, neither company delivered the actual first FinFET FPGAs. That title was claimed by Achronix, who delivered a family of high-performance 22nm FinFET FPGA devices fabricated by – double ironically – Intel. Between the big two, however, Xilinx scored a significant victory in the “first to ship” derby, thanks to excellent engineering execution and to the delays doubtless caused by the distraction of Intel’s acquisition of Altera. It is likely, when all is said and done, that Xilinx will have been shipping FinFET devices for about a year before the Intel/Altera Stratix 10 goes out the door.
If these were the old days of programmable logic, we’d just mark that one in the “win” column for Xilinx. Xilinx’s marketers would be doing champagne toasts and Altera’s would be slugging Wild Turkey from the bottle, and then we would all just set our sights on the next process node about two years from now. (Note, that last part was made up. We have no actual knowledge of this sort of behavior from FPGA company marketing teams.)
These, however, are not the “old days of programmable logic.”
In today’s world, it isn’t enough just to win the next round of business from the communications and networking infrastructure folks (who have historically accounted for the dominant share of the high-end FPGA market). Today, you need to be able to woo engineering teams who have not been designing with FPGAs their entire professional lives. In fact, you need to be able to win business from teams who have never used programmable logic at all. And that’s a whole different kettle of fish.
First of all, designers in these new markets are not sitting with PO forms all filled out except for the company name, nervously tapping their feet while they wait to see whether Xilinx or Altera will be the first to offer them the most FPGA LUTs for the least cash. The new markets are much more sophisticated and discriminating prospective dates. In data center, for example, they need to be convinced that your FPGA will compete favorably with a GPU for their compute acceleration project and that your development tools can be used by their software engineers to “program” these newfangled things as easily as good ol’ processors. They couldn’t care less about FinFETs, interposers, eye diagrams, or carry chains. Most of them aren’t aware that timing closure is something they would even need to worry about. Selling to these new markets is most definitely not FPGA business as usual.
To win in these new markets, the most important factor is not the silicon at all – but the software, service, and IP. Looking first at the software side of the equation, we have an interesting dilemma. Both Xilinx’s Vivado and Intel/Altera’s Quartus are very capable design tool suites. For years, Altera’s Quartus had a noticeable advantage over Xilinx’s aging ISE tools. In order to move forward, Xilinx had to completely re-architect and rewrite their entire tool chain, replacing ISE with the much-more-modern Vivado. After that, of course, Altera had the stable, reliable Quartus and Xilinx had several million unproven lines of brand-new, fancy, temperamental Vivado code.
Once Vivado had a few years to mature, however, some of its more modern architecture really started to shine. In fact, it shone enough that Altera dove in and did a little rework under the hood themselves, producing their new “Quartus Prime” editions. Today, both companies can make compelling arguments about why their core implementation tools are better, and the reality is that both are very, very good at doing their job – getting from RTL code down to a working bitstream in an FPGA.
The tool game, too, has been elevated, however. Each of the new markets for programmable devices has special requirements for how their designs are initially created. For compute acceleration, for example, nothing would be better than a tool suite that could take a legacy C/C++ software application and magically create an optimized, accelerated version that executed on a conventional processor paired with FPGA fabric. Such a tool does not exist, however, nor is it likely to.
What we do have are fairly primitive tool suites that can take OpenCL code written to accelerate compute loads on GPUs and implement it on FPGAs instead. On this front, Intel/Altera would seem to have the initiative. Altera rolled out their OpenCL strategy long before Xilinx was even talking about the data center, and the company has had good success winning acceleration sockets with that approach. For an extra boost, Altera took concrete steps to dramatically improve the floating-point performance of their FPGAs – which was both a silicon and a software play. While floating-point has always been largely irrelevant in the core FPGA market, the people who are doing compute acceleration have a whole different opinion.
Now, Altera’s already-strong data center strategy is paired with Intel, who is by far the dominant player in data centers worldwide. That combination is likely to be extremely hard to beat for an insurgent Xilinx, even when teamed with a coalition of capable companies all wanting to grab a chunk of the data center pie. The resources and technology that Intel could pour into creating a repertoire of winning data center solutions with Altera’s programmable logic technology are formidable. And, with one of Intel’s largest businesses at stake, the motivation definitely favors Intel as well. Defending a data center market many times the size of programmable logic is likely to light a fire under the Intel folks that could be seen from space. It should be interesting to watch.
A key technology for moving many markets rapidly into FPGA-based solutions, however, is high-level synthesis (HLS). HLS offers significant productivity advantages to teams trying to implement solutions in programmable logic (or any kind of logic, for that matter, but we are focused on FPGAs here). HLS allows designs to be created at a much higher level of abstraction than traditional RTL-based design. A few hundred lines of HLS code (usually in a variant of C or C++) can capture the same design intent as tens of thousands of lines of traditional RTL. From that point, debugging, optimizing, re-architecting, and implementing solutions based on that code are dramatically faster and easier than with the traditional RTL-based design flow. Teams replacing RTL-based design with HLS-based design routinely report reductions of 50% to over 90% in design time and cost.
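To give a feel for what “a variant of C or C++” looks like in practice, here is a minimal sketch of the kind of source an HLS tool might accept: a small FIR filter in plain C++. The function name `fir_filter`, the constant `kTaps`, and the four-tap size are our own illustrative choices, not taken from any vendor example. The code uses fixed loop bounds and no dynamic memory, which is the style HLS tools generally favor, and it also compiles and runs as ordinary software.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of filter taps (hypothetical value). A compile-time constant lets
// an HLS tool unroll the inner loops into parallel hardware.
constexpr std::size_t kTaps = 4;

// Illustrative FIR filter: out[i] = sum over t of coeff[t] * in[i - t],
// with samples before the start of the input treated as zero.
void fir_filter(const std::int32_t* in, std::int32_t* out, std::size_t n,
                const std::array<std::int32_t, kTaps>& coeff) {
    assert(in != nullptr && out != nullptr);

    // Delay line; in hardware this would typically become a register chain.
    std::array<std::int32_t, kTaps> shift_reg{};

    for (std::size_t i = 0; i < n; ++i) {
        // In a Vivado HLS flow, a directive such as
        // "#pragma HLS PIPELINE II=1" could be placed here to request one
        // new sample per clock. It is left as a comment so the sketch
        // builds cleanly with an ordinary compiler.

        // Shift the delay line by one sample.
        for (std::size_t t = kTaps - 1; t > 0; --t) {
            shift_reg[t] = shift_reg[t - 1];
        }
        shift_reg[0] = in[i];

        // Multiply-accumulate across all taps.
        std::int32_t acc = 0;
        for (std::size_t t = 0; t < kTaps; ++t) {
            acc += coeff[t] * shift_reg[t];
        }
        out[i] = acc;
    }
}
```

With all four coefficients set to 1, feeding in the samples 1, 2, 3, 4, 5 produces the running sums 1, 3, 6, 10, 14. The equivalent hand-written RTL would have to spell out the registers, multipliers, adder tree, and control logic explicitly, which is where the line-count gap comes from.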
Combining the efficiency of HLS-based design with the programmability of FPGAs is a natural win. While HLS is not yet to the point of being the “magic compiler” that can convert conventional software into optimized FPGA-based hardware, its present-day capabilities are still game-changing. In this world, Xilinx has a clear lead. Over the years, Xilinx has acquired several companies with significant HLS technology – including AutoESL and AccelChip. Xilinx has moved aggressively in recent years to productize HLS technology, and recently to get HLS technology into the hands of as many of their customers as possible by including HLS in the normal editions of their Vivado tool suite. As a result, Xilinx has surpassed even the EDA companies such as Mentor Graphics, Cadence, and Synopsys in the number of designers actively using HLS technology. Altera, on the other hand, has yet to field a competitive HLS solution, and HLS is not a technology that can be developed or deployed overnight.
Even though FPGAs are considered “general purpose” ICs, market pressures have moved the devices more and more into at least domain-specific design. FPGA companies are putting specific optimized, hardened IP into their devices to address the needs of specific markets – and the automotive market does not have the same needs as the data center, which does not have the same needs as ASIC prototyping, which does not have the same needs as networking. Each of these application areas demands specific interfaces and combinations of DSP, memory, IO, and processor subsystems. On top of that, each of these demands domain-specific design entry tools and languages, reference designs, development boards, and IP libraries. Here again, Xilinx appears to be farther down the road of developing a broad product offering and, more importantly, a third-party ecosystem that allows them to support such a diversity of applications.
One of the key technologies in producing a diversity of domain-specific products efficiently is 2.5D packaging. Here again, the strategies and strengths of the two companies diverge. Xilinx clearly has the most experience with 2.5D packaging, as they have been delivering production devices based on silicon interposers for three generations now. Altera and Intel are comparatively new to that game, with the first real devices slated for release this year, but Intel’s EMIB technology appears to have some potential advantages when compared with silicon interposers. Interestingly, the two companies seem to be moving in opposite directions with the use of 2.5D technology. Xilinx, who previously packaged SerDes transceivers fabricated with one process alongside FPGA fabric from another, is now back to doing monolithic fabrication for LUT fabric and transceivers. Intel/Altera, on the other hand, has just started putting transceivers on different die, mixing Intel-fabbed FPGA chips with transceivers (probably) made with a previous process at TSMC.
On the business side, even though they own a significant lead in the FPGA market at the moment, Xilinx is a much smaller company than the now-combined Intel/Altera, with more limited resources. But, at this point, Xilinx has the advantage that their competitor is undoubtedly experiencing the usual disruptive chaos that goes along with any major acquisition. Once that chaos dies down, however, the formidable resources of Intel and the fact that the FPGA technology and the fab are essentially under one roof could give Intel/Altera a significant boost. And – there are persistent rumors that Xilinx themselves might be a potential acquisition target, which would throw the whole game into the air once again.
So, grab your popcorn and have a seat. We’ll keep you posted on all the action.
12 thoughts on “Xilinx vs Intel”
“nothing would be better than a tool suite that could take a legacy C/C++ software application and magically create an optimized, accelerated version that executed on a conventional processor paired with FPGA fabric. Such a tool does not exist, however, nor is it likely to.”
How about a new kind of processor designed to run C in about 1/10 the number of cycles of a conventional processor?
And it is small enough to have several running in parallel, also programmable by loading memories.
Hey Kevin, I was kind of surprised you didn’t throw Xilinx’s SDSoC into the conversation – not that it’s some kind of magic wand/potion, but I’d be interested to hear your thoughts on the tool as it pertains to the C/C++ –> HDL ‘dream’… I seem to vaguely remember a prior piece from you on the subject?
Hi Kevin Morris, Interesting article. Might add that C and C++ are not the only high-level languages to look at for direct synthesis to FPGA logic for acceleration.
We have had some initial successes using JamaicaVM, our embedded and realtime runtime environment for Java bytecode languages, and transparently integrating CPU execution with FPGA accelerated class methods. At Aicas, we envision this as a next step in moving from secure download of software components to a mix of downloading software and FPGA hardware components dynamically to fielded systems. Using languages that have strong memory models, eliminating pointer arithmetic and indeterminate memory accesses, makes accelerating code with FPGA synthesis tools perhaps a bit more tractable. Any thoughts in this direction?