The two big FPGA companies want to be sure that you know they’re ahead.
They always have. It isn’t because you really needed to know, or because one or the other of them being ahead at any given time had any long-term industry-shaping ramifications. It’s just that this myopic, tit-for-tat, red vs blue, Hatfield and McCoy, be-the-first-to-blink behavior is, according to recent economic research, the optimal solution for members of a symmetric pre-emptive duopoly.
Or, maybe both sides just really hate those other guys.
A few weeks ago, Altera announced their vision for FPGA technology on the upcoming 20nm node. Now, it’s Xilinx’s turn. Does this mean that Altera is 2 months ahead of Xilinx in the all-important “next process node”?
Nope. It means that, in the high-thrill game of “chicken” that ensues at the early stages of any new process generation, Altera decided it was in their best interest to announce 20nm stuff first. Of course, both companies have been working on the next generation for years already, and neither company will be ready to ship any real-volume products on it for at least another year, so the timing of the announcement is purely in the capable hands of the marketing strategists.
Now, however, since both companies have shared their vision of the 20nm future with us, we can compare, contrast, and speculate. We’re good at that. There’s no accountability.
First, let’s look at what is the same. The biggest item is one that has never been the same before – the process itself. As a bit of background – throughout modern FPGA history, Xilinx and Altera have always used different processes. For years, they even used different fabs. At 28nm, however, with Xilinx’s move to TSMC, both companies were using the same fab. Never fear, though, because the two rivals chose different process variations from the TSMC lineup. The difference in variations gave each side plenty of ammo to claim that their devices were built on a fundamentally superior foundation. Bazillions of press releases later, we’d all been confused into submission. We just didn’t care whose process was better. We wanted FPGAs that worked, tools that would make us productive, and production volumes that would let us ship our own products.
Now, however, at 20nm, it appears that TSMC will offer only one process variant – and that both Xilinx and Altera will be using it. That means that both companies will have to find new ways to claim differentiation and superiority – an epic marketing task, which is already well underway. Since all the 20nm chips will be built on the same fab lines with the same process, things like architecture, packaging, tools, IP, and service will take center stage. Both companies’ recent announcements reflect that new reality. In fact, the only process differentiation we are likely to see in the FPGA space is between Xilinx and Altera (both using 20nm TSMC) and “alternative” FPGA companies like Tabula and Achronix (using Intel’s 22nm FinFET-like process).
Xilinx went out on a limb with several major innovations at 28nm, and their experience with those risks may well pay off as they move to 20nm. The three most visible of these were: the first production 2.5D (also called “3D”) interposer-based homogeneous and heterogeneous FPGAs, the first devices with hybrid high-performance ARM-based processing subsystems combined with FPGA fabric, and a complete ground-up redesign of the development tool suite, replacing the aging ISE tools with the new Vivado suite.
While the rest of the industry claimed that 2.5D fabrication and packaging techniques (including things like microbumps and through-silicon vias) were not feasible for production use, Xilinx bravely charged ahead and produced devices based on those technologies. This accomplished several things. First, it helped jump-start the ecosystem for producing these kinds of devices industry-wide. Second, it gave Xilinx a great deal of experience with 2.5D design and production that should come in very handy at 20nm. Finally, it gave Xilinx both an unmatched huge FPGA (around 2 million LUT4 equivalents) and a fancy heterogeneous device (with a mixture of processes for the SerDes and FPGA fabric portions). With 20nm, Xilinx claims that they’ll be using some of that experience to deliver 5x more die-to-die interconnect through the interposer. That could bring a whole different level of 2.5D possibilities online.
Also, at 28nm, Xilinx popped out Zynq – the first of a new generation of SoCs that mix high-performance multi-core ARM-based processing subsystems with FPGA fabric. While Altera has announced plans for similar devices, Xilinx’s are already shipping. This gives Xilinx the opportunity to apply what they’ve learned with the 28nm Zynq program to their upcoming 20nm families. It is important to note that the success of these devices will depend a lot more on the ecosystem and (ironically) the marketing than on the devices themselves. Since SoCs-with-FPGA-fabric like this are going after a brand-new-to-FPGA-companies audience, they will win or lose, not against the other FPGA competitor, but against entrenched SoC suppliers like Freescale, TI, NXP, and others. That’s a whole different ballgame than just one-upping the usual rival.
Finally, we have the total overhaul of the Xilinx tool suite. This, we believe, was a mandatory leap for Xilinx for two reasons. First, the aging Xilinx ISE had been perceived as weaker than Altera’s Quartus tools for several years. Being slightly behind the curve in tools was not a good position for the largest FPGA company – particularly with the importance of tool capability rising with each new generation of products. Second, Moore’s Law was overwhelming ISE, and it is doubtful that the old tools could have stood up to the challenges of capacity, performance, quality-of-results, and usability posed by the 28nm and upcoming 20nm process nodes. No matter how difficult, expensive, and risky the transition might be – Xilinx had to have new tools.
Now, with Vivado, the company has taken that leap and is flying down the hill in their marketing wingsuit – hoping not to collide with terrain. If all works out, they will have had the thrill-ride of a lifetime and they’ll have a substantial advantage in their tool framework for years to come. If it doesn’t, well, gravity can be a tough mistress. Xilinx claims that they are deploying major improvements in Vivado’s capabilities, including 20% better LUT utilization and up to 3 speed grades better performance – and that’s just from the tools. Altogether, Xilinx claims that they are working toward a 4x “productivity boost” for designers using their tools.
As one might expect, both Xilinx and Altera are telling us that densities will rise significantly (again) at 20nm. We expect this means that device capacities will approximately double once again (Xilinx claims 1.5x-2x “greater integration”). Both companies are telling us that performance of high-speed multi-gigabit transceivers (SerDes) will take another giant step – with Xilinx hinting that we’ll see devices with over 100 transceivers operating at 33 Gbps. The mind-boggling throughput of such devices will help alleviate the bandwidth crunch currently felt by everyone trying to live the vision of the constantly-connected lifestyle with a less-than-adequate infrastructure to support it.
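To put the transceiver hint in perspective, here is a back-of-the-envelope sketch of the aggregate line rate. The 100-lane count and 33 Gbps rate are simply the figures Xilinx hinted at. The 64b/66b encoding overhead is our own illustrative assumption, since the announcement does not say which line coding applies, and the function name is hypothetical.

```python
def aggregate_serdes_gbps(num_transceivers, line_rate_gbps,
                          payload_bits=1, coded_bits=1):
    """Aggregate one-direction throughput in Gbps.

    payload_bits/coded_bits models line-coding overhead
    (e.g. 64/66 for 64b/66b). The default of 1/1 gives the raw line rate.
    """
    return num_transceivers * line_rate_gbps * payload_bits / coded_bits


# Figures hinted at by Xilinx: "over 100" transceivers at 33 Gbps.
raw = aggregate_serdes_gbps(100, 33.0)

# ASSUMPTION: 64b/66b coding, chosen only for illustration.
usable = aggregate_serdes_gbps(100, 33.0, payload_bits=64, coded_bits=66)

print(f"raw:    {raw / 1000:.1f} Tbps per direction")
print(f"usable: {usable / 1000:.1f} Tbps per direction (assuming 64b/66b)")
```

Even with coding overhead taken out, a single device of this class would move on the order of a few terabits per second in each direction.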
Both companies agree that heterogeneous 2.5D technology will enable a fascinating new category of SoC devices that combine the benefits of various process optimizations and technologies on a single device. Picture slices with memories, analog, SerDes, FPGA fabric, and processing subsystems – all potentially fabricated on different technologies – all integrated on a single silicon interposer and wrapped up in one big ol’ package. This new breed of SoC will enable products we haven’t yet even imagined, with tremendous advantages not only in integration, but also in capability based on dramatically higher bandwidth between major system components, and in much lower system power consumption.
Both companies are also telling us that Zynq-like devices will make up a major part of their strategy at 20nm. There are so many potential applications for this FPGA+CPU technology that it’s hard for the companies to know where to start. The acceleration and power-saving potential of the FPGA fabric combined with the flexibility and speed of the ARM-based subsystem – both married to the staggering connectivity capabilities these devices have – is intoxicating when one starts to imagine the possibilities. One quickly sobers up, however, when it comes to the problem of how to program them. Here, Xilinx is driving in the direction of compiling C modules targeted for acceleration with the company’s AutoESL technology, and Altera (as we recently explored) is betting on OpenCL bleeding over from the high-performance-computing crowd currently targeting GPUs.
While power consumption has often been predicted as a potential Achilles’ heel of moving forward on the Moore’s Law curve, Xilinx is claiming that they will cut power in half once again at the 20nm node. If true, this is a shocking achievement. With each shrink, the leakage current problem gets harder to solve and the proportion of leakage versus dynamic power goes up. Somehow, though, FPGA companies keep pulling tricks out of their collective sleeves to keep up with the treadmill, and the power-per-gate-per-frequency continues to drop. This is a good thing, because the number of gates and the frequency keep increasing, so one has to improve at a good pace just to stay even on overall power in the package.
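The “just to stay even” arithmetic can be made concrete with a rough sketch. It assumes (our simplification, not a model Xilinx has published) that total package power scales as power-per-gate-per-frequency times gate count times clock frequency, and it reads the “half the power” claim as a halving of that first factor. The 2x gate count is the top of Xilinx’s claimed 1.5x-2x range; the 1.2x frequency figure is purely hypothetical.

```python
def relative_package_power(power_per_gate_hz_scale, gate_scale, freq_scale):
    """Package power relative to the previous node (1.0 means unchanged).

    Simplified model: total power ~ (power per gate per Hz) * gates * frequency.
    Leakage, I/O, and transceiver power are deliberately ignored.
    """
    return power_per_gate_hz_scale * gate_scale * freq_scale


# Halve power-per-gate-per-Hz, double the gates, keep the same clock.
same_clock = relative_package_power(0.5, 2.0, 1.0)

# Same, but with a HYPOTHETICAL 20% clock increase.
faster_clock = relative_package_power(0.5, 2.0, 1.2)

print(f"same clock:   {same_clock:.2f}x previous package power")
print(f"faster clock: {faster_clock:.2f}x previous package power")
```

Under these assumptions, halving the per-gate figure only holds total power flat once the device doubles in size, and any frequency gain pushes the package budget up again.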
For now, though, we’re all just eating popcorn and watching the marketing show. This is like going to the theater and sitting through 2 hours of nothing but “previews of coming attractions” for movies that will be out in a year or so. It’s fun to get all excited about the possibilities, but the real proof will be seen when these devices hit the market. Until then, don’t we all have some engineering work to do?
2 thoughts on “The Future is Clear (ish)”
Xilinx gave a preview of their upcoming 20nm technology. What do you think about the evolution of FPGAs on today’s advanced processes?
The problem of how to program them is sobering indeed, C-to-gates or OpenCL notwithstanding. Design productivity has been growing far slower than Moore’s Law’s 2x/2yr rate for decades.
Look at the cost to develop and debug a big FPGA design and divide by the product’s lifetime volume. As productivity falls further behind Moore’s Law, per-chip design cost takes a bigger share of product cost.
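The arithmetic in this comment can be sketched in a few lines. Every number below is hypothetical and chosen only to show the shape of the problem: the development cost, the lifetime volume, and especially the assumed four-year doubling time for design productivity (the comment says only that it is “far slower” than Moore’s Law’s 2x every 2 years).

```python
def per_chip_design_cost(dev_cost, lifetime_volume):
    """Development and debug cost amortized over every unit shipped."""
    return dev_cost / lifetime_volume


def productivity_gap(years, complexity_doubling_years=2.0,
                     productivity_doubling_years=4.0):
    """Ratio of design complexity growth to designer productivity growth.

    Complexity doubles every 2 years (Moore's Law, per the comment).
    The 4-year productivity doubling time is a HYPOTHETICAL placeholder.
    """
    complexity = 2 ** (years / complexity_doubling_years)
    productivity = 2 ** (years / productivity_doubling_years)
    return complexity / productivity


# HYPOTHETICAL project: $5M to develop and debug, 50,000 units shipped.
cost = per_chip_design_cost(5_000_000, 50_000)
print(f"design cost per chip: ${cost:.2f}")

# After 8 years under these assumptions, design effort per chip has grown 4x.
print(f"effort multiplier after 8 years: {productivity_gap(8):.1f}x")
```

If volume stays flat while that multiplier grows, the amortized design cost per chip grows with it, which is the commenter’s point.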