A Bonus Generation

The FPGA world has a unique obsession with semiconductor process nodes. Every two years or so we witness an epic battle between the two major market-share holders, centered mostly around who gets their devices working first on the next new semiconductor process. Historically, the stakes were very high. With FPGAs being among the first devices to go to production on a new node, and with the high-margin spoils of victory going largely to the winner, the biennial financial fates of the two big FPGA companies rode heavily on winning the next-generation derby.

Now, Xilinx is starting to ship samples of their new UltraScale family, based on TSMC’s 20nm planar CMOS process. This kind of “first to ship” announcement is usually a sign of impending victory, as the first to sample is usually the first to ship in volume and the first to collect the bulk of eager early adopters just chomping at the bit to design the biggest, baddest silicon into their next system. Communications infrastructure has always been the largest segment of the FPGA market, and most of the big players in that segment have used devices from both Xilinx and Altera, so those big-budget projects are ripe for the picking any time a new leap in performance, density, and features becomes available.

In this case, things are looking pretty rosy for Xilinx, but the picture has gotten a lot more complicated, and it now involves a lot more than just who got to the next node first. Yes, there is certainly still big drama on the process front. In fact, you could argue that there is perhaps now more intrigue than ever. In the previous node, where both companies built product families based on TSMC’s 28nm process, Xilinx clearly came out on top, getting to market first and rolling in with a pile of innovations including Vivado – a brand-new, ground-up re-write of the company’s entire tool suite with ASIC-class capabilities – and some enormous interposer-based FPGAs aimed at the prototyping segment. Altera struggled with 28nm, shipped later than their competitor, and took a sizable financial and market-share hit as a result.

For the next generation, Altera pulled an unexpected rabbit out of the hat – making a deal with Intel to build their next family on Intel’s upcoming 14nm FinFET technology. In theory, this could be a big boost for Altera. Xilinx has stuck with TSMC and has continued on the track to deliver their new UltraScale family on TSMC’s 20nm planar technology, and then to move to FinFETs with TSMC’s upcoming 16nm process. In making the jump to Intel, Altera skipped the 20nm node with their flagship Stratix family, so this current UltraScale play by Xilinx will go unanswered at the high end – for a few months at least.

Did we mention that the game has gotten more complicated? As intriguing as this perpetual process geometry chess match has become, it is probably high time for it to share the stage with other, equally important factors. Winning in FPGAs today takes a lot more than just fancy chips on the latest semiconductor process. Just ask the numerous failed startups who came to market with impressive devices, only to fail because they were lacking the tools, the IP, and, most importantly, the armies of expert AEs deployed by the big two companies to help customers turn those cool chips into cool products.

In our own repeated surveys of the FPGA market, the number one factor in choosing an FPGA company for a particular design project is “Previous success with vendor’s tools and devices.” In other words, more than the fastest SerDes, the biggest IO counts, the fanciest DSP blocks, the lowest power, or the largest LUT arrays, FPGA designers care about their own confidence in just getting the darn thing to work. If we know a tool suite (including the bugs and their workarounds), have experience with all the quirks of a particular vendor’s chips, and have successfully dropped them into a socket on our boards, we are most likely to go with the same vendor again. It takes a pretty huge advantage in chip capabilities to get us to consider jumping ship and joining the other team.

Xilinx obviously understands this, and the UltraScale announcement shows that the company is taking the breadth of the fight seriously. Vivado now has a full generation of use under its belt, and it no longer walks on the wobbly legs of a few million lines of brand-new code. It brings the kind of performance and capability we will need if we plan to take advantage of the serious capabilities a family like UltraScale brings to the party. While FPGA tools have long paid lip service to faster compile times, sophisticated timing optimization, power optimization, and IP management, Xilinx has really nailed an industrial-strength solution with Vivado. It’s a good thing, too, because the aging ISE suite would not be up to the challenge of the over-four-million-logic-cell monster at the high end of the new UltraScale family.

UltraScale is interesting in that it is the first family that Xilinx has had the opportunity to design using Vivado. The company ran exhaustive architectural explorations – experimenting with different configurations of routing resources over a wide range of design styles – to come up with an architecture that would allow very high utilization in the vast majority of cases. For the past several generations, FPGA densities on data sheets have come along with a bit of a nod and a wink. We all knew that no matter what the sheet said, we couldn’t get close to 100% utilization in any real-world design. We had to upsize our chip choice to be sure that we not only had enough LUTs, but also had the routing resources to successfully place and route our design and meet our timing constraints. UltraScale’s better-optimized architecture means that we can drop this fudge factor, or at least make it significantly less pessimistic.
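To make the fudge factor concrete, here is a minimal Python sketch of the sizing arithmetic. The design size and both utilization figures are hypothetical numbers chosen purely for illustration; they are not vendor data and do not come from Xilinx.

```python
import math


def required_device_luts(design_luts: int, achievable_utilization: float) -> int:
    """Smallest device capacity (in LUTs) that fits a design, given the
    fraction of the device we trust ourselves to actually fill."""
    if not 0.0 < achievable_utilization <= 1.0:
        raise ValueError("achievable_utilization must be in (0, 1]")
    return math.ceil(design_luts / achievable_utilization)


# Hypothetical design size and utilization ceilings (assumptions, not measurements).
design = 300_000
pessimistic = required_device_luts(design, 0.70)  # cautious rule of thumb
optimistic = required_device_luts(design, 0.90)   # if routing stops being the limit

print(f"At 70% utilization we need a device with {pessimistic:,} LUTs")
print(f"At 90% utilization we need a device with {optimistic:,} LUTs")
```

Under these assumed numbers, the higher utilization ceiling shrinks the required device by roughly a fifth, which is often the difference between one family member and the next size up.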

The UltraScale families themselves boast some impressive numbers. The largest device in the upcoming Virtex UltraScale family – the VU440 – packs an incredible 4.4 million “logic cells” (4-input LUT equivalents). Aimed at the ASIC prototyping market, the device has 1,456 user IOs, 48 transceivers running at 16.3 Gb/s, and 89 Mbits of block RAM. The company estimates that the device can implement the equivalent of 50 million ASIC gates. For people designing big prototyping boards or emulators, that kind of capacity is a really big deal. Other, smaller members of the family are aimed at more conventional markets, and they include even higher-performance SerDes – with 28 Gb/s backplane-capable transceivers and up to 33 Gb/s chip-to-chip/chip-to-optics transceivers. Of course, the devices include a very rich set of hard IP, including PCIe Gen3, 100 Gb/s Ethernet MAC, 150 Gb/s Interlaken, and DDR4 memory interfaces. All that IO capability makes the devices ripe for implementations of a number of single-chip 400 Gb/s applications.
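A quick back-of-the-envelope check shows why those transceiver figures point at 400 Gb/s designs. The Python sketch below uses only the lane counts and line rates quoted above; it deliberately ignores line-coding and protocol overhead, which is a simplifying assumption, so real usable throughput would be lower.

```python
import math


def aggregate_gbps(lanes: int, lane_rate_gbps: float) -> float:
    """Raw aggregate line rate for a group of identical serial lanes."""
    return lanes * lane_rate_gbps


def lanes_needed(target_gbps: float, lane_rate_gbps: float) -> int:
    """Minimum number of lanes whose raw line rate covers the target."""
    return math.ceil(target_gbps / lane_rate_gbps)


# Figures quoted in the article; encoding overhead is ignored (assumption).
vu440_raw = aggregate_gbps(48, 16.3)
lanes_at_28 = lanes_needed(400, 28.0)
lanes_at_16_3 = lanes_needed(400, 16.3)

print(f"VU440 raw transceiver bandwidth: about {vu440_raw:.0f} Gb/s")
print(f"Lanes for 400 Gb/s at 28 Gb/s each: {lanes_at_28}")
print(f"Lanes for 400 Gb/s at 16.3 Gb/s each: {lanes_at_16_3}")
```

On raw line rate alone, a 400 Gb/s link needs 15 lanes at 28 Gb/s or 25 lanes at 16.3 Gb/s, so a single device has room for the link plus a comparable amount of bandwidth on the other side of the chip.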

It is a near certainty that many such applications will come Xilinx’s way, as UltraScale will run without competition at the high end for a significant amount of time. Xilinx still has a lot of challenging work ahead of them, of course. Delivering the first samples of the first devices is a long way from having all members of a new family shipping in volume. But delivering those first samples is a milestone that competitors are unlikely to reach for quite some time, and, until then, Xilinx pretty much owns the playground.