In the ongoing marketing battle to see who can out-confuse the competition, Xilinx has just fired an impressive salvo. Strapped safely into the cockpit of a superlative-laden press release is an announcement of what the company is calling the “UltraScale” architecture. We would say “new FPGA architecture,” but apparently it isn’t cool to make FPGAs any more. You see, Xilinx is now in the “All Programmable” device business.
Xilinx and archrival Altera have been waging a war of words lately. But, before we whip out the hypesaw and try to slog our way through the formidable layers of marketing bluster and misdirection to find what’s actually cool in this announcement (and hang in there, because there actually is some very high-quality real content buried deep in the core of this fluffball), let’s review the current state of marketing spin in the programmable logic industry:
Xilinx would like you to know that they are “A Generation Ahead,” but Altera says they have a “Measurable Advantage.” Everybody perfectly clear now? Good. If that wasn’t enough info for you to choose the correct device for your next design, you should probably also know that Altera makes FPGAs (including SoC FPGAs) and Xilinx makes All Programmable devices (including Extensible Processing Platforms). What do these all mean? Exactly the same thing, it turns out. It means “FPGAs (including ones with built-in processors).”
That’s great for now, but what about the future? What if we are choosing a device that we will use in a new design that won’t go into production for at least a year or two? In that case, Altera would like you to know that they’ll have some great big capabilities – assuming they can get to production with new foundry partner Intel on their upcoming 14nm Tri-Gate (FinFET) technology. Your performance-price-power meter will tip to the left after the needle slams into the right side of the case with all of this FinFET-found awesomeness. Forget the fact (please, they implore you, please forget!) that both Tabula and Achronix have been doing the Intel/FinFET dance for a while already with their 22nm Tri-Gate programmable logic devices.
On the other hand, Xilinx now wants you to know that they are the first in the industry to tape out at 20nm, and you know what THAT means! Wait, you don’t? You know – “tapeout” – it’s the part of the chip development process where you are “done!” I mean, “done,” except for the huge amount of stuff that has to happen after tapeout and before production. So, anyway, they’ll have some 20nm chips for us all – in something like a few months to a year or so. But, we already knew that. FPGA companies don’t usually announce when they hit tapeout. But, if you need a reason to make an announcement or if your bag of superlatives is looking a little lean, why not?
The race is on, then. Xilinx has taped out some big, fast chips on TSMC’s 20nm planar CMOS process (of which we will probably see the first samples in Q4 of this year, with real availability sometime in 2014), and Altera is working hard on some bigger, faster chips on Intel’s 14nm FinFET process (which will likely come out a few months later). Sometime in there, Xilinx will come out with their own FinFET devices based on TSMC’s 16nm FinFET process. Altera is therefore skipping the 20nm node with their high-end Stratix family (but going there with their mid-range Arria family), and Xilinx is planning two versions of their upcoming high-end family in fairly rapid succession – first on 20nm planar, then on 16nm FinFET.
Putting all that together, and considering only the raw, process-based effects on the products, Xilinx will probably have next-generation high-end devices available before Altera does, but then Altera may have a mild process advantage soon after that. If you know the timing of your project well enough to divine which of these options looks best for you – well, you’re smarter than we are.
Now let’s whip out that aforementioned hypesaw and cut down through a couple of layers of posturing. There is much more to the chip world than process. Today, the ecosystem and the architecture of a programmable logic device are probably more important than the raw performance we get from the underlying semiconductor process.
Xilinx, it turns out, is betting big on that.
As we have discussed before, Xilinx bit the bullet and did a complete, ground-up rewrite of their aging design tool suite (ISE) to give us Vivado – an impressive achievement in software. Vivado, although still a tad wobbly on its new legs, is without doubt the most sophisticated software platform ever devised for FPGA (oops, we mean “All Programmable” device) design. It brings unprecedented integration, a unified underlying data model (which you probably will only appreciate in terms of fast, smooth access to your design from all parts of the system), and ASIC-grade integrated synthesis, placement, and routing technology that should scale nicely to the mammoth designs we’ll be doing over the next few generations and should deliver significantly better quality-of-results (QoR) than previous-generation tools.
Tools can do only so much without the architectural features to support them, though, and that’s where Xilinx’s “UltraScale” announcement actually gets interesting. You see, designing a new FPGA is an art – an intricate dance between tools and chip design that hopefully yields just the right balance of cost, routability, performance, power consumption, and features. UltraScale is less a “new” architecture than a re-balancing of the resources on the device achieved through iterative design and testing with Vivado.
What does that really mean?
In order to tell us what UltraScale really is, Xilinx has to basically air one of the dirty little not-so-secrets of the FPGA business. Brace yourself, here it comes: You can’t use all of the stuff on a high-end FPGA. In fact, on some high-end FPGAs, you may be able to use only 60% or so of the stuff on the chip before you have serious problems – routing congestion, power, timing closure – all of which come down to the issue of interconnect. This is not a Xilinx problem or an Altera problem – it’s an everybody problem.
The thing that drives the die size of a high-end FPGA device is the IO. You guys seem to really want a lot of pins on your FPGAs, and you want a lot of them with multi-gigabit SerDes. That means that the middle of the chip – where all the LUTs and stuff live – has plenty of space for LUTs, multipliers, memories, and other good stuff. The problem is that when you try to use all that stuff, there are often interconnect bottlenecks that get in the way. In simplest terms, Xilinx has attacked that problem by giving us more (and faster) interconnect. With more routing resources, we should be able to handle the wide structures that are required for the massive amount of data that these chips are capable of absorbing with today’s super-speed SerDes.
It’s easy to say, “add more routing resources,” but the trick is to figure out what resources to add and where. Putting a new freeway on the West side of town won’t do much for traffic jams on the far East side. Saying, “all you have to do is add more routing” to an FPGA is like saying, “all you have to do is move the bow across the strings and wiggle your fingers on one end” to master the violin. That’s where Vivado comes in. By running huge suites of designs through various trial architectures in Vivado, Xilinx was able to co-optimize the routing resources on the chip to give the best possible results with Vivado and to afford the highest device utilization across a wide range of design styles. If they succeeded (which they claim they did), the result should be much higher utilization, faster designs, and ultimately dramatically more real, usable bandwidth – on the same silicon area with the same process technology.
At the same time, Xilinx has revamped the clocking architecture on the chip, adopting an ASIC-style multi-region clocking scheme. This avoids the problem of high skew on long clock lines and virtually eliminates the large chunk of a typical clock period that’s lost to skew. They also now support much finer-grained gating of clocks so advanced ASIC-like power conservation schemes can be used. The results should be higher-frequency operation with fewer skew-related timing problems, lower power, and higher overall bandwidth.
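To see why skew matters so much, here is a back-of-the-envelope sketch in Python. It assumes a deliberately simplified timing model in which worst-case clock skew is simply added to the logic delay that must fit inside one clock period. The function name and every number in it are our own illustrative assumptions, not figures from Xilinx.

```python
def max_frequency_mhz(logic_delay_ns: float, skew_ns: float) -> float:
    """Highest clock rate at which logic delay plus skew fits in one period.

    Simplified model (an assumption for illustration): the minimum clock
    period is the register-to-register logic delay plus worst-case skew.
    """
    min_period_ns = logic_delay_ns + skew_ns
    return 1000.0 / min_period_ns  # a 1 ns period corresponds to 1000 MHz


# Hypothetical numbers, chosen only to show the trend.
logic_delay_ns = 2.0
for skew_ns in (0.5, 0.1):
    fmax = max_frequency_mhz(logic_delay_ns, skew_ns)
    print(f"skew {skew_ns} ns -> about {fmax:.0f} MHz")
```

Under these made-up numbers, trimming skew from 0.5 ns to 0.1 ns raises the achievable clock rate without touching the logic at all, which is the kind of gain a better clock network is meant to deliver.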
A couple of other goodies in the UltraScale announcement are wider multipliers in the DSP blocks, and more (and faster and wider) hard-IP SDRAM ports. These should play nicely with the increased interconnect and improved clocking schemes to deliver a well-rounded high-utilization device capable of handling the massive performance demands we are all planning to place on them.
With the enormous aggregate bandwidth likely to be provided by next-generation SerDes, it would be easy to overwhelm the other resources on the chip. In order to make devices that can actually take advantage of the capabilities of the IO, we need these faster clock frequencies, higher utilization, better DSP blocks, and increased on-chip memory bandwidth. Without these architectural improvements, a lot of the potential of next-generation SerDes would be quietly left on the table – because of an inability to route to completion or because of excessive power consumption.
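The arithmetic behind that worry is easy to sketch. The Python snippet below estimates how wide an on-chip datapath has to be to keep up with a given aggregate SerDes bandwidth at a given fabric clock rate. The lane count, line rate, and clock frequencies are hypothetical values we picked for illustration, not UltraScale specifications, and the calculation ignores encoding overhead.

```python
import math


def required_bus_width_bits(aggregate_gbps: float, fabric_clock_mhz: float) -> int:
    """Bits that must move per fabric clock cycle to keep up with the IO."""
    bits_per_second = aggregate_gbps * 1e9
    cycles_per_second = fabric_clock_mhz * 1e6
    return math.ceil(bits_per_second / cycles_per_second)


# Hypothetical device: 32 lanes at 10 Gb/s each (illustrative only).
lanes = 32
gbps_per_lane = 10.0
aggregate_gbps = lanes * gbps_per_lane

for clock_mhz in (250, 400, 500):
    width = required_bus_width_bits(aggregate_gbps, clock_mhz)
    print(f"{clock_mhz} MHz fabric clock -> {width}-bit datapath")
```

With these assumed numbers, a 250 MHz fabric needs a 1280-bit datapath while a 500 MHz fabric needs only 640 bits. Either you run faster or you route wider, and both put pressure on clocking and interconnect.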
While it would be easy to dismiss UltraScale as, “they added more routing,” clearly the net effect will be much greater and more useful than that. True, UltraScale is not what one would normally think of as a “new architecture,” as the basic logic cell structure is essentially unchanged and the types of resources on the chip are pretty much the same things we’ve come to expect – LUTs, Memory, DSP blocks, standard IOs, and SerDes transceivers. But, by balancing and tuning these resources specifically for the types of applications customers are planning, the result could be a truly superior device. We will all have to wait and see.