feature article
Subscribe Now

Weighing UltraScale

Xilinx Announces New Architecture

In the ongoing marketing battle to see who can out-confuse the competition, Xilinx has just fired an impressive salvo. Strapped safely into the cockpit of a superlative-laden press release is an announcement of what the company is calling the “UltraScale” architecture. We would say “new FPGA architecture,” but apparently it isn’t cool to make FPGAs any more. You see, Xilinx is now in the “All Programmable” device business.

Xilinx and archrival Altera have been waging a war of words lately. But, before we whip out the hypesaw and try to slog our way through the formidable layers of marketing bluster and misdirection to find what’s actually cool in this announcement (and hang in there, because there actually is some very high-quality real content buried deep in the core of this fluffball), let’s review the current state of marketing spin in the programmable logic industry:

Xilinx would like you to know that they are “A Generation Ahead,” but Altera says they have a “Measurable Advantage.” Everybody perfectly clear now? Good. If that wasn’t enough info for you to choose the correct device for your next design, you should probably also know that Altera makes FPGAs (including SoC FPGAs) and Xilinx makes All Programmable devices (including Extensible Processing Platforms). What do these all mean? Exactly the same thing, it turns out. It means “FPGAs (including ones with built-in processors).”

That’s great for now, but what about the future? What if we are choosing a device that we will use in a new design that won’t go into production for at least a year or two? In that case, Altera would like you to know that they’ll have some great big capabilities – assuming they can get to production with new foundry partner Intel on their upcoming 14nm Tri-Gate (FinFET) technology. Your performance-price-power meter will tip to the left after the needle slams into the right side of the case with all of this FinFET-found awesomeness. Forget the fact (please, they implore you, please forget!) that both Tabula and Achronix have been doing the Intel/FinFET dance for a while already with their 22nm Tri-Gate programmable logic devices.

On the other hand, Xilinx now wants you to know that they are the first in the industry to tape out at 20nm, and you know what THAT means! Wait, you don’t? You know – “tapeout” – it’s the part of the chip development process where you are “done!” I mean, “done,” except for the huge amount of stuff that has to happen after tapeout and before production. So, anyway, they’ll have some 20nm chips for us all – in something like a few months to a year or so. But, we already knew that. FPGA companies don’t usually announce when they hit tape-out. But, if you need a reason to make an announcement or if your bag of superlatives is looking a little lean, why not? 

The race is on, then. Xilinx has taped out some big, fast chips on TSMC’s 20nm planar CMOS process (of which we will probably see the first samples in Q4 of this year, with real availability sometime in 2014), and Altera is working hard on some bigger, faster chips on Intel’s 14nm FinFET process (which will likely come out a few months later). Sometime in there, Xilinx will come out with their own FinFET devices based on TSMC’s 16nm FinFET process. Altera is therefore skipping the 20nm node with their high-end Stratix family (but going there with their mid-range Arria family), and Xilinx is planning two versions of their upcoming high-end family in fairly rapid succession – first on 20nm planar, then on 16nm FinFET.

Putting all that together, and considering only the raw, process-based effects on the products, Xilinx will probably have next-generation high-end devices available before Altera does, but then Altera may have a mild process advantage soon after that. If you know the timing of your project well enough to divine which of these options looks best for you – well, you’re smarter than we are.

Now let’s whip out that aforementioned hypesaw and cut down through a couple of layers of posturing. There is much more to the chip world than process. Today, the ecosystem and the architecture of a programmable logic device are probably more important than the raw performance we get from the underlying semiconductor process.

Xilinx, it turns out, is betting big on that.

As we have discussed before, Xilinx bit the bullet and did a complete, ground-up rewrite of their aging design tool suite (ISE) to give us Vivado – an impressive achievement in software. Vivado, although still a tad wobbly on its new legs, is without doubt the most sophisticated software platform ever devised for FPGA (oops, we mean “All Programmable” device) design. It brings unprecedented integration, a unified underlying data model (which you probably will only appreciate in terms of fast, smooth access to your design from all parts of the system), and ASIC-grade integrated synthesis, placement, and routing technology that should scale nicely to the mammoth designs we’ll be doing over the next few generations and should deliver significantly better quality-of-results (QoR) than previous-generation tools.

Tools can do only so much without the architectural features to support them, though, and that’s where Xilinx’s “UltraScale” announcement actually gets interesting. You see, designing a new FPGA is an art – an intricate dance between tools and chip design that hopefully yields just the right balance of cost, routability, performance, power consumption, and features. UltraScale is less a “new” architecture than a re-balancing of the resources on the device achieved through iterative design and testing with Vivado.

What does that really mean

In order to tell us what UltraScale really is, Xilinx has to basically air one of the dirty little not-so-secrets of the FPGA business. Brace yourself, here it comes: You can’t use all of the stuff on a high-end FPGA. In fact, on some high-end FPGAs, you may be able to use only 60% or so of the stuff on the chip before you have serious problems – routing congestion, power, timing closure – all of which come down to the issue of interconnect. This is not a Xilinx problem or an Altera problem – it’s an everybody problem.

The thing that drives the die size of a high-end FPGA device is the IO. You guys seem to really want a lot of pins on your FPGAs, and you want a lot of them with multi-gigabit SerDes. That means that the middle of the chip – where all the LUTs and stuff live, has plenty of space for LUTs, multipliers, memories, and other good stuff. The problem is that when you try to use all that stuff, there are often interconnect bottlenecks that get in the way. In simplest terms, Xilinx has attacked that problem by giving us more (and faster) interconnect. By adding more routing resources, we should be able to handle the wide structures that are required for the massive amount of data that these chips are capable of absorbing with today’s super-speed SerDes.

It’s easy to say, “add more routing resources,” but the trick is to figure out what resources to add and where. Putting a new freeway on the West side of town won’t do much for traffic jams on the far East side. Saying, “all you have to do is add more routing” to an FPGA is like saying, “all you have to do is move the bow across the strings and wiggle your fingers on one end” to master the violin. That’s where Vivado comes in. By running huge suites of designs through various trial architectures in Vivado, Xilinx was able to co-optimize the routing resources on the chip to give the best possible results with Vivado and to afford the highest device utilization across a wide range of design styles. If they succeeded (which they claim they did), the result should be much higher utilization, faster designs, and ultimately dramatically more real, usable bandwidth – on the same silicon area with the same process technology.

At the same time, Xilinx has revamped the clocking architecture on the chip, adopting an ASIC-style multi-region clocking scheme. This avoids the problem with high-skew on long clock lines and virtually eliminates the huge amount of a typical clock period that’s lost to skew. They also now support much finer-grained gating of clocks so advanced ASIC-like power conservation schemes can be used. The results should be higher-frequency operation with fewer skew-related timing problems, lower power, and higher overall bandwidth.

A couple of other goodies in the UltraScale announcement are wider multipliers in the DSP blocks, and more (and faster and wider) hard-IP SDRAM ports. These should play nicely with the increased interconnect and improved clocking schemes to deliver a well-rounded high-utilization device capable of handling the massive performance demands we are all planning to place on them.

With the enormous aggregate bandwidth likely to be provided by next-generation SerDes, it would be easy to overwhelm the other resources on the chip. In order to make devices that can actually take advantage of the capabilities of the IO, we need these faster clock frequencies, higher utilization, better DSP blocks, and increased on-chip memory bandwidth. Without these architectural improvements, a lot of the potential of next-generation would be quietly left on the table – because of inability to route to completion or to excessive power consumption.

While it would be easy to dismiss UltraScale as, “they added more routing,” clearly the net effect will be much greater and more useful than that. True, UltraScale is not what one would normally think of as a “new architecture,” as the basic logic cell structure is essentially unchanged and the types of resources on the chip are pretty much the same things we’ve come to expect – LUTs, Memory, DSP blocks, standard IOs, and SerDes transceivers. But, by balancing and tuning these resources specifically for the types of applications customers are planning, the result could be a truly superior device. We will all have to wait and see.

7 thoughts on “Weighing UltraScale”

  1. Xilinx has focused on interconnect with their latest architectural update. Will this affect your ability to use their devices in your application? Have you run into situations where you couldn’t use the full capabilities of an FPGA because of interconnect issues?

  2. All this redefinition rather reminds me of IBM in the 1960s and 1970s where the general feeling was that they had a division for creating new names for existing concepts. And in this light, Altera leaps to generation 10 so Xilinx says we are not playing the numbers game any more. But I do think that that in all this renaming we are missing something, and that is that Zynq is not an FPGA. It is not, as Altera are claiming with their catch-up products, an SoC. But it is something new. (And I hate to find myself defending Xilinx, which is more than capable of defending itself.)

  3. Pingback: DMPK Assays

Leave a Reply

featured blogs
Dec 6, 2023
Optimizing a silicon chip at the system level is crucial in achieving peak performance, efficiency, and system reliability. As Moore's Law faces diminishing returns, simply transitioning to the latest process node no longer guarantees substantial power, performance, or c...
Dec 6, 2023
Explore standards development and functional safety requirements with Jyotika Athavale, IEEE senior member and Senior Director of Silicon Lifecycle Management.The post Q&A With Jyotika Athavale, IEEE Champion, on Advancing Standards Development Worldwide appeared first ...
Nov 6, 2023
Suffice it to say that everyone and everything in these images was shot in-camera underwater, and that the results truly are haunting....

featured video

Dramatically Improve PPA and Productivity with Generative AI

Sponsored by Cadence Design Systems

Discover how you can quickly optimize flows for many blocks concurrently and use that knowledge for your next design. The Cadence Cerebrus Intelligent Chip Explorer is a revolutionary, AI-driven, automated approach to chip design flow optimization. Block engineers specify the design goals, and generative AI features within Cadence Cerebrus Explorer will intelligently optimize the design to meet the power, performance, and area (PPA) goals in a completely automated way.

Click here for more information

featured paper

Power and Performance Analysis of FIR Filters and FFTs on Intel Agilex® 7 FPGAs

Sponsored by Intel

Learn about the Future of Intel Programmable Solutions Group at intel.com/leap. The power and performance efficiency of digital signal processing (DSP) workloads play a significant role in the evolution of modern-day technology. Compare benchmarks of finite impulse response (FIR) filters and fast Fourier transform (FFT) designs on Intel Agilex® 7 FPGAs to publicly available results from AMD’s Versal* FPGAs and artificial intelligence engines.

Read more

featured chalk talk

Optimize Performance: RF Solutions from PCB to Antenna
RF is a ubiquitous design element found in a large variety of electronic designs today. In this episode of Chalk Talk, Amelia Dalton and Rahul Rajan from Amphenol RF discuss how you can optimize your RF performance through each step of the signal chain. They examine how you can utilize Amphenol’s RF wide range of connectors including solutions for PCBs, board to board RF connectivity, board to panel and more!
May 25, 2023