When the #1 FPGA company makes what is arguably their biggest new-technology announcement in a decade, you’d expect there to be a lot of substance. With this week’s announcement of UltraScale+ Virtex, Kintex, and Zynq devices planned to roll out on TSMC’s 16nm FinFET process, the company did not disappoint. This is one of the broadest, most complex announcements we have ever heard from Xilinx. So, with that preface, let’s take a look at what those folks on the south side of San Jose have been up to lately.
In summary, Xilinx is announcing new Virtex, Kintex, and Zynq families of programmable devices with major improvements in capability over previous generations.
Xilinx is unveiling its “UltraScale+” device families. Note the “+”. That means that these families are based on TSMC’s 16nm FinFET process – rather than the 20nm planar process that the current “UltraScale” devices use. And, since we’re on the topic of underlying fabrication process and FinFETs, let’s get that part of the discussion out of the way first. No, this announcement does not answer the mostly irrelevant but marketing-wise omnipresent question of whether Xilinx or Altera will ship the first FinFET-based FPGAs, with Xilinx working with TSMC and Altera jumping over to Intel’s 14nm Tri-Gate (Intel’s name for FinFETs) process. Actually, Xilinx and Altera are now racing each other for second place in that derby right now, because Achronix already crossed the finish line on that one with their Intel-fabbed 22nm “Speedster” FinFET FPGAs.
FinFETs offer a step function in performance/power for FPGAs. This is beyond the normal single-node advantage we’d expect from a typical Moore’s Law shrink. FinFETs can do more work with less power, and leak less, than similar-sized planar transistors. The result is that the families Xilinx is announcing will have more than a typical one-node improvement in performance, power, and density – based on the new process alone. That’s a big deal.
But everybody will eventually have FinFETs, so any competitive advantage from process is a matter of timing. Yes, those of us using the next generation of chips will reap massive benefits from using devices made with the latest process. You’ll do more, with a cheaper chip, and it will take less power.
Now, let’s move on to the stuff in this announcement that is much more interesting. Xilinx isn’t just sitting back and re-doing the same FPGAs (only bigger), taking advantage of the next process node. That would give us a normal, Moore’s Law improvement in FPGA technology. The company has done some significant innovation in the areas of architecture, packaging, tools, and IP that give us considerably more boost than Moore’s Law alone.
One of the biggest advances Xilinx has made is not mentioned in their latest release, but it has its footprints all over it. That is Vivado, the company’s completely overhauled design tool suite. A few years ago, the company made a major investment in a total rewrite of their aging ISE tools. The result was a state-of-the-art comprehensive EDA tool with all the latest bells and whistles in terms of data model, integration, algorithms, and performance. Now Vivado has had a few years to mature and get into fighting trim, and it played a major role in the development of the architecture for the latest devices. Xilinx used Vivado to completely redesign the routing resources on the chip – eliminating bottlenecks and giving the tools exactly what they needed for more demanding designs. As a result, the UltraScale and UltraScale+ families are significantly better than their predecessors in terms of overall routability and utilization. If you’re accustomed to sizing your FPGA based on a 60-70% utilization, you’ll be pleasantly surprised with the 90%+ results many teams are finding with these newly re-architected devices.
What ARE included in the recent announcement are several important architectural improvements that capitalize on the above process, routability, and utilization advantages. These include UltraRAM – a new block that delivers significantly more on-chip memory capacity, SmartConnect – a new interconnect optimization technology, and “Heterogeneous Multi-processing” in the new Zynq devices – which is basically expanding the ARM-based processing system with a lot of optimized hard IP. The company is also touting “3D-on-3D” which is a marketing way of pointing out that they are using both 3D transistors (FinFETs) and 3D packaging technology (silicon interposers and TSVs). All of these improvements are significant, and all of them work together to bring us devices with dramatically more capability than we have ever seen before in programmable logic.
Let’s take a look at them one by one.
In many applications, high-performance memory is at a premium. You need memory for buffering, for shared resources between processors and accelerators, and for various types of caching. Often, it pays to have the memory located where it is needed, rather than at the other end of a busy multi-purpose bus or switch fabric. While FPGAs have always had some memory resources, designers have had to go off chip to get access to large amounts of storage. But off-chip memory interfaces are expensive and power hungry, and they introduce a considerable amount of latency. They also chew up valuable IO on your FPGA, and sometimes IO is the scarcest resource of all.
To address this issue, Xilinx has created what they call “UltraRAM” – high-performance memory blocks strategically located where they are likely to do the most good. Some device configurations have as much as 432 Mb of UltraRAM, significantly more memory than has been available in previous generations. UltraRAM doesn’t replace the existing block RAM and LUT-based memory resources. Different types and sizes of on-chip memory are useful for different purposes, and UltraRAM just augments the existing lineup with a new, much larger, high-performance memory block. Considering the size and complexity of the target applications for these devices, UltraRAM will likely come in extremely handy in many designs, and it is likely to improve the system performance, reduce latency, reduce power consumption, and significantly reduce BOM cost and board complexity when compared with external memory.
Next up is what Xilinx calls “SmartConnect” interconnect technology. Like UltraRAM, SmartConnect is a feature born of the demands of the much larger designs being implemented on these new devices. While the overhaul of the routing resources we described above dramatically improves the detailed routing part of your design, there is a new meta-level of interconnect that needs to be addressed as well. When your chip has large, complex blocks that talk to each other over prescribed interfaces, you typically have specific latency and/or throughput targets for those pipes. That means a “one priority fits all” interconnect strategy is guaranteed to be sub-optimal for at least part of your design. SmartConnect allows distinct optimization of interconnect to meet specific goals of each part of a design, reducing the overall routing resource required and matching the type of interconnect chosen to the specific constraints of each interface.
The Virtex and Kintex FPGA families take advantage of all this new stuff in just the way you’d expect. Interestingly, almost all of the Virtex devices will be fabricated with multi-die 3D silicon-interposer technology. The largest Virtex weighs in with a hefty 3.4 million 4-input LUT equivalent logic cells, 432 Mb UltraRAM, 94.5 Mb block RAM, and 46.4 Mb distributed RAM. It packs a whopping 11,904 DSP slices, four hardened PCIe® Gen3 x16 / Gen4 x8 interfaces, twelve 150G Interlaken interfaces, and eight 100G Ethernet MACs w/ RS-FEC. This is topped off with 832 single-ended IOs and impressive 128 GTY 32.75Gb/s SerDes transceivers.
Clearly, this perfect storm of process scaling, 3D transistor improvement, architectural improvement, packaging technology advances, and design tool progress will give us the largest single leap forward in FPGA technology and capability we’ve ever seen.
Probably the most major advance in this announcement, however, is the new UltraScale+ Zynq offering. Zynq got such a massive upgrade, it almost needs a new name. The current Zynq is a wonderful example of what we call an “HIPP” (Heterogeneous Integrated Processing Platform). It combines a dual-core ARM Cortex-A9 based processing subsystem with copious amounts of FPGA fabric and IO. This combination allows designs to take advantage of hardware acceleration of demanding algorithms along with conventional high-performance applications processing. The result is a highly power-efficient device with formidable processing capability.
With Zynq UltraScale+, the FPGA portion of that equation benefits from all of the enhancements we described above. But the ARM-based subsystem gets a major upgrade as well. The applications processors are now quad-core, 64-bit, ARM Cortex-A53s – packing significantly more MIPS than the old A9s. Then, for the real-time bits of your application, they dropped in dual-core Cortex-R5 real-time processors. Rounding out the passel-o-processors is a Mali-400MP graphics processor. Taken together, that’s a LOT more processing oomph than before, and the addition of the real-time engines and GPU means that you can tailor the type of processing better to the part of your application that needs it.
One of the big application areas for this new Zynq family is video, and Xilinx acknowledged that with the addition of a hardened H.265/264 codec unit. An “Advanced Dynamic Power Management Unit” brings some ASIC/ASSP-grade application power management to a programmable device. There is also a new configuration security unit to help lock down your design, and forward-looking DDR4/LPDDR4 memory interface support – which will be important for the high-performance designs likely to land in the new Zynq’s lap.
Taken together, we feel that the Zynq upgrades are the most significant of the bunch. Of course, the UltraScale+ Virtex and Kintex families are both taking huge leaps forward, but Zynq (with all its new hardened ARM-based IP) seems like a whole new animal. In the hands of capable design teams, Zynq will enable applications that might otherwise be impossible. It should deliver an immense amount of aggregate heterogeneous processing capability at an unmatched performance-per-watt efficiency.
Of course, we will all have to wait awhile before we get to play with these amazing devices. Xilinx plans initial samples late this year with volume production ramping in 2016. But tool early-access support starts much sooner than that (Q2 2015). So, if you’re one of the lucky ones in the early access program, you’ll be able to take these families for at least a virtual test-drive pretty soon.