In a poker game, nobody wants to show his cards first. And with the ever-engaging Altera versus Xilinx high-stakes marketing match, it’s always riveting to see who will decide to be the first to disclose the details of their next-generation programmable logic family, and how and when the other will choose to respond.
The tensions have never been higher than in the current contest. Both companies are throwing incredible energy into the race to build the best-and-first programmable logic chips based on 14/16nm FinFET technology. This generation of devices promises to be the most exciting and challenging in history, with absolutely mind-blowing capabilities on the table, and with the two companies using different fabs and different technologies to achieve their goals.
Xilinx blinked first, in this case, giving up the goods on their upcoming Virtex and Zynq devices just a few weeks ago. Xilinx’s effort is impressive. Now, Altera has followed suit with an equally inspired rollout of their own: behold – Stratix 10!
As expected, Stratix 10 brings all the kinds of benefits we had anticipated: more capacity (whoa! seriously more capacity, with a monolithic device weighing in at a stunning 5.5 million LUTs), more speed, lower power consumption, more and faster connectivity, and bigger on-chip memories. Those are the kinds of things everybody knew we’d get from the next step down the Moore’s Law trail – with an extra boost from the new 3D FinFET transistor technology (OK – to be accurate, these folks call them “Tri-Gate”). FinFET takes the transistor channel vertical, as a thin silicon “fin” that the gate wraps around on three sides, which allows higher density, faster switching, and (most importantly) lower power consumption. Combined with the shrink we already get due to the smaller geometry, this FinFET-powered process node step gives us more and better benefits on PPP (price, performance, and power) than we have had in the last several generations.
What we didn’t expect were the radical architectural changes Altera also brought to the table. Headlining the announcement, and amplifying the already-impressive exponential Moore’s Law gains, is an architectural change the company calls “HyperFlex.” For the past couple of decades, FPGAs have been implemented with pretty much the same basic architecture for LUTs and interconnect. The biggest change we’ve seen was the widening of the logic cells a few years back – with both Altera and Xilinx jumping from the venerable LUT4 to today’s wider cells. Now, however, Altera has rolled out HyperFlex, which adds vast numbers of tiny registers (called “Hyper-Registers,” of course) embedded in the routing fabric itself. It’s a simple and elegant architectural change with dramatic effects.
What do those little registers accomplish? Here’s the deal.
In the old days, most of the delay in FPGA logic was in the LUTs themselves. We could get a pretty good idea of the total delay between registers simply from the number of levels of combinational logic we had tried to stack in between clock edges. As devices got bigger and more complex, however, more and more of the delay moved into the interconnect itself. This made timing calculations more difficult and delays more variable. Today, almost all of the delay through a logic path is due to interconnect. And, with the long routes required by these newer, larger chips, the challenge of timing closure has become a major barrier. We really don’t know where the problems are in our circuit until after place-and-route – and then it’s too late to easily do anything about them.
That’s where these handy little routing-based registers come into play. If the tools detect negative slack in a timing path, they can move the register boundary to any Hyper-Register location along the route, effectively re-timing the logic. This allows timing slack to be moved from one clock cycle to the next – bartering and borrowing slack along a chain so that the tools can close timing deterministically, without the seemingly endless cycle of open-loop iteration normally required.
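To make the slack-borrowing idea concrete, here is a minimal Python sketch of re-timing on a single linear path. It is a toy model, not Altera's algorithm: the path is a chain of delay segments (LUTs and routing hops), a register may sit at any boundary between segments, and the delay values, the `original` placement, and both function names are hypothetical. The register count stays the same, so latency in clock cycles is unchanged; only the positions move.

```python
from itertools import combinations

def worst_stage_delay(delays_ps, cuts):
    """Longest register-to-register delay (ps) for registers at the given cut positions.

    A cut at position i places a register between segment i-1 and segment i.
    """
    bounds = [0, *sorted(cuts), len(delays_ps)]
    return max(sum(delays_ps[a:b]) for a, b in zip(bounds, bounds[1:]))

def best_register_placement(delays_ps, num_registers):
    """Brute-force the cut positions that minimize the worst stage delay."""
    positions = range(1, len(delays_ps))
    return min(
        combinations(positions, num_registers),
        key=lambda cuts: worst_stage_delay(delays_ps, cuts),
    )

# Hypothetical path: alternating LUT and routing delays, in picoseconds.
path_ps = [300, 1200, 300, 900, 300, 200, 300, 500]

original = (2, 6)                               # registers where the RTL put them
retimed = best_register_placement(path_ps, 2)   # same two registers, moved

print("before:", worst_stage_delay(path_ps, original), "ps")
print("after: ", worst_stage_delay(path_ps, retimed), "ps")
```

In this made-up example the middle stage starts out at 1700 ps while its neighbors have time to spare; sliding one register along the route evens things out and brings the worst stage down to 1500 ps. Real tools work on full netlists with many more constraints, but the principle is the same: the slowest stage sets the clock, so balancing the stages raises the achievable frequency.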
Altera says that the net effect of this (when combined with the process improvements in Stratix 10) is the doubling of the effective clock speeds that can be achieved. That’s an impressive feat, as it’s been a very long time since a new family gave us a 2x performance boost. But doubling the speed is only the beginning. By enabling deterministic timing closure, Altera has most likely shortened the design cycle significantly and made it more predictable. If you don’t have to budget schedule for some arbitrary and unknown number of design iterations to close timing, you can give your boss a much more accurate idea of when your design will be ready to roll.
Altera didn’t stop with simply re-timing logic paths. The new “Spectra-Q” algorithms can use this capability to suggest changes to your RTL that will yield an even bigger performance boost. The marketing folks may have gotten a little carried away with this “Hyper-Aware” hyperbole, because, in addition to Hyper-Retiming, we also get Hyper-Pipelining and Hyper-Optimization.
There are also some great knock-on effects of this frequency boost. Since you can run at higher clock frequencies, you don’t have to go as wide with your data paths. Less parallelism means less logic utilization, increasing the effective density of your device. Less utilization also means less power (particularly leakage current), although you give back some of the power, owing to the higher frequency switching. Still, the power consumption is a net improvement – Altera is claiming in the range of 40%-70% reduction in power consumption.
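The width-versus-frequency trade is simple arithmetic, sketched below in Python. The function name and the 100 Gbps and 400/800 MHz figures are hypothetical, chosen only to illustrate a 2x clock boost; the model assumes the datapath moves one word per clock cycle.

```python
def required_width_bits(throughput_mbps, clock_mhz):
    """Datapath width (bits) needed to sustain a throughput at one word per clock.

    Mbps divided by MHz gives bits per cycle; round up to a whole bit.
    """
    return -(-throughput_mbps // clock_mhz)  # ceiling division on integers

target_mbps = 100_000  # hypothetical 100 Gbps stream

print(required_width_bits(target_mbps, 400), "bits at 400 MHz")
print(required_width_bits(target_mbps, 800), "bits at 800 MHz")
```

Under these assumptions, doubling the clock from 400 MHz to 800 MHz halves the required width from 250 bits to 125 bits, and the logic that would have built the other half of the datapath is freed up for something else.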
Second up in the “more unexpected” category is the use of heterogeneous “3D” interposer-based SiP packaging. The biggest Stratix 10 devices are still monolithic, unlike those of Xilinx, which use interposer-based packaging to build a larger FPGA out of smaller tiles. Altera’s SiP technology is being initially used for high-speed serial IOs, and that solves a major bottleneck in the process of designing a new device. Today, there are so many legacy, existing, and emerging standards for multi-gigabit transceivers that it’s impossible to put a stake in the ground and settle on the perfect combination of transceivers that solves all of the world’s problems. So, the company has decided to use separate “connectivity tiles” connected on an interposer to future-proof the devices. Stratix 10 uses Intel’s Embedded Multi-die Interconnect Bridge (EMIB) technology to connect various transceiver banks to the core fabric. The initial round of tiles will support PCIe Gen3, with 144 transceivers operating at up to 30Gbps. Future tile variants include Ethernet, PCIe Gen 4, 56G, PAM-4, Optical, and other transceiver technologies to be named later. We can’t help noticing that EMIB might also make a handy way to connect x86 processors to the fabric. Just sayin’.
The final major area of change in Stratix 10 is related to security, as Altera has packed a plethora of new security technology into the new devices. Starting at the top, you can divide your design into multiple hierarchical security “sectors” or islands. Each sector can have its own multi-factor authentication and encryption. Physically Unclonable Functions (PUFs) that rely on process variation to produce unique keys are used to assure that the device is who it says it is. This allows you to build your own customizable, multi-layered security implementation – an acknowledgement that SoCs today really are “systems” on chip, and systems have more complex needs than a single area and level of protection.
By now, everyone may be asking why we haven’t mentioned the very large, gray, trunk-laden animal in the middle of the room. Yes, it has now been announced that Intel plans to buy Altera. However, that has not affected Altera’s plans to deliver devices with built-in ARM processors. Stratix 10 will have SoC variants that include quad-core ARM Cortex-A53 processors. (Yes, those will be ARM processors manufactured by Intel. Get over it.) Interestingly, Xilinx has not announced plans for high-end devices with built-in ARM processors, so for now Altera has the only announced high-end SoC FPGA with integrated ARM processors.
Combined with the previously announced Spectra-Q technology, Stratix 10 promises to be a major upgrade for designers of systems based on high-end FPGAs. This should be, in fact, the largest leap forward in FPGA technology we’ve seen in at least a decade. It remains to be seen whether Xilinx or Altera will deliver first (and it doesn’t really matter all that much, to be honest). And it remains to be seen exactly how the Intel deal will affect the ultimate delivery and deployment of these devices – but the specs and details Altera has unveiled this week are truly impressive, and they should enable a whole new generation of systems to achieve performance and functionality that have never before been possible.
12 thoughts on “Altera Stratix 10”
“Less parallelism means less logic utilization” – should be higher logic utilization, I think…
What I mean there is that we use less of the logic resources on the chip because we don’t have to build larger, more parallel structures. That frees up more resources to do other things, of course, so I see the confusion in the terminology.
“Interestingly, Xilinx has not announced plans for high-end devices with built-in ARM processors, so for now Altera has the only announced high-end SoC FPGA with integrated ARM processors.” – What does this even mean? The Xilinx Zynq UltraScale+ MPSoC has Quad Core ARM Cortex-A53s along with Dual Core ARM Cortex-R5s.
Did you mean to write something about Intel maybe?
@WEATHERBEE, Yes, Zynq UltraScale+ has ARM processors. Virtex UltraScale+ does not. So Xilinx’s high-end FPGAs do not have integrated ARM processors.
Zynq tops out at around 900K LUT equivalents and 24 SerDes transceivers – so not a “high-end FPGA” comparable to Virtex UltraScale+ or Stratix 10 families (both of which are made up of significantly larger devices.)