
Intel FPGA Hits its Stride

58 Gbps SerDes and HBM2

Any time a company goes through a major acquisition, there is a period of slowdown and uncertainty. Organizational and cultural norms are stirred into a boiling cauldron of corporate chaos, org charts are pruned and rebuilt, goals are reset, and inevitably projects and products are delayed. In the worst cases, entire product lines and customer bases can be lost. In the best cases, things are a bit shaky and slow for a while before the newly integrated organization hits its stride.

The Intel acquisition of Altera seems to be hitting its stride.

Against the backdrop of the enormous distraction of the $16B merger, the organization formerly known as Altera understandably slowed a bit in its execution. In the fast-paced rivalry between Altera and Xilinx, the latter even adopted “A Generation Ahead” as a marketing slogan – referring mostly to the delay Altera and Intel had experienced in launching their first FinFET FPGAs on the 14nm process node. And, while there have always appeared to be some solid long-term synergies between Altera and Intel technology, the primary visible effect of the merger early on was predictable delays in getting new stuff out the door.

Now, however, Intel has struck back with some critical firsts that pull the rug out from under that “Generation Ahead” rhetoric – at exactly the time that Xilinx is going through serious organizational chaos and political infighting of their own. In the past few weeks, Intel has announced first shipments of two high-value technologies – FPGAs with HBM2 high-bandwidth memory in the package, and FPGAs with the first 58 Gbps PAM4 SerDes transceivers. Both of these are important advances, and both reflect multiple advantages Altera gains from being part of Intel.

In both cases, the former Altera team (now Intel’s programmable solutions group, or PSG) utilizes Intel’s embedded multi-die interconnect bridge (EMIB) technology to combine chiplets fabricated with different technologies – and even at different fabs – into a single package. In the case of HBM, EMIB is used to connect memory stacks from SK Hynix and Samsung to Intel’s Stratix 10 FPGAs. In the case of the more recently announced 58 Gbps PAM4 SerDes, EMIB is used to connect transceivers fabricated by TSMC to Intel’s Stratix 10 FPGAs fabricated on an Intel 14nm FinFET process. In some cases, EMIB is used to combine all three.

We have always believed that this type of heterogeneous multi-die packaging represented the most significant potential advantage of 2.5D packaging technology (versus homogeneous 2.5D as has been used in some FPGAs, essentially for yield enhancement). And, Intel’s EMIB appears to be a leaner, more efficient, and more flexible method of interconnecting multiple dice than a silicon interposer. In the case of HBM and 58G SerDes, the EMIB can be customized for the particular interconnect and signal integrity requirements of each application, rather than being pushed into a one-interposer-fits-all constraint. Intel seems to have used this to their advantage in Stratix 10.

Intel’s Stratix 10 line is made up of four families of FPGAs, each targeting different application domains with optimized mixtures of FPGA fabric, memory, transceivers, and other hard IP resources. The GX family comprises “general-purpose” FPGAs with a fairly generic mix of features. The SX family brings in an ARM-based processing subsystem, making it the only SoC/FPGA built on a high-end FPGA platform – with a whopping 5.5M LUT4-equivalent FPGA fabric connected to a quad-core 64-bit ARM Cortex-A53 MPCore processor. For comparison, Xilinx’s flagship Zynq UltraScale+ family (their largest SoC/FPGAs) tops out at about 1.1M LUT4s.

The MX family is equipped with in-package HBM, and the TX family boasts up to five “E-tiles,” each of which contains 24 SerDes transceivers, half of them operating at up to 58 Gbps with PAM4. For those who want both super-high-bandwidth transceivers AND HBM (and who wouldn’t, really?), there will be devices that incorporate both. This is where the 2.5D packaging technology really shines, as we’ll have devices with three dramatically different chiplets, each fabricated on a different silicon technology by a different fab, linked by super-fast, low-power connections, all in one FPGA.

The big news this week is the first shipments of TX devices with blazing-fast 58 Gbps PAM4 transceivers, delivered by up to five so-called “E-tiles.” Each E-tile includes 24 transceivers and supports up to four 100-Gb Ethernet MAC blocks or, alternatively, six 10/25-Gb Ethernet MACs. Within each E-tile, alternate transceivers can operate at 58 Gbps, with the rest running at up to 30 Gbps – giving up to 60 channels at 58 Gbps on the largest devices (with five E-tiles). There is also one H-tile, which brings 24 more transceivers along with PCIe Gen3 x16. With the six total transceiver tiles, the FPGA has a total of 144 transceivers – all of which can operate at up to 30 Gbps. Intel says these transceivers, fabricated on TSMC 16nm silicon, are the lowest-power transceivers they’ve produced to date.
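The channel arithmetic above checks out with a quick back-of-the-envelope sketch – these few lines use only the tile and transceiver counts quoted in this article:

```python
# Back-of-the-envelope check of the Stratix 10 TX transceiver counts
# quoted above: five E-tiles plus one H-tile, 24 transceivers each.

E_TILES, H_TILES = 5, 1
XCVRS_PER_TILE = 24

# Every transceiver on the device can run at up to 30 Gbps.
total_transceivers = (E_TILES + H_TILES) * XCVRS_PER_TILE

# Within each E-tile, alternate transceivers (half of 24) can hit 58 Gbps.
pam4_channels = E_TILES * (XCVRS_PER_TILE // 2)

print(total_transceivers)  # 144
print(pam4_channels)       # 60
```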

Compared with traditional NRZ, which distinguishes two logic levels, PAM4 distinguishes four – encoding two bits per symbol and delivering twice the bandwidth without increasing the frequency. The eye diagram changes accordingly: the traditional single “eye” becomes a totem-pole stack of three. Because the higher data rate is delivered without a higher signal frequency, PAM4 can generally achieve it on the same interconnect used for NRZ without signal integrity problems. Intel’s PAM4 supports both short- and long-reach interconnect.
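The two-bits-per-symbol idea can be sketched in a few lines of Python. This is an illustration of the encoding principle only – the symbol rate shown is a rough figure implied by a 58 Gbps lane, not Intel’s actual line coding:

```python
# Sketch: why PAM4 doubles throughput at the same symbol rate.
# The symbol rate is illustrative, not Intel's actual specification.

SYMBOL_RATE_GBAUD = 29.0  # roughly the symbol rate behind a ~58 Gbps lane

# NRZ: two levels -> 1 bit/symbol; PAM4: four levels -> 2 bits/symbol
nrz_gbps = SYMBOL_RATE_GBAUD * 1
pam4_gbps = SYMBOL_RATE_GBAUD * 2

def pam4_encode(bits):
    """Pack a bit string into PAM4 symbols (0-3), two bits per symbol."""
    return [int(bits[i:i + 2], 2) for i in range(0, len(bits), 2)]

print(nrz_gbps, pam4_gbps)      # 29.0 58.0 -- same baud, double the bits
print(pam4_encode("11100100"))  # [3, 2, 1, 0]
```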

Intel is targeting three key markets with the TX family: wireline optical transport networks (OTN), network function virtualization (NFV), and 5G infrastructure. The new transceivers, with double the data rate of previous generations, allow designs to scale to 100G, 200G, and even 400G delivery speeds. And, because the transceivers support dual-mode modulation (both 58G PAM4 and 30G NRZ), designs can be made upward compatible – ready to support 58G while remaining backward compatible with existing 30G infrastructure. The TX devices also include hardened 100G Ethernet MAC and FEC, providing greater power efficiency and lower latency for 100GE.

The MX family integrates up to two HBM2 tiles and FPGA fabric in a single package, with the goal of addressing the memory bandwidth challenge as we hit the limits of DDR. Each HBM2 tile provides up to 256 GBps of bandwidth, so MX devices can provide a massive 512 GBps of aggregate memory bandwidth in a single package. This is key for applications such as machine learning, data analytics, image recognition, workload acceleration, 8K video processing, and HPC.
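The bandwidth claim is simple multiplication. A minimal sketch, assuming two HBM2 stacks as implied by the 512 GBps aggregate figure quoted here:

```python
# Aggregate HBM2 bandwidth on a Stratix 10 MX device: per-stack
# bandwidth times stack count, per the figures quoted above.

GBPS_PER_HBM2_STACK = 256  # peak bandwidth of one HBM2 stack, per the article
hbm2_stacks = 2            # largest MX configuration implied by the 512 GBps total

aggregate_gbps = GBPS_PER_HBM2_STACK * hbm2_stacks
print(aggregate_gbps)  # 512
```

For comparison, that 512 GBps is more than an order of magnitude beyond what a single conventional DDR4 channel can deliver – which is exactly the wall the article says HBM2 is meant to break through.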

For those wanting to have their cake and eat it too, with both HBM and 58G in the same FPGA, the MX family will have versions with up to 4 transceiver tiles and HBM connected at the top and bottom. This will pack a wallop in terms of overall IO bandwidth while offering copious amounts of memory bandwidth for buffering and other high-demand tasks. This combination of HBM and 58G should prove a potent combination in many system applications.

With the recent announcement, Intel is now shipping all four variants of the Stratix 10 family – GX, SX, MX, and TX – closing, at least for now, a chapter in which the company was notably running behind its rivals in new technology introductions. At the same time, information continues to leak out about the next “Falcon Mesa” 10nm FPGA families, which should redefine once again what can be done with programmable logic. These new announcements are clear indicators that Intel is emerging from the fog of acquisition and is executing well on crucial engineering projects. It will be exciting to watch over the coming months as the battle heats up with Xilinx and both companies try to claw market share from both existing and emerging markets.

