When Intel Buys Altera

Will FPGAs Take Over the Data Center?

At the Gigaom Structure 2014 event last week, Intel’s Diane Bryant announced that Intel is “integrating [Intel’s] industry-leading Xeon processor with a coherent FPGA in a single package, socket compatible to [the] standard Xeon E5 processor offerings.” Bryant went on to say that the FPGA will provide Intel customers “a programmable, high performance coherent acceleration capability to turbo-charge their algorithms” and that industry benchmarks indicate that FPGA-based accelerators can deliver >10x performance gains, with an Intel-claimed 2x additional performance, thanks to a low-latency coherent interface between the FPGA and the processor.

If we did our math right, Intel is implying that an FPGA could boost the speed of a server-based application by somewhere in the range of 20x.
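For anyone who wants to check that math, here is a minimal sketch. It assumes the two gains simply multiply, and it takes the quoted “>10x” at its floor of 10x, so the result is a rough lower bound rather than a measured number. The function name is ours, not Intel’s.

```python
def combined_speedup(accelerator_gain: float, interface_gain: float) -> float:
    """Compound two speedups, assuming they are independent and multiplicative."""
    return accelerator_gain * interface_gain


# 10x from the FPGA accelerator (the floor of the quoted ">10x"),
# times the Intel-claimed 2x from the coherent interface.
print(combined_speedup(10.0, 2.0))  # 20.0
```

Real workloads rarely compound this cleanly, which is why “somewhere in the range of” is doing honest work in that sentence.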

At almost the same time, Microsoft announced a system it calls “Catapult” (which apparently has no connection whatsoever to the very closely related algorithmic synthesis technology from Calypto, Inc. – which oddly bears exactly the same name). Microsoft’s Catapult, described in a paper titled “A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services,” achieved a reported 95% increase in Bing search engine performance, with only a 10% increase in power consumption. Yep, pairing FPGAs (and most definitely Altera FPGAs, in this case) with traditional processors nearly doubled the performance of a traditional heavy-iron server task and improved its power-efficiency by roughly 1.8x.
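The power-efficiency side of that claim is easy to work out from the two reported percentages. The sketch below assumes “performance” means throughput and that efficiency is simply throughput divided by power. The helper name is hypothetical.

```python
def perf_per_watt_gain(throughput_increase: float, power_increase: float) -> float:
    """Ratio of new to old performance-per-watt, given fractional increases."""
    return (1.0 + throughput_increase) / (1.0 + power_increase)


# Reported Catapult figures: 95% more throughput for 10% more power.
gain = perf_per_watt_gain(0.95, 0.10)
print(f"{gain:.2f}x")  # 1.77x
```

So throughput comes out at 1.95x and performance per watt at a bit under 1.8x.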

Well, who out there didn’t see this coming? Anyone? Anyone?

With data center power consumption estimated at somewhere between one and ten percent of the entire world’s electricity use – and growing fast – doing more computation with less energy is a problem with enormous economic and ecological stakes. Today, data centers are built based on access to cheap power, and the size and throughput of those data centers are typically limited by how much power can be brought into the building and how much heat can be taken out. Companies like Microsoft, Google, Facebook, and eBay clearly would be highly motivated to both crank up the MIPS and lower the electric bill.

At the same time, Moore’s Law, after a nearly fifty-year run, is most definitely running out of gas. First, we hit the power wall on single-core processors, reaching a point where clocking the chips faster ran up the power more than it improved performance. Then we went to two, four, and more cores, and, finally, we’re looking at wider instructions and data busses to compensate for the lack of continued progress in the underlying semiconductor processes. Simply waiting for better silicon to solve the data center’s power woes is not a viable option.

Just about everyone who reads these pages understands that FPGAs offer the potential for dramatically increased compute performance combined with much lower power consumption. For specialized algorithms, an FPGA-based hardware implementation offers the rewards of fine-grained parallelism – lower latency, higher throughput, and much lower power.

Of course, everyone who already understands the benefits of FPGA-based compute acceleration also knows the single biggest obstacle to widespread adoption: the programming model. Traditional von Neumann processors and their accompanying ecosystem have evolved to the point of being drop-dead easy to program. Slap down a few lines of C, C++, or any other popular language, crank up an open-source compiler, and your computer is computing in no time. 

With FPGAs, getting the gates to do your bidding is a substantially greater challenge – one that has even provided a lucrative livelihood for many of us. Converting a complex algorithm to an efficient custom hardware architecture, then describing that architecture in a hardware description language, then simulating, synthesizing, and placing-and-routing the resulting design can get you to the point … where you now have hundreds of annoying timing violations to sort out. An EE degree, a fair amount of experience, and a few months of free time are pretty much minimum requirements for making efficient use of FPGA fabric to accelerate a single high-performance algorithm, and most of the folks writing complex algorithms for cloud and data center applications don’t have a lot of extra time and energy to pick up that kind of expertise.

This programming problem has not escaped the attention of the folks who make FPGAs, of course. They’ve been toiling away for years trying to simplify the process of programming FPGAs. Today, the state of the art is best represented by three primary approaches: model-based design, high-level synthesis, and parallel programming languages like OpenCL. All of these approaches have merit for different types of problems and for different programmer skill sets. None of them has reached anything like the robustness required for the general programming public to be able to efficiently take advantage of FPGA co-processing.

So, why would Intel ever buy Altera?

(Note: We have no indication that Intel has any actual plans to buy Altera, so we’re speculating here.)

Intel has arguably the most advanced semiconductor processes in the world, and those processes have historically been applied to making high-performance processors for PCs and servers. These days, the PC market is waning, and Intel has failed to capture any meaningful share of the exploding mobile and tablet market, which is dominated by lower-power ARM architecture processors. That leaves Intel with the server market, which (thanks to the reliance of the mobile, cloud, and emerging IoT markets on giant server farms) is growing rapidly.

However, as we mentioned above, power is the primary limiting factor in the global data center build-out, and ARM is trying to get into the server game by taking advantage of its comparatively lower-power processor architectures. This poses a significant risk to Intel’s domination of the racks. With PCs in decline and the data center possibly up for grabs, Intel needs to do something.

What Intel needs is a game-changing answer to server power efficiency, and the best place to look for that is in FPGAs. 

Of course, Intel could make their own FPGAs or FPGA fabric – integrated in the same package or potentially even on the same die as their processors, but that doesn’t solve the problem. The key to success with FPGA technology is tools, not fabric. And, if Intel put every engineer in the company to the task of developing FPGA tools for the next decade, they would not be able to match what Altera and Xilinx have today. Robust FPGA tools require tens of thousands of user-generated designs to botch their way through the tool flow, and no amount of careful engineering development can replace that experience-based tool evolution.

Furthermore, what Altera and Xilinx have today is (as we mentioned earlier) not yet even remotely up to the task of smoothly compiling high-performance server-based algorithms into a form that will efficiently execute on hybrid processor/FPGA heterogeneous computing servers. They have the bare bones of a few marginally workable solutions. Of course, as this recent announcement shows, Intel could partner with Altera or Xilinx and hope that those companies give enough attention to the server space to pull it off, but with the perpetual lure of the lucrative comms space constantly distracting the FPGA companies from the server world’s problems, that crucial attention is most definitely not guaranteed. 

This announcement is certainly not Intel’s first warning shot about FPGAs, heterogeneous compute acceleration with FPGAs, or partnering with companies like Altera. A few years back, Intel launched another device family, the E6x5C, with an Atom processor and an Altera Arria FPGA sharing the same package, connected by PCIe. 

This new announcement bumps the processor component of that up to Xeon land, and it bumps the ever-so-critical FPGA-to-processor communication channel up from PCIe to low-latency, coherent QuickPath Interconnect (QPI) – reportedly capable of up to 25 Gbps communication at very low latency. As one can see from the Bing/Microsoft paper (or as many of us know from traumatic personal experience), the architecture for passing and sharing data between processor, FPGA, and memory is the single most important feature (and potential bottleneck) of any heterogeneous computing platform with FPGA fabric.
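To see why the data path matters so much, consider a deliberately crude model of offloading one task to an FPGA. Everything here is an assumption for illustration: the function and parameter names are hypothetical, the numbers are made up rather than taken from Intel or Microsoft, and the model assumes the transfer and the accelerated compute happen one after the other with no overlap.

```python
def offload_speedup(
    cpu_seconds: float,
    kernel_speedup: float,
    bytes_moved: float,
    link_bytes_per_second: float,
    link_latency_seconds: float = 0.0,
) -> float:
    """End-to-end speedup of offloading a task, including data transfer cost.

    Simplified model: transfer and accelerated compute are serialized.
    """
    transfer = link_latency_seconds + bytes_moved / link_bytes_per_second
    accelerated = cpu_seconds / kernel_speedup
    return cpu_seconds / (transfer + accelerated)


# Hypothetical task: 1 s on the CPU, 20x faster in FPGA fabric, 1 GB to move.
slow_link = offload_speedup(1.0, 20.0, 1e9, 1e9)   # 1 GB/s link
fast_link = offload_speedup(1.0, 20.0, 1e9, 1e10)  # 10 GB/s link
print(f"{slow_link:.2f}x vs {fast_link:.2f}x")  # 0.95x vs 6.67x
```

In this toy case the same 20x kernel ends up slower than the plain CPU on the slow link and only about 6.7x faster on the fast one, which is the bottleneck in miniature.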

Intel is experimenting and learning with other pieces of the FPGA puzzle as well, of course. After dipping their toes in the water by partnering with smaller FPGA suppliers Achronix and Tabula, fabricating devices for those companies on the 22nm Tri-Gate (FinFET) process, the company stepped up to a manufacturing partnership with Altera for the upcoming Stratix 10 FPGA family, based on Intel’s 14nm Tri-Gate process. This is a critical (and vastly under-appreciated) engineering task where both the FPGA fabric and the semiconductor process must be adapted and evolved to work together. You can’t just slap any old FPGA fabric on a cutting-edge semiconductor process and expect it to work, and you conversely can’t take just any semiconductor process and succeed – even with a proven FPGA fabric. Both parts have to meet and meld in the middle.

For the record, Intel isn’t saying which FPGA company they are partnering with for the new heterogeneous Xeon devices. Both Altera and Xilinx are tight-lipped as well, so we’re gonna put our well-considered bet on Altera. Either way, the curtain will be pulled back soon enough, because Intel says that the end customers will need to use the FPGA company’s tools and design flow in order to take advantage of the FPGA portion of the processor. So, that conversation would be something like this. 

Facebook: Hey Intel, we’d like to use your new heterogeneous Xeon/FPGA processors.

Intel: OK, you’ll need to get FPGA tools and support from the vendor.

Facebook: Which vendor is that?

Intel: We’re not saying…

OK, maybe not exactly, but – that’s one secret that won’t last long.

It’s important to note that Intel is not the first to plan mass production of heterogeneous processors with FPGAs. Xilinx has been attacking that market for a few years now with their Zynq family – which incorporates ARM processors with Xilinx FPGA fabric. Altera is aggressively giving chase with their own ARM-based FPGA SoC families. While Zynq is certainly not a data-center-class processor, the distance from today’s Zynq to one that would be a viable server-class solution isn’t huge, and the expertise and tool flow that Xilinx is accumulating with Zynq would come in very handy in a fight over low-power server dominance.

Even though Intel soft-pedaled the Xeon/FPGA announcement, the potential implications are enormous. If the tools can get good enough (and that’s a big IF), we are looking at the displacement of the von Neumann processor as the dominant computing architecture for the majority of the world’s data centers, and therefore the majority of the world’s computation. And, it could happen during the most rapid expansion of global computing power in history.

Sure, Intel could continue to defend its turf against insurgent ARM-based architectures in the single most important market in the world by simply partnering with a few smaller companies (like Altera) for the most critical enabling technologies. Intel could hope that those partners will spend enough time and energy solving the tool flow problem to make that discontinuous leap in the global computing architecture possible and practical.

Somehow that scenario doesn’t seem like the most likely to me.

16 thoughts on “When Intel Buys Altera”

  1. Dear Kevin,
    Why are you so sure that Altera would be the FPGA vendor of choice for the FPGA in the server chip? (Got any actual evidence?)

    Look at who the main FPGA vendors currently get to manufacture their FPGAs:
    1) Xilinx use TSMC only
    2) Altera use TSMC mostly, and Intel for some.
    3) Achronix use Intel only

    On that basis, Intel have a much cosier business relationship with Achronix, and if they are going to buy anyone, it would be Achronix.

    Achronix would be much cheaper to buy than Altera or Xilinx, and would be a far more obvious choice for Intel if they wanted to make a purchase to guarantee their ongoing supply of FPGA cores for their server chips.

    Nicholas Lee

  2. Hi Nicholas,

    A good question with some good points! And now, we have a vote for Achronix.

    My reasons (some of them, anyway) for guessing Altera vs Achronix are:
    – Achronix devices are optimized for a pretty specific set of connectivity applications – particularly with the choice of hard IP. Those are not the same optimizations one would make for data center compute acceleration
    – Altera has pursued the compute acceleration market aggressively with initiatives like their OpenCL support.
    – I think the main value to Intel and Intel customers is the tools and support from the FPGA company. It wouldn’t be that difficult for Intel to just make their own FPGA fabric, but tools and support are another matter. Altera is far more advanced and scalable than Achronix on that front, and brings a lot more tool technology to the table.

    My reasons for guessing Altera vs Xilinx are:
    – Intel went with Altera for their previous processor+FPGA product
    – Altera’s fab agreement with Intel apparently has some provision excluding Xilinx, so it would be odd (but not unfathomable) for Intel to make a different exclusive deal with Xilinx
    – Altera has been more visibly pursuing the server market, and may be developing devices more tuned to that particular application (we don’t know the specifics of Stratix 10 yet.)


  3. Very interesting article.

    Note also that both Xilinx and Altera offer platforms with embedded ARM cores, and based on the fact that many start-ups are working on ARM-based servers, maybe soon we will see multi-core FPGAs with ARM cores targeting the server market.

    Also the FPGAs could offer commonly used accelerators for the data centers (in order to avoid converting every data center algorithm to hardware), such as an accelerator for MapReduce:

  4. @kachris,

    Agreed! I touched on that a bit in the article – talking about Zynq and SoC FPGAs. I don’t think the “processor” part of those offerings (at this point) is really a data-center-class processor, but that would be a reasonable evolution of those products.

    For me, the important part about Xilinx and Altera having the processor and FPGA fabric integrated on the same die (or in the same 2.5D interposer setup), is the vast amount of bandwidth/connectivity available between the processor, FPGA fabric, and memory. It seems to me that would have the potential to offer much more compute power, and possibly much less power consumption since the FPGA-Processor signals don’t have to go through off-chip IO buffers.


  11. If I were the Intel product manager, we would be purchasing IP rights to the Polarfire product technology simply because of the low power … both at start-up and running flat out … in the end it comes down to cooling on how fast you can go.
