
Superlative Soup

The Three Biggest Baddest FPGAs

Many of us who wound up in engineering-related careers were fascinated with technology as kids. Whether we were reading about cars, airplanes, computers, or hi-fi gear, we were intrigued by the latest and greatest of whatever genre we were investigating. What was the fastest car – the biggest airplane – the most powerful sound system? The exotic and superlative held a special fascination, often manifested as posters hanging on our walls or as the topic of playground discussion. “Nuh-uh, the Lambo has a top speed of almost 200 MPH – way faster than that lame-o Ferrari on your poster!” It’s a funny thing to hear coming from a person who is about to jump on a five-speed bike with a banana seat and squeak his way down the sidewalk at 8 MPH.

Today, we’re older, wiser, and more sophisticated. We don’t just read about cool technology; we create it – for a living. The allure of the superlative is still there, however, catching our eye from the corners. Marketing people know this. That’s why press releases so often contain phrases like “world’s largest” or “industry’s fastest.”

Over the past year in the FPGA world, we’ve had to tear down our posters and replace them with new ones. The biggest, fastest, highest-bandwidth FPGAs are all new – and last year’s superstars have faded into relative obscurity. How big, fast, and bandwidth-laden are the new champions? How about a new Xilinx FPGA equal to about four of the largest Virtex-4 parts from a couple of years back? Or the Altera device with a total of 48 multi-gigabit transceivers – half of them capable of operating at speeds up to 11.3 Gbps? And let’s also take a look at the Achronix Speedster, with internal speeds up to a blazing 1.5 GHz. If FPGA designers still hung posters on their walls, chances are that one of these three bad boys would be up there – looking kinda boring by poster standards, actually, but nonetheless proud of its technological achievements.

When we got down to the current 40nm process node, our programmable logic devices got downright huge. For the design teams using FPGAs for prototyping of ASICs and complex systems, the game has always been about amassing the most logic they can cram onto a board. When those folks get their hands on the new Xilinx Virtex-6 LX760, with 759K 4-input look-up table (LUT) equivalents, they should be delighted. The biggest Virtex-4 device had around 200K logic cells, and the biggest Virtex-5 LX device weighed in at 330K. If you keep a Moore’s Law calendar, where every two years you break open a little cardboard process-node square and find a waxy chocolate inside (those chocolates are probably really nasty by now if they were made in 1966), you’d see that we’re on track with LUT count through these generations.
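In case your calendar is missing a few squares, here is a quick back-of-the-envelope check of that claim, using only the LUT-equivalent counts quoted above. A sketch, nothing more:

# Rough sanity check: if logic capacity roughly doubles per process generation,
# the published LUT-equivalent counts should sit near a 2x-per-generation curve.
generations = [
    ("Virtex-4 (90 nm)", 200_000),
    ("Virtex-5 LX (65 nm)", 330_000),
    ("Virtex-6 LX760 (40 nm)", 759_000),
]
for (name_a, luts_a), (name_b, luts_b) in zip(generations, generations[1:]):
    print(f"{name_a} -> {name_b}: {luts_b / luts_a:.2f}x growth")
# Roughly 1.7x and 2.3x per generation, or about 3.8x over two generations,
# which lands in the neighborhood of the 4x that strict doubling would predict.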

Despite our strong inclination to believe that the size of the biggest device is determined by marketing using the formula LUT_COUNT==COMPETITORS_LARGEST * 1.3, we went in search of the answer.  What determines the size of the biggest device?  “There is a certain group of customers that always wants the biggest device we can give them,” says Brent Przybus, Marketing Director at Xilinx.  “For ASIC prototyping, they want to get as much logic as possible on their board.  Partitioning across multiple FPGAs adds complexity, so they want the largest devices possible.  Those designers will be able to design to the 330 today and drop in the 760 later.”

The LX760 is a bit of a specialty item, however, because it is designed for maximum logic fabric without features like multi-gigabit transceivers. For FPGAs that are used more in production, like the also-huge Xilinx Virtex-6 SX475T, the size and mix of features such as memory, high-speed serial transceivers, multiplier/DSP blocks, and user I/O pins are determined by surveying a wide variety of targeted applications. “We looked at high-performance computing, DSP and wireless applications, next-gen MIMO,” continued Przybus. “We built the device to handle current and future customer demands for those kinds of applications.” The SX475T may not be the LUT-count champion, but with 2016 DSP48 blocks it is the current multiplier champ. In addition to those multipliers, the SX475T comes with 36 multi-gigabit transceivers and over 38 Mbits of built-in block RAM. It would get a nice frame-able poster of its own on the walls of many of today’s communications infrastructure designers.
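To put that multiplier count in perspective, here is a rough estimate of peak multiply-accumulate throughput. The 600 MHz DSP clock below is our own assumption for illustration; the real ceiling depends on speed grade and on how a design actually closes timing.

# Rough peak MAC throughput for the SX475T's DSP fabric.
# dsp_blocks comes from the article; the clock rate is an assumed figure.
dsp_blocks = 2016
assumed_dsp_clock_hz = 600e6          # illustrative only; check the datasheet
peak_macs_per_second = dsp_blocks * assumed_dsp_clock_hz
print(f"~{peak_macs_per_second / 1e12:.1f} tera-MACs per second")   # ~1.2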

When it comes to SerDes bandwidth, though, Altera is currently wearing the yellow jersey. Their recently announced and soon-to-be-shipping Stratix IV EP4SGX530 packs a whopping 48 multi-gigabit transceivers (MGTs). Twenty-four of those are the new 11.3 Gbps variety, which puts the device squarely in the (data) path of high-bandwidth applications like 40G and 100G wireline. All that input and output bandwidth is matched with a heaping helping of logic fabric (531K LUTs’ worth), a bunch of on-chip memory, and a DDR3 interface at up to 533 MHz to support buffering of the data flowing through all those MGTs. The GX530 is no slouch on DSP resources, either, with 1024 18×18 multipliers.
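For a sense of scale, here is a ballpark tally of raw serial bandwidth in each direction. The 11.3 Gbps rate for 24 of the lanes comes from the numbers above; the 8.5 Gbps rate we assume for the other 24 is purely for illustration, so check Altera’s datasheet for the real figure.

# Ballpark aggregate transceiver bandwidth, per direction.
fast_lanes, fast_rate_gbps = 24, 11.3          # from the article
other_lanes, assumed_rate_gbps = 24, 8.5       # assumed for illustration
total_gbps = fast_lanes * fast_rate_gbps + other_lanes * assumed_rate_gbps
print(f"~{total_gbps:.0f} Gbps of raw serial bandwidth each way")   # ~475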

The Altera family has several other features that make it attractive, including a unique architecture for fine-grained power optimization. During compilation, timing analysis data is used to determine which logic paths have extra slack, and those paths are run in “low power” mode by varying the back-bias voltage. The result is lower static power consumption. “Our programmable power technology reduces static power by up to 70%, and it’s completely automatic,” says Bernhard Friebe, Senior Product Marketing Manager at Altera. “For many of our high-speed customers, their power budget is an overriding constraint.”
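Conceptually, the selection works something like the toy sketch below. To be clear, this is not Altera’s actual algorithm or tool flow, just the gist of slack-driven mode assignment; the path names, slack values, and margin are all invented for illustration.

# Toy illustration of slack-driven power tuning: paths with comfortable timing
# slack become candidates for the slower, lower-leakage "low power" setting.
path_slack_ns = {"dsp_datapath": 0.05, "ctrl_fsm": 2.10, "dma_addr_gen": 1.35}
MARGIN_NS = 0.5   # invented threshold: keep anything tighter than this fast

for path, slack in path_slack_ns.items():
    mode = "low-power" if slack > MARGIN_NS else "high-speed"
    print(f"{path}: {slack:.2f} ns of slack -> {mode} setting")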

But what if you’re after pure, blazing speed above all else? In that department, Achronix brings us the supercar of the programmable logic world with their 1.5 GHz “Speedster” FPGAs. These devices are aimed at the same basic target applications as their Xilinx and Altera counterparts – communications infrastructure, high-speed digital signal processing, and the like. Achronix’s unique architecture, however, makes them the kings of fabric speed. The secret to Achronix’s blazing speed is their use of what they call a “picoPIPE.” Basically, they’re – well, we’re not really supposed to say this word, but we can spell it: a-s-y-n-c-h-r-o-n-o-u-s (shh, don’t tell them we did that). That’s right. You won’t realize it, but behind the scenes they quietly and automatically take your internal logic paths and fill them with lots of clicky little handshaking registers that let your data flow through the logic just as fast as it… can? The point is that there are no big global clock networks running around the chip and needing to be charged up every 1.5-billionth of a second with lots of expensive coulombs. The result is that your logic will run very, very fast – and you shouldn’t stop and think about it too much.
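If you do want to stop and think about it for a second, the toy sketch below captures the flavor of handshake-driven data movement: each stage passes its result forward whenever the stage ahead of it is empty, with no global clock coordinating anything. This is a software analogy only, not how Achronix’s picoPIPE hardware is actually built, and every name in it is invented.

# Toy model of self-timed pipeline stages: an item advances whenever the stage
# ahead of it is empty. Purely illustrative; Stage/advance are invented names.
class Stage:
    def __init__(self, op):
        self.op = op        # the combinational work this stage performs
        self.data = None    # None means "empty," i.e. ready to accept data

def advance(stages, new_item=None):
    """One settling step: drain the tail, shuffle items forward, accept input."""
    out = None
    if stages[-1].data is not None:           # last stage hands off its result
        out, stages[-1].data = stages[-1].data, None
    for i in range(len(stages) - 2, -1, -1):  # walk backward so items move once
        if stages[i].data is not None and stages[i + 1].data is None:
            stages[i + 1].data = stages[i + 1].op(stages[i].data)
            stages[i].data = None
    if new_item is not None and stages[0].data is None:
        stages[0].data = stages[0].op(new_item)
    return out

pipeline = [Stage(lambda x: x + 1), Stage(lambda x: x * 2), Stage(lambda x: x - 3)]
feed, results = iter(range(5)), []
for _ in range(10):
    result = advance(pipeline, next(feed, None))
    if result is not None:
        results.append(result)
print(results)   # [-1, 1, 3, 5, 7]: each input passed through all three stages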

Speedster doesn’t have as many LUTs as the super-giant FPGAs above, but in many cases it won’t need them. If your internal logic runs faster, you can often get by with narrower datapaths throughout your whole design. If your datapaths are running faster, you may also need less memory for buffering. “Our target applications today are things like InfiniBand, memory testers, switching, routing – any place that they need the highest performance,” says Yousef Khalilollahi, Vice President of Marketing at Achronix. “FPGAs are a general-purpose product, but we’re focused specifically on high performance.”
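To put rough numbers on that narrower-datapath argument: throughput is approximately bus width times clock rate, so a faster fabric can move the same data through a much skinnier bus. The widths and clock rates below are our own illustrative picks, not anyone’s published benchmark.

# Throughput ~= bits per beat x beats per second.
def throughput_gbps(width_bits, clock_mhz):
    return width_bits * clock_mhz * 1e6 / 1e9

print(throughput_gbps(256, 375))    # 96.0 Gbps through a 256-bit bus at 375 MHz
print(throughput_gbps(64, 1500))    # 96.0 Gbps through a 64-bit bus at 1.5 GHz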

With 10 Mb of available block RAM, 40 SerDes lanes at 10.3 Gbps, and 270 18×18 multipliers, the Speedster is hardly small, however. It definitely rates a cheesecake wall poster alongside our other two behemoths.

Speaking of our wall poster series, FPGA Journal will happily publish our collector’s edition, glossy, perfect-for-framing-and-ogling, exclusive poster collection – just as soon as we can figure out how to make a poster of an FPGA look like something more interesting than a black square with a logo silkscreened on it.  Suggestions?
