Superlative Soup

Many of us who wound up in engineering-related careers were fascinated with technology as kids. Whether we were reading about cars, airplanes, computers, or hi-fi gear, we were intrigued by the latest and greatest of whatever genre we were investigating. What was the fastest car – the biggest airplane – the most powerful sound system? The exotic and superlative held a special fascination, often manifest as posters hanging on our walls or as the topic of playground discussion. “Nuh-uh, the Lambo has a top speed of almost 200MPH – way faster than that lame-o Ferrari on your poster!” It’s a funny thing to hear coming from a person who is about to jump on a five-speed bike with a banana seat and squeak his way down the sidewalk at 8MPH.

Today, we’re older, wiser, and more sophisticated. We don’t just read about cool technology, we create it – for a living. The allure of the superlative is still there, however, catching our eye from the corners. Marketing people know this. That’s why press releases often contain words like “world’s largest” or “industry’s fastest.”

Over the past year in the FPGA world, we’ve had to tear down our posters and replace them with new ones. The biggest, fastest, highest-bandwidth FPGAs are all new – and last year’s superstars have faded into relative obscurity. How big, fast, and bandwidth-laden are the new champions? How about a new Xilinx FPGA equal to about four of the largest Virtex-4 parts from a couple years back? Or check out the Altera device with a total of 48 multi-gigabit transceivers – half of them capable of operating at speeds up to 11.3 Gbps? And, let’s also take a look at the Achronix Speedster – with internal speeds up to a blazing 1.5 GHz. If FPGA designers still hung posters on their walls, chances are that one of these three bad boys would be up there – looking kinda boring by poster standards, actually, but nonetheless proud of their technological achievements.

When we got down to the current 40nm process node, our programmable logic devices got downright huge. For the design teams using FPGAs for prototyping of ASICs and complex systems, the game has always been about amassing the most logic they can cram on a board. When those folks get their hands on the new Xilinx Virtex-6 LX760 with 759K 4-input look-up table (LUT) equivalents, they should be delighted. The biggest Virtex-4 device was around 200K logic cells, and the biggest Virtex-5 LX device weighed in at 330K. If you keep your Moore’s Law calendar where every two years you break open a little cardboard process node square and find a waxy chocolate inside (those chocolates are probably really nasty right now if they were made in 1966), you’d see that we’re on track with LUT count through these generations.
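For anyone who wants to check the calendar math, here is a minimal sketch of the generation-over-generation arithmetic. It uses only the logic-cell figures quoted above and assumes (as the Moore's Law framing does) that each family counts as one process-node step; the function and variable names are ours, purely for illustration.

```python
# Logic-cell counts quoted in the article, in thousands.
largest_device_kcells = {
    "Virtex-4": 200,        # "around 200K logic cells"
    "Virtex-5 LX": 330,
    "Virtex-6 LX760": 759,
}

def growth_ratios(sizes):
    """Return the size ratio between each consecutive pair of generations."""
    names = list(sizes)
    return {
        f"{prev} -> {curr}": sizes[curr] / sizes[prev]
        for prev, curr in zip(names, names[1:])
    }

ratios = growth_ratios(largest_device_kcells)
overall = largest_device_kcells["Virtex-6 LX760"] / largest_device_kcells["Virtex-4"]

for step, ratio in ratios.items():
    print(f"{step}: {ratio:.2f}x")
# Two clean doublings over two generations would be exactly 4x.
print(f"Virtex-4 -> Virtex-6 overall: {overall:.2f}x")
```

The first step comes out a little under a doubling and the second a little over, which nets out to roughly 3.8x across the two generations: close to the "about four of the largest Virtex-4 parts" figure.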

Despite our strong inclination to believe that the size of the biggest device is determined by marketing using the formula LUT_COUNT==COMPETITORS_LARGEST * 1.3, we went in search of the answer. What determines the size of the biggest device? “There is a certain group of customers that always wants the biggest device we can give them,” says Brent Przybus, Marketing Director at Xilinx. “For ASIC prototyping, they want to get as much logic as possible on their board. Partitioning across multiple FPGAs adds complexity, so they want the largest devices possible. Those designers will be able to design to the 330 today and drop in the 760 later.”

The LX760 is a bit of a specialty item, however, because it is designed for maximum logic fabric without features like multi-gigabit transceivers. When it comes to FPGAs that are used more in production – like the also-huge Xilinx Virtex-6 SX475T – the size and mix of various features such as memory, high-speed serial transceivers, multiplier/DSP blocks, and user I/O pins are determined by surveying a wide variety of targeted applications. “We looked at high-performance computing, DSP and wireless applications, next-gen MIMO,” continued Przybus. “We built the device to handle current and future customer demands for those kinds of applications.” The SX475T may not be the LUT-count champion, but with 2016 DSP48 blocks it is the current multiplier champ. In addition to those multipliers, the SX475T comes with 36 multi-gigabit transceivers and over 38Mbits of built-in block RAM. It would get a nice frame-able poster of its own on the wall of many of today’s communications infrastructure designers.

When it comes to SerDes bandwidth, though, Altera is currently wearing the yellow jersey. Their recently-announced and soon-to-be-shipping Stratix IV EP4SGX530 packs a whopping 48 multi-gigabit transceivers (MGTs). Twenty-four of those are the new 11.3 Gbps variety, which puts the device squarely in the (data) path of high-bandwidth applications like 40G and 100G wireline. All that input and output bandwidth is matched with a heaping helping of logic fabric (531K LUTs’ worth), a bunch of on-chip memory, and a DDR3 interface at up to 533MHz to support buffering of data flowing through all those MGTs. The GX530 is no slouch on DSP resources, either, with 1024 18×18 multipliers.

The Altera family has several other features that make it attractive, including a unique architecture for fine-grained power optimization. During compilation, timing analysis data is used to determine logic paths with extra slack time, and those paths are run in “low power” mode by varying back-bias voltage. The result is lower static power consumption. “Our programmable power technology reduces static power by up to 70%, and it’s completely automatic,” says Bernhard Friebe, Senior Product Marketing Manager at Altera. “For many of our high-speed customers, their power budget is an overriding constraint.”

But what if you’re after pure, blazing speed above all else? In that department, Achronix brings us the supercar of the programmable logic world with their 1.5GHz “Speedster” FPGAs. These devices are aimed at the same basic target applications as their Xilinx and Altera counterparts – communications infrastructure, high-speed digital signal processing, and the like. Achronix’s unique architecture makes them king of fabric speed, however. The secret to Achronix’s blazing speed is their use of what they call a “picoPIPE.” Basically, they’re... well, we’re not really supposed to say this word, but we can spell it: a-s-y-n-c-h-r-o-n-o-u-s (shh, don’t tell them we did that). That’s right. You won’t realize it, but behind the scenes they quietly and automatically take your internal logic paths and fill them with lots of clicky little handshaking registers that let your data flow through the logic just as fast as it… can? The point is that there are no big global clock networks running around the chip and needing to be charged up 1.5 billion times a second with lots of expensive coulombs. The result is that your logic will run very, very fast – and you shouldn’t stop and think about it too much.

Speedster doesn’t have as many LUTs as the super-giant FPGAs above, but in many cases it won’t need them. If your internal logic runs faster, you can often get by with narrower datapaths throughout your whole design. If your datapaths are running faster, you also may need less memory for buffering. “Our target applications today are things like InfiniBand, memory testers, switching, routing – any place that they need the highest performance,” says Yousef Khalilollahi, Vice President of Marketing at Achronix. “FPGAs are a general-purpose product, but we’re focused specifically on high-performance.”
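The narrower-datapath argument is just throughput = width × clock rate. As a rough sketch, assuming one word moves per clock cycle with no protocol overhead, the minimum width for a given line rate works out as below. The 300 MHz comparison clock is our own assumed figure for a conventional fabric, not a number from any vendor, and the function name is hypothetical.

```python
import math

def required_width_bits(throughput_gbps, clock_mhz):
    """Minimum datapath width in bits to sustain a throughput at a given clock,
    assuming one word per clock cycle and no protocol overhead."""
    bits_per_second = throughput_gbps * 1e9
    cycles_per_second = clock_mhz * 1e6
    return math.ceil(bits_per_second / cycles_per_second)

# Hypothetical comparison for a 100 Gbps stream: an assumed 300 MHz fabric
# clock versus the 1.5 GHz peak figure quoted for Speedster.
for clock_mhz in (300, 1500):
    width = required_width_bits(100, clock_mhz)
    print(f"{clock_mhz} MHz -> at least {width} bits wide")
```

Under those assumptions the faster fabric needs a bus about one-fifth as wide, which is where the savings in LUTs and buffering would come from. Real designs rarely hit peak clock rates on every path, so treat this as an upper bound on the benefit.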

With 10Mb of available block RAM, forty 10.3 Gbps SerDes lanes, and 270 18×18 multipliers, the Speedster is hardly small, however. It definitely rates a cheesecake wall-poster alongside our other two behemoths.

Speaking of our wall poster series, FPGA Journal will happily publish our collector’s edition, glossy, perfect-for-framing-and-ogling, exclusive poster collection – just as soon as we can figure out how to make a poster of an FPGA look like something more interesting than a black square with a logo silkscreened on it. Suggestions?
