
Saving Supercomputing with FPGAs

What We'll Do When We Hit the Wall

Massive racks of parallel-processing Pentiums, Opterons, and Itaniums wasted watts at an unprecedented pace last week on the show floor at Supercomputing 2005 in Seattle. Teraflops, terabytes, and terrifying network bandwidths bombarded booth attendees looking for the last word in maximizing computational throughput. Convention center air conditioning worked overtime purging the byproducts of billions of bit manipulations per second as breaker boxes burst at the seams, straining to deliver adequate amperage to simultaneously power and cool what was probably the world’s largest temporary installation of high-performance computing equipment.

Meanwhile, the term “FPGA” was muttered in muted whispers in the aisles, hallways and conference rooms. It’s hard to believe that a cutting-edge, progressive, elitist field like supercomputing could itself have a lunatic fringe, but FPGA-based supercomputing seems to fall precisely into that precarious role – the fringe on the fringe. Picture shadowy, cloaked figures lurking in the lobby, pocket protectors securely in place under their overcoats, whispering to passers-by, “pssst – wanna see Smith-Waterman running at 50X speed on a $99 board?”

Disruptive technologies have an almost violent adoption cycle, and FPGA-based supercomputing has been no exception. First, overzealous marketers flood the air with exaggerated, overblown, and oversimplified claims of the amazing potential benefits of some new technology. Next, wide-eyed innovators and early adopters flock to find out what all the fuss is about, plunking down the big bucks for wobbly-legged, do-it-yourself, seminal implementations of not-ready-for-primetime, nascent innovation. Then, a double-whammy of reality suddenly sets in. Just as the neophytes are disappointedly discovering the dark underbelly of the new method with their first real-world experiences, the established companies in the industry-under-attack simultaneously see a threat to the status quo and launch a flurry of fear, uncertainty, and doubt aimed at undermining the credibility of the interloper. The combination of these two tidal waves of dissenting opinion generally uproots all but the hardiest of technical revolutions.

The potential of FPGAs as reconfigurable computing engines has been recognized for a while now. Enthusiastic academics jumped aboard the FPGA bandwagon in the 1990s, seeing the potential for enormous acceleration from fine-grained parallelism, along with a reduction in the transistor count and power consumption required for each incremental increase in compute performance. The reasoning was (and still is) that we could use FPGAs to build custom dataflow processors on a per-algorithm basis that would do away with the instruction fetch and decode overhead and the serial nature of the Von Neumann architecture. With that extra efficiency, we should be able to increase our computational throughput by a couple of orders of magnitude without the usual corresponding increase in power consumption (and heat dissipation).

Unfortunately, the mythology of FPGA-based reconfigurable computing outpaced technical development, and an impatient supercomputing community became disenchanted with reconfigurable computing before it ever became technically and commercially viable. Researchers began to work more quietly and modestly, fearful of the “cold-fusion effect”, and the commercial market went into semi-perpetual wait-and-see mode.

Recently, however, advances in FPGA speed and density, combined with significant progress in programming environments for reconfigurable computing, have conspired to bring FPGA-based supercomputing back onto the map. This year, at the Supercomputing 2005 conference, FPGA pundits came out of the closet, moving and speaking freely about the progress in programmable logic processing architectures and software development processes for reconfigurable computers.

The problem in reconfigurable computing is, to use a currently overused marketing-pop term, the ecosystem. The Von Neumann architecture has a rich support system in place that has evolved over decades, due to the efforts of literally millions of individuals. That system starts at the transistor level with advanced semiconductor processes and moves up through the MSI and LSI logic levels to VLSI, with processor architectures created by companies like AMD and Intel. On top of these hardware architectures we see software layers like BIOSs, OSs, compilers, software libraries, IDEs, and finally applications.

For reconfigurable computing, however, all the middle levels are missing. FPGAs certainly take advantage of the latest semiconductor processes and offer usable logic structures up to about the MSI level (complex gates, multipliers, etc.), but everything between there and the applications layer has yet to be invented. There is not even the equivalent of a BIOS for an FPGA-based reconfigurable computer that would isolate the OS (if we had one) from the particulars of each specific hardware configuration. Such a layer would be useful for supporting development of any future OS-like layer that might run on top of an FPGA-based reconfigurable processor.

Of course, many of the technologies developed for Von Neumann architectures can be adapted or simply directly re-used in the FPGA supercomputer ecosystem. For example, there is no need to re-invent language parsers and elaborators specifically for FPGA-based systems. Likewise, some of the same low-level hardware/middleware structures that support Von Neumann-based systems would carry directly over into the FPGA-based processing space.

There are also many tasks in a typical computing job that do not make sense to accelerate into FPGA fabric. Conventional processors are highly efficient at complex control logic and other inherently sequential tasks. In many current high-performance computing systems, FPGAs are combined with conventional processors as algorithm accelerators for compute-intensive processes. While these systems seek to take advantage of the highly parallel structures that can be created in the logic fabric of FPGAs, they typically find their performance bottlenecks in the connections between the FPGA and the primary processor. Companies like Cray, SRC, and SGI were all on hand at Supercomputing 2005 with machines that boasted various strategies to improve that communication by increasing the bandwidth between processor and FPGA.

Another solution to this processor/fabric bandwidth problem may lie within the FPGAs themselves. Embedded processors now available inside the FPGA (such as Xilinx’s MicroBlaze and PowerPC, Altera’s Nios II, and Actel’s newly announced ARM7 core) all have very-high-bandwidth connections to the rest of the FPGA fabric. The architectural tradeoff is that these processors fall far short on raw performance. Typically they are 32-bit RISC machines running at less than 100 MHz, with significantly less processing punch than the typical 64-bit, 2 GHz+ supercomputer soldier. Of course, if the processor is relegated to mundane control duty while the FPGA fabric pushes the FLOPs, such an architectural tradeoff might be worth exploring.
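The bandwidth bottleneck and the embedded-processor tradeoff described above can be made concrete with a back-of-the-envelope, Amdahl's-law-style estimate. The Python sketch below is a toy model, not a description of any vendor's machine: the `HybridSystem` class, the `speedup` function, and every number in it (a 100-second job, a 50x fabric speedup, 40 GB of traffic, a 1 GB/s external link, a 100 GB/s on-chip link, and an embedded core assumed to be 20x slower than a 2 GHz host) are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class HybridSystem:
    """Toy model of a processor + FPGA system (all figures hypothetical)."""
    name: str
    host_slowdown: float     # 1.0 = full-speed host CPU; 20.0 = host 20x slower
    link_bytes_per_s: float  # sustained processor <-> fabric bandwidth


def speedup(system, cpu_time_s, accel_fraction, fabric_speedup, bytes_moved):
    """Overall speedup versus running the whole job on a full-speed CPU.

    cpu_time_s:     runtime of the job on the full-speed host alone
    accel_fraction: share of that runtime offloaded to FPGA fabric (0..1)
    fabric_speedup: how much faster the fabric runs the offloaded part
    bytes_moved:    total data shipped between processor and fabric
    """
    # Control code and other sequential work stays on the processor.
    serial_s = cpu_time_s * (1.0 - accel_fraction) * system.host_slowdown
    # The compute-intensive kernel runs in the fabric.
    fabric_s = cpu_time_s * accel_fraction / fabric_speedup
    # Moving operands and results across the processor/fabric link.
    transfer_s = bytes_moved / system.link_bytes_per_s
    return cpu_time_s / (serial_s + fabric_s + transfer_s)


# Hypothetical configurations; the numbers are illustrative only.
discrete = HybridSystem("external FPGA on a 1 GB/s link", 1.0, 1e9)
embedded = HybridSystem("embedded core, 100 GB/s on-chip", 20.0, 100e9)

for fraction in (0.95, 0.99):
    for system in (discrete, embedded):
        s = speedup(system, cpu_time_s=100.0, accel_fraction=fraction,
                    fabric_speedup=50.0, bytes_moved=40e9)
        print(f"{fraction:.0%} offloaded, {system.name}: {s:.2f}x")
```

Under these assumptions, a 50x fabric yields only about 2x overall on the narrow external link, because the 40 seconds of data transfer dominate. With 95% of the work offloaded, the slow embedded core makes matters worse (under 1x), but at 99% offloaded, with the processor doing little more than control duty, the embedded arrangement pulls ahead (roughly 4.5x versus 2.3x).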

The real technical hurdle in FPGA-based supercomputing, however, is the programming environment (or lack of one). Fortunately, we can find a clue to a temporary solution to this dilemma in computing’s past. In days of yore, a typical computer lab with access to a powerful IBM 360-class machine (or better yet, a CDC dual-cyber!) solved a similar problem with small armies of programmers available to support the researchers. The physicists with the difficult number-crunching problems didn’t usually work directly with the giant computers. Instead, they handed off notepads full of formulae to Fortran-knowing whiz kids armed with huge decks of punch cards, who would flick a few front-panel toggle switches and batch up the bit-bashing beasts with long queues of CPU-eating science projects.

Over time, the physicists gained some programming savvy themselves, and the whiz kids left the campus to learn C and start internet companies. The expectation and the norm became direct interaction between researcher and computer as most scientists gained respectable programming skills and programming environments became more scientist-friendly. Even as subsequent generations of supercomputers grew into complex, parallel-processing systems, their users, compilers, and debuggers grew in sophistication along with them.

With FPGA-based computing, however, we are truly back in the days of the researcher/programmer partnership. Creating an FPGA-based accelerator for today’s challenging algorithms requires a great deal of hardware design expertise, particularly if the project is to be done using conventional HDL-based design techniques. Many researchers rely on experienced, FPGA-savvy consulting firms to take their problem into programmable logic, in much the same fashion as the consultative Fortran programmers of old. Today, there is certainly ample opportunity for capable FPGA designers to sell their services to science, implementing FPGA-based versions of some of computing’s most time-consuming problems.

A number of companies, however, are working to break the programming bottleneck, each with its own claims of ease of use, performance, and applicability. Star Bridge Systems puts forth Viva, an inherently parallel programming environment in which graphical constructs connect and assemble your algorithm with polymorphic pipes. This allows detailed decisions about data formats to be postponed until the last minute, and complex algorithms to be reused independent of the underlying FPGA-based processing hardware. Industry veteran Celoxica provides a programming environment based on Handel-C, a version of the C language with hardware-specific constructs that lets C programmers code and compile algorithms for FPGA-based hardware with minimal EE expertise. Newcomer Mitrionics offers Mitrion-C, a parallel language with C-like syntax designed to give software developers a dialect for accelerating code into FPGA hardware without requiring an understanding of how the underlying hardware works.

Attempting to unite the efforts of many of these groups is the OpenFPGA consortium. Inspired by an industry-wide need for standardization, OpenFPGA offers a forum for discussion of the difficult issues around portability of IP for FPGA-based computing, standardization of interfaces and programming approaches, and general topics of concern in advancing the state of the art in FPGA-based acceleration. The members of OpenFPGA seem to recognize that cooperation is the fastest way to workable technical solutions that will help the entire industry.

With Moore’s law beginning to run out of steam economically, and our ability to dissipate the heat of piles of parallel processors diminishing with each incremental increase in computing power, all this is happening none too soon. We will soon hit a “wall of watts”, where we can no longer add capability using our current approach because we simply won’t be able to dispose of the heat. Meanwhile, the continued improvement of common computing hardware has squeezed supercomputers and the industry that supplies them into a narrow niche between Dell desktops and thermal runaway. FPGAs may offer just the solution needed to keep alive the supercomputing industry, which solves some of society’s most pressing and important problems while constantly struggling with the enigmatic economics of its own existence.
