
Supercomputing

Today, Tomorrow, Whenever

When we hear the term “supercomputing,” each of us probably forms a different image in our head – depending on our age. For my generation, it’s the semi-cylindrical form of the Cray-2, with its radical architecture and Fluorinert cooling. It looked fast just sitting there. Others may envision anything from IBM mainframes to racks of blades to modern video gaming consoles.

In practical terms, supercomputing seems to mean computers with processing power equivalent to the smartphones of five years from now, at a cost premium of hundreds to thousands of times the price of current mainstream computers. While that kind of multiplier may seem absurd, so are the lengths to which supercomputer users will go at any given time in order to gain a couple of notches on the current state of Moore’s Law.

This week, in Seattle, we have the supercomputing party of the year – SC11. As is customary leading up to a major show, we’ve had the usual salvo of press releases – for everything from new blade/server configurations to cooling systems to memory and storage components. When thinking about the supercomputing problem, however, it pays to back off a step or two from the exotic and esoteric hardware on display, and to think about the problem of supercomputing at a higher level.

In recent years, we have passed a fundamental milestone in the evolution of computing architectures. There is no longer much gain to be had by cranking up the operating frequency of a single processor core. There is likewise little efficiency available from continuing to widen the word length of our basic architecture. Each of those “improvements” that has been available to us in the past will give us diminishing marginal returns at the current state of technology. This has led us to move from “how fast” to “how many” as our quest for ever more processing power has continued. Instead of trying to get more performance out of each processor, we try to see how many processors we can stuff into a box.

This, of course, brings its own set of challenges. First, we have to change the way we develop software – to be able to take advantage of all those processing elements operating in parallel. Second, we become limited fundamentally by the amount of power we have available. These two challenges define the software and the hardware side of supercomputing for the near future. Taking advantage of a multiplicity (or even a plethora) of processing elements is mainly a software problem. We need new languages, new compilers, new operating systems, and better ways of expressing algorithms in order to take advantage of massive parallelism.

On the hardware side, we need processing elements that pack a lot of computing power into a small space and (probably more important now) do the computation with the least amount of power possible. It is in these two areas that FPGAs have attracted a lot of attention in recent years. We have long documented the ability of FPGAs to accelerate many types of algorithms by significant factors – with huge savings in power. FPGAs have walloped DSP processors, conventional processors, and even graphics processors – both in terms of raw processing throughput on a single device, and particularly when considering the amount of power consumed.

Why, then, haven’t FPGAs taken over the supercomputing scene entirely?

Oh, how they have tried. Reconfigurable computing engines based on FPGAs have been designed, touted, tried, re-designed, re-touted, and ultimately – not found much acceptance. Formidable academic consortia have risen and fallen with the sole purpose of exploiting the extraordinary capabilities of FPGAs for supercomputing applications. The reason is rooted back in that software part of the problem. The only super-great way to do supercomputing on an FPGA has been to hand-code the algorithm in a hardware description language (HDL) such as VHDL or Verilog.

Well, then, why hasn’t somebody come up with a better solution for the software issue? I’m pretty sure more brainpower has been applied to that problem than to locating Bigfoot, Elvis, and UFOs combined. Well, by some definition of “brainpower” anyway. There have been more start-ups than you can count hawking high-level synthesis algorithms, new parallel programming languages that target FPGA hardware, graphical input languages, model-based design methodologies — the list goes on and on. The tool-making world understands that most people who are world experts in the subtleties of DNA-matching algorithms did not reach that pinnacle of their field while also mastering VHDL “on the side.”

However, two briefings I had in the run-up to Supercomputing illustrate that major progress is in the works, and that we may be closer than we think to answering the ultimate question of FPGAs, supercomputing, and everything. First, Altera is announcing plans to support OpenCL – a language developed with GPU-based supercomputing in mind. For the past few years, led most visibly by Nvidia, GPUs have been gaining a lot of popularity for high-performance computing applications. Nvidia rolled out CUDA (Compute Unified Device Architecture) several years ago – and it has gained significant acceptance in many traditional supercomputing strongholds like financial computing, pharmaceuticals, and oil and gas exploration. Alongside that effort came OpenCL (Open Computing Language), which provides a facility for developers to write C-like code that explicitly specifies parallelism. OpenCL actually originated at Apple and is now the charge of The Khronos Group – a non-profit industry consortium aimed specifically at compute-acceleration standards. Even though OpenCL was authored with GPUs in mind, it is equally applicable to parallel processing with modern FPGAs.
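
To give a feel for what that explicit parallelism looks like, here is a minimal vector-add kernel in OpenCL C – purely an illustrative sketch, not code from Altera’s (or anyone else’s) tool flow. Each work-item handles exactly one element, so the parallelism is stated by the programmer rather than inferred by a compiler:

// Illustrative OpenCL C kernel: each work-item computes one output element.
// get_global_id(0) tells a work-item which element is its own; the host
// decides how many work-items to launch.
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *c,
                      const unsigned int n)
{
    size_t i = get_global_id(0);   // this work-item's index in the global range
    if (i < n)                     // guard in case the global range is padded past n
        c[i] = a[i] + b[i];
}

On a GPU, those work-items are scheduled across hundreds of shader cores; an FPGA compiler can, at least in principle, unroll and pipeline the same description into a deep hardware datapath instead.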

Altera’s vision is for software developed in OpenCL to be compiled into a combination of executable software and FPGA-based accelerators. Since OpenCL has the notion of concurrency built right in, it is much more straightforward to translate an algorithm from there into a datapath in an FPGA. Altera has been developing this technology for some time, apparently, and has at least parts of the solution in the hands of key customers.
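
For the curious, the host side of such a program is plain C against the standard OpenCL 1.x API. The sketch below is a hypothetical, error-check-free example, not Altera’s actual flow – the vec_add kernel, the buffer sizes, and the device choice are all assumptions made for illustration. The point is that the same host code that drives a GPU today could drive an FPGA accelerator tomorrow, given a vendor runtime underneath:

/* Hypothetical OpenCL host program (C, OpenCL 1.x). Error checking omitted
 * for brevity. FPGA flows typically load a precompiled bitstream with
 * clCreateProgramWithBinary() rather than building from source as shown here. */
#include <CL/cl.h>
#include <stdio.h>

#define N 1024

static const char *src =
    "__kernel void vec_add(__global const float *a, __global const float *b,\n"
    "                      __global float *c, const unsigned int n) {\n"
    "    size_t i = get_global_id(0);\n"
    "    if (i < n) c[i] = a[i] + b[i];\n"
    "}\n";

int main(void)
{
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    /* Pick the first platform and its default device (GPU, CPU, or accelerator). */
    cl_platform_id platform;
    cl_device_id   device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);

    cl_context       ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue q   = clCreateCommandQueue(ctx, device, 0, NULL);

    /* Build the kernel and create device-side buffers. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vec_add", NULL);

    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(a), a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(b), b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof(c), NULL, NULL);

    cl_uint n = N;
    clSetKernelArg(k, 0, sizeof(cl_mem), &da);
    clSetKernelArg(k, 1, sizeof(cl_mem), &db);
    clSetKernelArg(k, 2, sizeof(cl_mem), &dc);
    clSetKernelArg(k, 3, sizeof(cl_uint), &n);

    /* One work-item per element: the parallelism is explicit in the launch. */
    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof(c), c, 0, NULL, NULL);

    printf("c[10] = %f\n", c[10]);   /* expect 30.0 */
    return 0;
}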

The goal here is to provide FPGA-based acceleration in areas that don’t have traditional FPGA design expertise. There is already a large community of interest around OpenCL and its relatives, and giving those developers a path to FPGA-based supercomputers has to be a good thing.

Also on the path to making FPGA-based acceleration easier is MathWorks. While their MATLAB and Simulink tools have long been the de facto standard for algorithm prototyping, the process of getting from that prototype to actual hardware has always been a tricky one. The company has been addressing that issue in recent years – bridging the gap with what they call “model-based design.” We will have more extensive coverage of some specific aspects of the MathWorks approach in an upcoming article, but in the context of supercomputing, their approach merits consideration. By creating a representation of an algorithm from a combination of standardized models with a sprinkling of new, user-defined functions, complex algorithms can be modeled in something conceptually very close to hardware. When the time comes to move the algorithm into hardware, the transformation is a straightforward one – translate each model into its corresponding pre-defined hardware implementation. It is essentially IP-based design, but starting conceptually at the algorithmic model.

By breaking the boundaries of the von Neumann architecture, FPGAs have an inherent advantage in supercomputing applications. They should ultimately be able to deliver more computations for less total power than any architecture based on a combination of conventional processors. With the astounding growth in capacity and performance of FPGAs in recent years, truly the only major hurdle that remains is the software component. Even there, progress is promising.

So hey, software guys – what do you say? Get us some better tools already! 

Photograph by Rama, Wikimedia Commons

5 thoughts on “Supercomputing”

  1. Memory bandwidth.

    Apart from that teensy technical issue, all the limitations will be in the mind of the algorithm developer. OpenCL lends itself to massively parallel processing, but if the algorithms themselves do not allow for it there’s not a lot to gain.

    There are some pretty good parallelism analysis tools on the market by now, and some can already be used to port code from sequential ANSI C to OpenMP, for example (www.vectorfabrics.com). Tools that produce OpenCL shouldn’t be far behind.

  2. kevin,
    the supercomputing market is very, very tiny. Most companies with a real need for ‘more-than-usual’ computing power can make do with a cluster of PCs… CFD applications are an example.
    And there are millions (really) of people used to thinking sequentially, the way CPUs process program instructions. Even in mathematics, problems and algorithms are taught in a sequential way, not in a parallel way.
    FPGAs and parallel processing are really here, but there is this big triple gap:
    – small market for real supercomputing
    – most computing needs can be met with a PC cluster
    – strong inertia of people who think sequentially.

  3. Can anyone comment on the viability of Azido from a company called Data I/O? From what I can tell, Azido’s origins are with the Viva language that existed with the bankrupt firm Starbridge.

    Any comments appreciated.

