
Supercomputing

Today, Tomorrow, Whenever

When we hear the term “supercomputing,” each of us probably forms a different image in our head – depending on our age. For my generation, I like to visualize the semi-cylindrical form of the Cray-2, with its radical architecture and Fluorinert cooling. It looked fast just sitting there. Others may envision anything from IBM mainframes to racks of blades to modern video gaming consoles.

In practical terms, supercomputing seems to mean computers with processing power equivalent to the smartphones of five years from now, at a cost premium of hundreds to thousands of times the price of current mainstream computers. While that kind of multiplier may seem absurd, so are the lengths to which supercomputer users will go at any given time in order to gain a couple of notches on the current state of Moore’s Law.

This week, in Seattle, we have the supercomputing party of the year – SC11. As is customary leading up to a major show, we’ve had the usual salvo of press releases – for everything from new blade/server configurations to cooling systems to memory and storage components. When thinking about the supercomputing problem, however, it pays to back off a step or two from the exotic and esoteric hardware on display, and to think about the problem of supercomputing at a higher level.

In recent years, we have passed a fundamental milestone in the evolution of computing architectures. There is no longer much gain to be had by cranking up the operating frequency of a single processor core. There is likewise little efficiency to be won by continuing to widen the word length of our basic architecture. Each of the “improvements” we have relied on in the past now yields diminishing returns at the current state of the technology. This has led us to move from “how fast” to “how many” as our quest for ever more processing power has continued. Instead of trying to get more performance out of each processor, we try to see how many processors we can stuff into a box.

This, of course, brings its own set of challenges. First, we have to change the way we develop software – to be able to take advantage of all those processing elements operating in parallel. Second, we become limited fundamentally by the amount of power we have available. These two challenges define the software and the hardware side of supercomputing for the near future. Taking advantage of a multiplicity (or even a plethora) of processing elements is mainly a software problem. We need new languages, new compilers, new operating systems, and better ways of expressing algorithms in order to take advantage of massive parallelism.
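
To make that concrete, consider the humble loop. The sketch below is mine, not anything from a particular vendor: the same vector operation written the way most of us learned to write it, and then again with an OpenMP annotation that lets the compiler spread the iterations across however many cores happen to be available. The hard part is not the pragma – it is structuring the algorithm so that the iterations really are independent.

    #include <stddef.h>

    /* The way most of us learned it: one core, one element at a time. */
    void saxpy_sequential(size_t n, float a, const float *x, float *y)
    {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    /* The same work, annotated for parallel execution. Built with OpenMP
       enabled (e.g., gcc -fopenmp), the loop iterations are divided among
       the available cores. That is only legal because each iteration is
       independent of the others - and making that true is the real job. */
    void saxpy_parallel(size_t n, float a, const float *x, float *y)
    {
        #pragma omp parallel for
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }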

On the hardware side, we need processing elements that pack a lot of computing power into a small space and (probably more important now) do the computation with the least amount of power possible. It is in these two areas that FPGAs have attracted a lot of attention in recent years. We have long documented the ability of FPGAs to accelerate many types of algorithms by significant factors – with huge savings in power. FPGAs have walloped DSP processors, conventional processors, and even graphics processors – both in terms of raw processing throughput on a single device, and particularly when considering the amount of power consumed.

Why, then, haven’t FPGAs taken over the supercomputing scene entirely?

Oh, how they have tried. Reconfigurable computing engines based on FPGAs have been designed, touted, tried, re-designed, re-touted, and ultimately – not found much acceptance. Formidable academic consortia have risen and fallen with the sole purpose of exploiting the extraordinary capabilities of FPGAs for supercomputing applications. The reason is rooted back in that software part of the problem. The only super-great way to do supercomputing on an FPGA has been to hand-code the algorithm in a hardware description language (HDL) such as VHDL or Verilog.

Well, then, why hasn’t somebody come up with a better solution for the software issue? I’m pretty sure more brainpower has been applied to that problem than to locating Bigfoot, Elvis, and UFOs combined. Well, by some definition of “brainpower” anyway. There have been more start-ups than you can count hawking high-level synthesis tools, new parallel programming languages that target FPGA hardware, graphical input languages, model-based design methodologies — the list goes on and on. The tool-making world understands that most people who are world experts in the subtleties of DNA-matching algorithms did not reach that pinnacle of their field while also mastering VHDL “on the side.”

However, two briefings I had in the run-up to SC11 illustrate that major progress is in the works, and that we may be closer than we think to answering the ultimate question of FPGAs, supercomputing, and everything. First, Altera is announcing plans to support OpenCL – a language developed with GPU-based supercomputing in mind. For the past few years, led most visibly by Nvidia, GPUs have been gaining a lot of popularity for high-performance computing applications. Nvidia rolled out CUDA (Compute Unified Device Architecture) several years ago, and it has gained significant acceptance in many traditional supercomputing strongholds like financial computing, pharmaceuticals, and oil and gas exploration. Alongside that effort came OpenCL (Open Computing Language), which gives developers a way to write C-like code that explicitly specifies parallelism. OpenCL actually originated at Apple and is now the charge of The Khronos Group – a non-profit industry consortium that manages open standards for parallel computing and graphics. Even though OpenCL was authored with GPUs in mind, it is equally applicable to parallel processing with modern FPGAs.
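
For those who haven’t seen it, here is a rough idea of what OpenCL code looks like – a minimal kernel sketch of my own, not taken from Altera’s announcement. The parallelism is explicit: each “work-item” computes exactly one element, and whatever sits underneath decides how many of them execute at once.

    /* OpenCL C: each work-item handles one element of the vectors.
       get_global_id(0) tells a work-item which element is its own.
       Whether the work-items run on GPU shader cores or as pipelined
       FPGA datapath hardware is the tool chain's problem. */
    __kernel void saxpy(const float a,
                        __global const float *x,
                        __global float *y)
    {
        size_t i = get_global_id(0);
        y[i] = a * x[i] + y[i];
    }

Nothing in that kernel says how many elements are processed at the same time – and that is exactly what makes the same source plausible for both GPU cores and FPGA datapaths.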

Altera’s vision is for software developed in OpenCL to be compiled into a combination of executable software and FPGA-based accelerators. Since OpenCL has the notion of concurrency built right in, it is much more straightforward to translate an algorithm from there into a datapath in an FPGA. Altera has been developing this technology for some time, apparently, and has at least parts of the solution in the hands of key customers.
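
The host side of that split is easier to see in code. The sketch below uses only standard OpenCL host API calls; the assumption that the kernel arrives as a binary compiled offline for the FPGA (the file name here is purely a placeholder), rather than being compiled at run time as it usually is for a GPU, is my reading of how such a flow would have to work – the specifics of Altera’s tools are theirs to describe. Error checking and the data transfers are trimmed for space.

    #include <stdio.h>
    #include <stdlib.h>
    #include <CL/cl.h>

    int main(void)
    {
        /* Find an accelerator device and set up a context and queue. */
        cl_platform_id platform;
        cl_device_id device;
        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_ACCELERATOR, 1, &device, NULL);
        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
        cl_command_queue q = clCreateCommandQueue(ctx, device, 0, NULL);

        /* Load a kernel binary that was compiled offline for the FPGA.
           "saxpy_fpga.bin" is an illustrative name, not a real tool output. */
        FILE *f = fopen("saxpy_fpga.bin", "rb");
        fseek(f, 0, SEEK_END);
        size_t len = (size_t)ftell(f);
        rewind(f);
        unsigned char *bin = malloc(len);
        fread(bin, 1, len, f);
        fclose(f);
        cl_program prog = clCreateProgramWithBinary(ctx, 1, &device, &len,
                              (const unsigned char **)&bin, NULL, NULL);
        clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "saxpy", NULL);

        /* Create device buffers, set kernel arguments, and launch one
           work-item per element. Real code would copy the input in with
           clEnqueueWriteBuffer and read the result back afterward. */
        size_t n = 1 << 20;
        float a = 2.0f;
        cl_mem x = clCreateBuffer(ctx, CL_MEM_READ_ONLY, n * sizeof(float), NULL, NULL);
        cl_mem y = clCreateBuffer(ctx, CL_MEM_READ_WRITE, n * sizeof(float), NULL, NULL);
        clSetKernelArg(k, 0, sizeof(float), &a);
        clSetKernelArg(k, 1, sizeof(cl_mem), &x);
        clSetKernelArg(k, 2, sizeof(cl_mem), &y);
        clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);
        clFinish(q);
        return 0;
    }

The point of the exercise is the division of labor: the host program stays ordinary software, while the heavy computation moves into the FPGA fabric – which is the combination of executable software and FPGA-based accelerators described above.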

The goal here is to provide FPGA-based acceleration in areas that don’t have traditional FPGA design expertise. There is already a large community of interest around OpenCL and its relatives, and giving those developers a path to FPGA-based supercomputers has to be a good thing.

Also on the path of making FPGA-based acceleration easy is MathWorks. While their MATLAB and Simulink tools have long been the de facto standard for algorithm prototyping, getting from that prototype to actual hardware has always been tricky. The company has been addressing that issue in recent years – bridging the gap with what they call “model-based design.” We will have more extensive coverage of some specific aspects of the MathWorks approach in an upcoming article, but in the context of supercomputing, their approach merits consideration. By combining standardized model blocks with a sprinkling of new, user-defined functions, a complex algorithm can be modeled in something conceptually very close to hardware. When the time comes to move the algorithm into hardware, the transformation is a straightforward one – translate each model into its corresponding pre-defined hardware implementation. It is essentially IP-based design, but starting conceptually at the algorithmic model.

By breaking the boundaries of the von Neumann architecture, FPGAs have an inherent advantage in supercomputing applications. They should ultimately be able to deliver more computations for less total power than any architecture based on a combination of conventional processors. With the astounding growth in capacity and performance of FPGAs in recent years, truly the only major hurdle that remains is the software component. Even there, progress is promising.

So hey, software guys – what do you say? Get us some better tools already! 

Photograph by Rama, Wikimedia Commons

5 thoughts on “Supercomputing”

  1. Memory bandwidth.

    Apart from that teensy technical issue, all the limitations will be in the mind of the algorithm developer. OpenCL lends itself to massively parallel processing, but if the algorithms themselves do not allow for it, there’s not a lot to gain.

    There are some pretty good parallelism analysis tools on the market by now, and some can already be used to port code from sequential ANSI C to OpenMP, for example (www.vectorfabrics.com). Tools that produce OpenCL should not be far behind.

  2. Kevin,
    the supercomputing market is very, very tiny. Most companies with a real need for ‘more-than-usual’ computing power can make do with a cluster of PCs… CFD applications are an example.
    And there are millions (really) of people used to thinking sequentially, the way CPUs process program instructions. Even in mathematics, problems and algorithms are taught in a sequential way, not in a parallel way.
    FPGAs and parallel processing are really here, but there is this big triple gap:
    – the market for real supercomputing is small
    – most computing needs are met with a PC cluster
    – people have a strong inertia toward thinking sequentially

  3. Can anyone comment on the viability of Azido from a company called Data I/O? From what I can tell, Azido’s origins are with the Viva language that existed at the bankrupt firm Starbridge.

    Any comments appreciated.

