

Today, Tomorrow, Whenever

When we hear the term “supercomputing,” each of us probably forms a different image in our heads – depending on our age. For my generation, I like to visualize the semi-cylindrical form of the Cray-2, with its radical architecture and Fluorinert cooling. It looked fast just sitting there. Others may envision anything from IBM mainframes to racks of blades to modern video gaming consoles.

In practical terms, supercomputing seems to mean computers that have processing power equivalent to the smartphones of five years from now, at a cost premium of hundreds to thousands of times the price of current mainstream computers. While that kind of multiplier may seem absurd, so are the lengths to which supercomputer users will go at any given time in order to gain a couple of notches on the current state of Moore’s Law.

This week, in Seattle, we have the supercomputing party of the year – SC11. As is customary leading up to a major show, we’ve had the usual salvo of press releases – for everything from new blade/server configurations to cooling systems to memory and storage components. When thinking about the supercomputing problem, however, it pays to back off a step or two from the exotic and esoteric hardware on display, and to think about the problem of supercomputing at a higher level.

In recent years, we have passed a fundamental milestone in the evolution of computing architectures. There is no longer much gain to be had by cranking up the operating frequency of a single processor core. There is likewise little efficiency available from continuing to widen the word length of our basic architecture. Each of those “improvements” that has been available to us in the past will give us diminishing marginal returns at the current state of technology. This has led us to move from “how fast” to “how many” as our quest for ever more processing power has continued. Instead of trying to get more performance out of each processor, we try to see how many processors we can stuff into a box.

This, of course, brings its own set of challenges. First, we have to change the way we develop software – to be able to take advantage of all those processing elements operating in parallel. Second, we become limited fundamentally by the amount of power we have available. These two challenges define the software and the hardware side of supercomputing for the near future. Taking advantage of a multiplicity (or even a plethora) of processing elements is mainly a software problem. We need new languages, new compilers, new operating systems, and better ways of expressing algorithms in order to take advantage of massive parallelism.
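To make the software half of that challenge concrete, here is a minimal sketch of one computation written sequentially and then restructured so that independent chunks can be handed to separate workers. Python is used purely for illustration (the article names no language), and the function names and chunking scheme are assumptions rather than part of any particular toolchain.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(values):
    """Sequential version: one processing element walks the whole list."""
    total = 0
    for v in values:
        total += v * v
    return total


def split(values, parts):
    """Cut the input into at most `parts` contiguous chunks."""
    size = max(1, -(-len(values) // parts))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum_of_squares(values, workers=4):
    """Parallel version: each worker reduces its own chunk, then the
    partial results are combined in a final sequential step."""
    chunks = split(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)
```

In standard CPython, threads will not actually make this arithmetic faster; the point is the restructuring. The algorithm has to be expressed as independent pieces plus a combining step before any amount of parallel hardware can help.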

On the hardware side, we need processing elements that pack a lot of computing power into a small space and (probably more important now) do the computation with the least amount of power possible. It is in these two areas that FPGAs have attracted a lot of attention in recent years. We have long documented the abilities of FPGAs to accelerate many types of algorithms by significant factors – with huge savings in power. FPGAs have walloped DSP processors, conventional processors, and even graphics processors – both in terms of raw processing throughput on a single device, and particularly when considering the amount of power consumed.

Why, then, haven’t FPGAs taken over the supercomputing scene entirely?

Oh, how they have tried. Reconfigurable computing engines based on FPGAs have been designed, touted, tried, re-designed, re-touted, and ultimately have not found much acceptance. Formidable academic consortia have risen and fallen with the sole purpose of exploiting the extraordinary capabilities of FPGAs for supercomputing applications. The reason is rooted back in that software part of the problem. The only super-great way to do supercomputing on an FPGA has been to hand-code the algorithm in a hardware description language (HDL) such as VHDL or Verilog.

Well, then, why hasn’t somebody come up with a better solution for the software issue? I’m pretty sure more brainpower has been applied to that problem than to locating Bigfoot, Elvis, and UFOs combined. Well, by some definition of “brainpower” anyway. There have been more start-ups than you can count hawking high-level synthesis algorithms, new parallel programming languages that target FPGA hardware, graphical input languages, model-based design methodologies – the list goes on and on. The tool-making world understands that most people who are world experts in the subtleties of DNA-matching algorithms did not reach that pinnacle of their field while also mastering VHDL “on the side.”

However, two briefings I had in the run-up to supercomputing illustrate that major progress is in the works, and that we may be closer than we think to answering the ultimate question of FPGAs, supercomputing, and everything. First, Altera is announcing plans to support OpenCL – a language developed with GPU-based supercomputing in mind. For the past few years, led most visibly by Nvidia, GPUs have been gaining a lot of popularity for high-performance computing applications. Nvidia rolled out CUDA (Compute Unified Device Architecture) several years ago – and it has gained significant acceptance in many traditional supercomputing strongholds like financial computing, pharmaceuticals, and oil and gas exploration. Alongside that effort came OpenCL (Open Computing Language), which provides a facility for developers to write C-like code that explicitly specifies parallelism. OpenCL actually originated at Apple and is now the charge of The Khronos Group – a non-profit industry consortium that maintains open standards for graphics and parallel computing. Even though OpenCL was authored with GPUs in mind, it is equally applicable to parallel processing with modern FPGAs.
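For readers who have not seen the style, “explicitly specifies parallelism” roughly means writing the body of a single work-item and letting the runtime run it once for every index. The sketch below imitates that model in plain Python; it is not OpenCL code, and `vector_add_kernel` and `launch` are hypothetical names invented for illustration. In actual OpenCL the kernel would be written in the C-like kernel language and would obtain its index from `get_global_id(0)`.

```python
def vector_add_kernel(gid, a, b, out):
    """Body of one work-item: it touches only index `gid`,
    so no work-item depends on any other."""
    out[gid] = a[gid] + b[gid]


def launch(kernel, global_size, *args):
    """Stand-in for a device launch (hypothetical helper).

    A real device may run the work-items simultaneously; here they run
    one after another, which gives the same result precisely because
    they are independent of each other."""
    for gid in range(global_size):
        kernel(gid, *args)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(a)
launch(vector_add_kernel, len(a), a, b, out)
```

Because the independence of the work-items is stated by the programmer rather than discovered by a compiler, a tool is free to map them onto GPU threads or, in principle, onto replicated or pipelined hardware.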

Altera’s vision is for software developed in OpenCL to be compiled into a combination of executable software and FPGA-based accelerators. Since OpenCL has the notion of concurrency built right in, it is much more straightforward to translate an algorithm from there into a datapath in an FPGA. Altera has been developing this technology for some time, apparently, and has at least parts of the solution in the hands of key customers.

The goal here is to provide FPGA-based acceleration in areas that don’t have traditional FPGA design expertise. There is already a large community of interest around OpenCL and its relatives, and giving those developers a path to FPGA-based supercomputers has to be a good thing.

Also on the path of making FPGA-based acceleration easy is MathWorks. While their MATLAB and Simulink tools have long been the de facto standard for algorithm prototyping, the process of getting from that prototype to actual hardware has always been a tricky one. The company has been addressing that issue in recent years – bridging the gap with what they call “model-based design.” We will have more extensive coverage of some specific aspects of the MathWorks approach in an upcoming article, but in the context of supercomputing, their approach merits consideration. By creating a representation of an algorithm based on a combination of standardized models with a sprinkling of new, user-defined functions, complex algorithms can be modeled in something conceptually very close to hardware. When the time comes to move the algorithm into hardware, the transformation is a straightforward one – translate each model into its corresponding pre-defined hardware implementation. It is essentially IP-based design, but starting conceptually at the algorithmic model.
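The “translate each model into its corresponding pre-defined hardware implementation” step can be pictured as a table lookup. The following is a toy sketch of that idea only, not a description of how the MathWorks tools work; the block names, module names, and the `translate` function are all hypothetical.

```python
# Hypothetical library: model block name -> name of a pre-built hardware module.
HARDWARE_LIBRARY = {
    "gain": "hw_multiplier",
    "sum": "hw_adder",
    "delay": "hw_register",
}


def translate(model_blocks, library=HARDWARE_LIBRARY):
    """Map each model block to its pre-defined hardware implementation.

    Blocks with no library entry (user-defined functions) are reported
    separately, since they would need a custom implementation."""
    netlist, custom = [], []
    for index, block in enumerate(model_blocks):
        module = library.get(block)
        if module is None:
            custom.append(block)
        else:
            netlist.append(f"u{index}: {module}")
    return netlist, custom


netlist, custom = translate(["gain", "delay", "my_filter", "sum"])
```

Here the three standard blocks resolve to library modules and `my_filter` is flagged as the user-defined piece, which mirrors the split the article describes between standardized models and a sprinkling of new functions.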

By breaking the boundaries of the von Neumann architecture, FPGAs have an inherent advantage in supercomputing applications. They should ultimately be able to deliver more computations for less total power than any architecture based on a combination of conventional processors. With the astounding growth in capacity and performance of FPGAs in recent years, truly the only major hurdle that remains is the software component. Even there, progress is promising.

So hey, software guys – what do you say? Get us some better tools already! 

Photograph by Rama, Wikimedia Commons

5 thoughts on “Supercomputing”

  1. Memory bandwidth.

    Apart from that teensy technical issue, all the limitations will be in the mind of the algorithm developer. OpenCL lends itself to massively parallel processing, but if the algorithms themselves do not allow for it there’s not a lot to gain.

    There are some pretty good parallelism analysis tools on the market by now, and some can already be used to port code from sequential ANSI C to OpenMP, for example (www.vectorfabrics.com). Tools that produce OpenCL should then not be far around the corner.

  2. kevin,
    the supercomputing market is very very tiny. Most companies with a real need of ‘more-than-usual’ computing power can do with a cluster of PCs… CFD applications are an example.
    And there are millions (really) of people used to thinking sequentially, as CPUs process program instructions. Even in mathematics, problems and algorithms are taught in a sequential way, not in a parallel way.
    FPGAs and Parallel processing are really here but there is this big triple gap:
    – small market for real super computing
    – most computing needs are done with a PC cluster.
    – strong inertia of people to think sequentially.

  3. can anyone comment on the viability of Azido from a company called Data I/O? From what I can tell, Azido’s origins are with the Viva language that existed with the bankrupt firm Starbridge.

    Any comments appreciated.
