
Cray Goes FPGA

Algorithm Acceleration in the New XD1

When I was in college, I knew the future of supercomputing. The supercomputers of the 21st century would be massive, gleaming masterpieces of technology. They would not be installed into buildings, but rather buildings would be designed and constructed around them – particularly to house the cooling systems. The design specifics were fuzzy, but I was reasonably sure that very low temperatures would be involved for either superconducting connectivity, SQUIDs, or Josephson junction-esque switching. Silicon would certainly have been long abandoned in favor of Gallium Arsenide or some even more exotic semiconductor material. I believed that Cray, Inc., as the preeminent developer of supercomputers, would be able to leverage these techniques to gain perhaps a full order of magnitude of computing performance over the machines of the day.

A few years later, when Xilinx rolled out their first FPGAs, I could see the future of that technology as well. FPGAs would act as a sort of system-level silicon super-glue, sitting at the periphery of the circuit board and stitching together incompatible protocols. With the simple addition of an FPGA, anything could be made to connect to anything else, and programmability ensured that we could adapt on the fly and change our design to leverage any new, improved component without having to abandon the rest of our legacy design.

As I gazed into my crystal ball (looking way out past the distorted reflection of my feathered hair, Lacoste polo shirt, and Wayfarer sunglasses), I could not envision any connection between these two seemingly unrelated technology tracks. Supercomputers would be designed and built from the ground up, using carefully matched and optimized homogeneous components, while FPGAs would be the duct tape of electronics design, helping to hold together aging multi-generational systems for a few more years of life in the field before they were retired altogether. In my crystal ball, the two paths were obviously diverging.

I was right about the Cray part.

The Cray XD1, one of the latest innovations from the world’s best-known supercomputer manufacturer, leverages Xilinx’s FPGA technology to provide massive algorithm acceleration through hardware-based implementation of compute-intensive algorithmic tasks. While we in the editorial community were idly debating whether FPGAs might be useful as reconfigurable computing engines after all, Cray was busy at work back in the lab building the thing. “We are continually researching new ways to gain greater application performance for our customers,” says Geert Wenes, business manager responsible for emerging markets at Cray. “With the Cray XD1 direct connect architecture combined with the new generation of FPGAs, we saw an opportunity to gain orders of magnitude speed-up for some of our customers’ most challenging applications. Applications that are highly parallel on a fine-grained level and spend much of their computation time on integer and fixed point calculations, such as adaptive optics simulations, seismic imaging, or even molecular docking applications in life sciences stand to gain 10 times or more overall application performance improvement with FPGA application acceleration. In many cases, such speed-ups are necessary to make the application a viable one for our customers.”

“Our alliance with Cray was a natural fit for Xilinx,” said Sandeep Vij, vice president of worldwide marketing at Xilinx. “Both companies have established technical leadership in our respective markets, and we share the same fundamental values of providing customers with leading edge products and unprecedented service. We were extremely impressed with the technical prowess of the people at Cray. This was one of those rare instances where collaboration with a customer directly benefited our own product.”

The XD1 architecture takes advantage of what Cray calls “Rapid Array Interconnect” to couple Xilinx Virtex II Pro devices directly to the AMD Opteron processors in each blade through a 3.2 GB/s bi-directional connection. “By treating the FPGAs as integral system components rather than peripherals and linking them directly to the processors through high-speed connections, we were able to remove one of the biggest bottlenecks to FPGA co-processing, which is the loss of speed transferring data to and from the co-processor,” Wenes continues. “You also need Linux-like commands for administering the FPGA-based computer. We’ve developed a set of about twenty commands that manage, monitor, and check the FPGA.”

The XD1 also includes a copious 16 MB of 12.8 GB/s quad data rate (QDR) SRAM cache connected to both the Opteron and the FPGA to facilitate maximum utilization of both ends of the processor/co-processor pipe. The cache is memory mapped into the Opteron’s user space so the application running on the AMD processor can access the FPGA’s cache at speed.
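To put those bandwidth figures in perspective, here is a back-of-the-envelope sketch in Python. It uses only the peak numbers quoted above (3.2 GB/s for the processor link, 12.8 GB/s and 16 MB for the QDR SRAM) and assumes, for illustration only, that the quoted rates are sustained, that 1 GB/s means 10^9 bytes per second, and that latency and protocol overhead are zero. The function name is hypothetical, and real transfers would be slower than this ideal.

```python
# Idealized transfer-time estimate from the peak figures quoted in the article.
# Assumptions (not from Cray): sustained peak rate, no latency, no overhead.

def transfer_time_ms(num_bytes: int, rate_gb_per_s: float) -> float:
    """Return the ideal time in milliseconds to move num_bytes at the given rate."""
    bytes_per_second = rate_gb_per_s * 1e9
    return num_bytes / bytes_per_second * 1e3


SRAM_BYTES = 16 * 2**20          # 16 MB of QDR SRAM per FPGA
LINK_GB_PER_S = 3.2              # Opteron-to-FPGA connection
SRAM_GB_PER_S = 12.8             # QDR SRAM bandwidth

if __name__ == "__main__":
    over_link = transfer_time_ms(SRAM_BYTES, LINK_GB_PER_S)
    over_sram = transfer_time_ms(SRAM_BYTES, SRAM_GB_PER_S)
    print(f"Fill 16 MB over the processor link: {over_link:.2f} ms")
    print(f"Sweep 16 MB at SRAM bandwidth:      {over_sram:.2f} ms")
```

Under these assumptions, filling the entire cache over the processor link takes on the order of five milliseconds, and the FPGA can sweep the same data about four times faster than the link can deliver it. That ratio hints at why keeping working data close to the FPGA matters.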

Viewed from the top level, the XD1 architecture consists of a backplane/chassis that accommodates six blades. Each blade includes two 64-bit AMD Opteron 200 series processors and a single Virtex II Pro XC2VP50-7 with 16 MB of QDR RAM attached. Each Virtex II Pro can be accessed by any Opteron in the cluster, offering maximum flexibility for algorithm acceleration. The whole thing is coupled to the outside world with big pipes, including four PCI-X slots that can take dual-port gigabit Ethernet cards or dual-port Fibre Channel HBAs. It’s all running under the control of Cray’s HPC-enhanced version of Linux.

From a hardware perspective, the system has incredible performance potential, which is perhaps limited at this time primarily by the rather Rube-Goldbergian requirements for taking advantage of the huge performance boost possible with FPGA-based acceleration. The soft-focus vision of the future is seamless compilation of high-level algorithmic code into an optimized mix of sequential software and parallelized hardware accelerators. Today, however, the super-advanced tool capability required for that vision is not yet in place. What we have instead is an awesome hardware platform that requires a significant time investment to fully harness. Algorithms must be carefully analyzed by experts and their innermost compute-intensive loops carved out for potential FPGA implementation. These chunks must then be tackled by hardware-savvy engineers who can either create a suitable hardware architecture in RTL or leverage one of the fledgling technologies for converting algorithmic descriptions (such as those written in C or C++) into hardware architectures.
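As an illustration of the kind of inner loop an expert might carve out, here is a minimal Python sketch of an integer-only FIR filter. It is not taken from any Cray or Xilinx material. The Q15 fixed-point format, the function names, and the choice of a FIR filter are all assumptions made for the example. The point is the shape of the code: a small multiply-accumulate kernel with no floating point, isolated behind a function boundary so that it could in principle be handed to a hardware designer.

```python
# Hypothetical example of a "hot loop" isolated for possible FPGA implementation.
# Q15 fixed point (15 fractional bits) is an illustrative assumption.

Q = 15  # number of fractional bits


def to_q15(x: float) -> int:
    """Convert a float in roughly [-1, 1) to a Q15 integer."""
    return int(round(x * (1 << Q)))


def fir_q15(samples, coeffs):
    """Integer-only FIR filter.

    Every output is an independent sum of products, so in hardware the
    multiplies for one output could run in parallel and successive outputs
    could be pipelined. Here they simply run one after another.
    """
    n_taps = len(coeffs)
    out = []
    for i in range(n_taps - 1, len(samples)):
        acc = 0
        for k in range(n_taps):
            acc += samples[i - k] * coeffs[k]   # integer multiply-accumulate
        out.append(acc >> Q)                     # rescale the product back to Q15
    return out


if __name__ == "__main__":
    coeffs = [to_q15(0.5), to_q15(0.5)]          # two-tap moving average
    samples = [to_q15(0.25)] * 4
    print(fir_q15(samples, coeffs))
```

Everything outside `fir_q15` (data loading, control flow, reporting) would stay in ordinary software on the Opteron. Only the kernel itself would be a candidate for RTL or for one of the C-to-hardware tools mentioned above.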

To address these issues, Cray is working with companies like Xilinx to provide development tools that fill the gap, and is promoting the idea of reusable IP for common algorithm acceleration. At the same time, they’re also tracking the advances of algorithmic synthesis technologies such as those offered by Starbridge, Celoxica, and Mentor Graphics for compiling software directly into optimized hardware architectures. While algorithm compilation into optimized hardware is the most significant engineering challenge posed by this architecture, the potential gain is so large that several companies are actively developing technology to address the issues.

But who needs all this computing performance anyway? After all, the computing speed most of us always dreamed about is apparently available in the laptops we carry around to keep up with e-mail. For some applications, however, Moore’s Law’s pace on commodity machines simply isn’t getting the job done. Seismic imaging customers, for example, typically employ vast arrays of Linux boxes flying in formation to seek out subtle patterns in voluminous sensor data. An accelerated high-performance computing (HPC) technology like the XD1 can have a profound impact on the cost of processing the huge amount of data they gather in trying to generate images and models of the sub-surface world. With the FPGA-based acceleration in the XD1, they can obtain 40X-50X improvement over conventional processors for certain algorithms.
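A 40X-50X gain on certain algorithms and the "10 times or more" overall application improvement quoted earlier are not in conflict. One standard way to relate the two is Amdahl's law, which the article itself does not invoke. The short Python sketch below applies it. The accelerated fractions used in the demonstration are purely illustrative assumptions, not measurements from Cray or its customers.

```python
# Amdahl's law: overall speedup when only part of the run time is accelerated.
# The fractions used below are illustrative assumptions, not measured data.

def overall_speedup(accel_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when accel_fraction of the original run time
    is sped up by kernel_speedup and the rest is left unchanged."""
    if not 0.0 <= accel_fraction <= 1.0:
        raise ValueError("accel_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    remaining = (1.0 - accel_fraction) + accel_fraction / kernel_speedup
    return 1.0 / remaining


if __name__ == "__main__":
    for fraction in (0.80, 0.90, 0.95):
        for kernel in (40.0, 50.0):
            s = overall_speedup(fraction, kernel)
            print(f"{fraction:.0%} of run time at {kernel:.0f}X -> {s:.1f}X overall")
```

The takeaway is that the untouched software portion sets the ceiling. If 10 percent of the run time stays on the processor, the whole application cannot run more than 10 times faster, however fast the FPGA kernel becomes.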

In life sciences, biomedical applications like DNA sequence alignment are extremely compute intensive and have algorithms that are well suited to hardware acceleration. As these traditional research areas intersect high-commercial-value domains like drug discovery, a large market may be created for commercial applications of supercomputing. In a more universal sense, even relatively routine tasks like random number generation can be accelerated to great benefit in many simulation and modeling applications.
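Random number generation is a handy example of why some tasks suit hardware so well. The Python sketch below implements a 16-bit linear feedback shift register (LFSR), a textbook pseudo-random generator built from nothing but shifts and XORs. It is offered as a generic illustration, not as the generator used on the XD1 or in any Cray application. The tap positions and seed are one commonly published maximal-length choice, and the function names are hypothetical.

```python
# 16-bit Fibonacci LFSR with feedback polynomial x^16 + x^14 + x^13 + x^11 + 1.
# Generic textbook example; not specific to the XD1.

def lfsr16_step(state: int) -> int:
    """Advance the 16-bit LFSR by one step and return the new state."""
    # XOR the tapped bits to form the feedback bit.
    bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    # Shift right by one and insert the feedback bit at the top.
    return ((state >> 1) | (bit << 15)) & 0xFFFF


def lfsr16_period(seed: int = 0xACE1) -> int:
    """Count the steps until the register returns to its seed value."""
    state = lfsr16_step(seed)
    steps = 1
    while state != seed:
        state = lfsr16_step(state)
        steps += 1
    return steps


if __name__ == "__main__":
    print("Sequence length before repeating:", lfsr16_period())
```

In software, each step costs several instructions. In an FPGA, the same logic is a 16-bit register and a few XOR gates producing a new value every clock cycle, and many independent generators can sit side by side feeding a simulation.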

It is clear that, to date, the potential of FPGAs as reconfigurable computing enablers has barely been touched. High-performance hardware implementations like Cray’s XD1 open the door for development of breakthrough synthesis and compilation technology that will make algorithm acceleration a routine and seamless process, much like high-level language compilation is today. If that eventually comes to pass, we may forget what a pure von Neumann architecture computer even looks like as we enter a new era of performance with programmable logic acceleration as a key component of the computer of the future.

But what about superconductivity and exotic materials? Do you feel a secret longing for the omniscient aesthetic of Dr. Forbin’s Colossus? If it will help, you can always install a soft-serve machine behind your XD1 to approximate the sounds of a high-powered cooling system. Also, since the XD1 doesn’t have the seven-, eight-, or nine-digit price tag we expected in our fantasy supercomputer, you’ll have plenty of budget left over to adorn yours with some snappy-looking blinking lights and a really nice monitor.
