In 1960, Gerald Estrin presented “Organization of computer systems: the fixed plus variable structure computer” at the Western Joint IRE-AIEE-ACM Computer Conference. His abstract reads in part: “…a growing number of important problems have been recorded which are not practicably computable by existing systems. These latter problems have provided the incentive for the present development of several large scale digital computers with the goal of one or two orders of magnitude increase in overall computational speed.” His solution to the problem is given in the title, “the fixed plus variable structure computer,” and with it the concept of reconfigurable computing was born.
Yep. The idea of reconfigurable computing pre-dates Moore’s Law, which was born five years later in 1965.
Over the 58 years since, we have chased that reconfigurable computing carrot, dangling enticingly just out of reach from the end of our ever-evolving programming pole. And, for at least three of those six decades, we have had reasonable hardware available to fulfill the promise of reconfigurable computing – an architectural alternative that could deliver Estrin’s “one or two” orders of magnitude increase in overall computational speed. In fact, we now know it might deliver as much as three or four orders of magnitude, and that is on top of our almost-entitled biennial doubling due to Moore’s Law.
And, yet, we are still not there.
One could argue that Moore’s Law has prevented us from realizing the promise of reconfigurable computing. After all, simply riding the von Neumann horse from one semiconductor process node to the next gave us a reliable 2x improvement in price, performance, and power every other year, and we didn’t have to rewrite our software or even redesign our computing architecture to get it. When you get that kind of bounty almost for free, who needs to be greedy?
Reconfigurable computing has always had a die-hard cult-like academic following, however. If the real world can’t deliver on a promising technology, at least careers can be made publishing conference papers about what might have been. And, when modern FPGAs finally came along, researchers could easily build actual hardware to prove their point. Time after time, industry and venture capital were pulled into the fray, betting that the pairing of programmable logic with conventional processors would add those much-anticipated zeroes to our performance benchmarks. Time after time, they failed – not because we didn’t know how to build the hardware for reconfigurable computers (we did), but because nobody could program the things.
Of course, we could build the “Demo.”
For years, we’ve seen example after example of systems delivering amazing results with specialized algorithms accelerated to remarkable levels via painstakingly crafted custom FPGA accelerators parked next to conventional processors. If you had a team of expert digital designers, an unlimited budget, and a year or so to spare, you could deliver 100x better performance and power than conventional computers on your performance-critical application.
Or, you could just run it on 100 parallel Intel servers on day one – and get close enough, with far less cost, risk, and development time. And, as long as Intel was tracking Moore’s Law, doubling all our goodies every couple of years, there was little incentive to dive into the terrifying world of RTL-based design to go even faster. The sustained, systemic exponential improvement of Moore’s Law acted as a kind of sedative on the development of the programming methodology that would unlock the potential of reconfigurable computing.
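To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch. It assumes an idealized, uninterrupted doubling every two years (the figure used in this article) and treats the speedup as a fixed target; the function name `years_to_match` is purely illustrative and not taken from any tool or paper.

```python
import math


def years_to_match(speedup: float, doubling_period_years: float = 2.0) -> float:
    """Years of steady doubling needed to reach `speedup` with no redesign.

    Assumption: performance doubles once per `doubling_period_years`,
    so the number of doublings required is log2(speedup).
    """
    return doubling_period_years * math.log2(speedup)


if __name__ == "__main__":
    # Orders of magnitude mentioned in the article: 1-2 (Estrin), up to 3-4.
    for target in (10, 100, 1000, 10000):
        print(f"{target:>6}x  ~{years_to_match(target):.1f} years of doubling")
```

Under these assumptions, matching a 100x accelerator by process scaling alone takes a little over 13 years, which is why racks of servers looked like the easier path only for as long as the doubling held.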
Of course, Intel didn’t continue tracking Moore’s Law forever. Over the past decade, we’ve seen significant slowing in the venerable trend, with less gain on each process node, and longer (and more expensive) development cycles between them. On top of that, the von Neumann architecture itself has started to hit practical limits in the form of power consumption. Faster clocking gave way to more parallelism, and, ultimately, even the practice of stacking racks and racks of servers into giant data centers hit a wall when the power company simply couldn’t deliver any more power.
Intel itself now needs reconfigurable computing, and, of course, they’re working on it. Hard.
With the acquisition of Altera, Intel is now in a position to deliver the required hardware – processors with FPGA accelerators – into the data centers of the world. Intel’s dominance of the data center gives it a substantial leg up on the competition when it comes to widespread deployment of a disruptive technology like reconfigurable computing. But as long as the programming problem remains unsolved, that dominance is vulnerable.
Intel inched closer to that goal recently, with the release of its first SDK that merges Intel’s existing software development frameworks and compiler technology with the OpenCL capabilities in Altera’s Quartus Prime FPGA development tools, smoothing the path for OpenCL developers who want to take advantage of FPGA acceleration. The ultimate objective here is to abstract away the details of the FPGA implementation so that software developers can write something closer to conventional code and still benefit from FPGA acceleration, without needing FPGA/RTL hardware experts on the team.
The FPGA SDK adds FPGA support to both Microsoft Visual Studio and Eclipse-based Intel Code Builder for the OpenCL API. This gives OpenCL developers a familiar environment for their FPGA exploits. To address the problem of hours-long code-compile-test cycles on FPGAs, Intel is providing what it calls “Fast FPGA emulation,” which uses Intel’s compilers to emulate the functionality of the FPGA implementation in software, so that debug iterations run at a more conventional, software-like pace.
Because OpenCL itself is also a bit of an evangelical sale, Intel has included a smorgasbord of features designed to reduce fear, uncertainty, and doubt in the software development crowd. An OpenCL jump-start wizard helps programmers to overcome “blank page” syndrome, and features like syntax highlighting and code auto-completion make it seem like regular-old software development. To help with the FPGA-isms, there is “what-if” kernel performance analysis and quick static FPGA resource and performance analysis. And, when it comes time to push the design into actual hardware, there is support for fast and incremental FPGA compile to reduce the pain of those final (hopefully) few design iterations.
It will be interesting to watch the adoption and evolution of the FPGA SDK for OpenCL. At first, it is likely to be a “power tool” for teams who are already sold on the benefits and aware of the costs and risks of FPGA acceleration. While this is not the magic bullet that will break the dam and allow mainstream software to flow into the long-waiting arms of reconfigurable computing envisioned by Gerald Estrin 58 years ago, it does represent a significant step forward in allowing software engineers with no hardware training to begin to take advantage of some of that promise. For now, tapping into the real potential of FPGA-based acceleration will still probably require the expertise of FPGA designers and the development time of RTL-based implementation. But any progress toward bridging that development gap could be significant.