Like the proverbial carrot-on-a-stick, FPGA-based acceleration has been right in front of our noses, just out of reach, for the better part of three decades. We move closer, and the prize moves farther away. Every few years, we feel some tangible progress, and perhaps cut the distance in half, but asymptotes can be unfriendly bedfellows. The old “reconfigurable computing” vision of FPGAs as replacements for CPUs has teased, taunted, and ultimately disappointed us.
Because FPGAs are basically vast arrays of unconnected logic, they invite the notion that we can have optimal hardware custom-designed for our algorithm, with just the right hardware resources applied to extract maximum parallelism and efficiency, free of all those pesky program counters and instructions. They conjure visions of data flowing smoothly into one side of our machine and results flowing out the other with absolutely minimal friction.
All we have to do is get our software into one.
Ah, and there’s the rub. The overwhelming challenge of FPGA acceleration has always been the programming model. And, try as we might, every approach we have tried (and there have been many) has failed to come anywhere near the ease of programming a conventional von Neumann CPU.
At first, we simply thought we could retrain the world’s software engineers to use “different languages,” adopting hardware description languages like VHDL or Verilog to develop their code. After writing thousands of lines of almost-incomprehensible nonsense in VHDL or Verilog to do what would have required a few dozen lines of simple C, software engineers told us that plan was, uh, sub-optimal, and to please never call them again.
With some success, we created high-level synthesis (HLS) tools that can convert sequential, software-style descriptions written in languages such as C and C++ into parallel datapath machines that can then be synthesized, placed, and routed. But that is not a software development flow. It is a hardware design flow, and the time and effort required for each iteration of that process still put it far behind modern software development and debug environments. While HLS can give us FPGA-accelerated designs with a fraction of the effort required by conventional HDL design, it still falls well short of the ease with which we can put the same algorithm on a regular processor.
But we are at the dawn of a new era. Who needs a programming model if the thing we are accelerating is not even software? (Well, not exactly, anyway.) For a range of applications, the AI revolution has done away with the task of programming. Instead, we just feed our machines massive amounts of data, and they program themselves. And the value proposition for AI is so incredibly strong that we are determined to do whatever is required, even if it is somewhat less convenient than writing conventional software. We don’t need 5-second iterations from code change to results, or fancy debuggers to see all the inner workings of our CNN… Heck, we don’t even KNOW the inner workings of our CNN.
Mipsology, a French startup, recognized this opportunity and came up with an elegant strategy to capitalize on it. The Mipsology team has mad skills and an enormous amount of experience doing computation with FPGAs in the emulation space, where several of them worked on ZeBu, an innovative FPGA prototyping/emulation platform developed by EVE and later acquired by Synopsys. Drawing on that FPGA mapping experience, the Mipsology team developed Zebra, a software stack that takes any trained neural network from the popular frameworks (Caffe, Caffe2, MXNet, TensorFlow, and others) and creates an efficient FPGA accelerator model that can be deployed on FPGA-based platforms ranging from Amazon AWS F1 FPGA instances to a huge variety of FPGA accelerator boards.
Mipsology partners with companies like Xilinx, Avnet, Mellanox, Advantech, Western Digital, and Tul to bring Zebra to those platforms, and the AI engineer needs to make zero changes and needs no hardware expertise. Mipsology says Zebra is essentially pushbutton: network model in, FPGA implementation out. This allows AI engineers to target inference deployment in anything from data centers to edge devices to desktop applications. It’s like having a big “ACCELERATE” button. Mipsology claims “Zebra users don’t have to learn new languages, new frameworks, or new tools. Not a single line of code must be changed in the application.”
Beyond not having to change code, though, Mipsology has solved another key problem. The FPGA does not have to be reconfigured to load an updated model, so no synthesis, place-and-route, or timing closure is required. Performance-wise, Mipsology claims more than 5,000 images per second on ResNet-50, more than 2,500 images per second on Inception-v3, and 250 frames per second on YOLOv3, all on Xilinx’s Alveo U250. That’s some serious throughput, and well beyond what the leading GPUs can accomplish. There is also the option to dial in more performance by scaling back the resolution at which the hardware processes your model.
Recently, Mipsology and Xilinx announced that Zebra software/IP has been integrated into the latest build of Xilinx’s Alveo U50 data center accelerator card, making the jump to acceleration that much easier for Alveo users. Mipsology bills this as “Zero Effort IP,” and who doesn’t need more zero-effort solutions in their life?
The AI inference world is certainly exciting right now, crowded with new chip architectures, accelerator cards, development tools… the list goes on and on. It will be interesting to see how Mipsology’s “zero effort,” hardware-agnostic, framework-independent software stack fares as the market begins to shake out. The team certainly seems to be checking many of the key boxes: ease of use, platform portability, performance, and power efficiency.