feature article

Mipsology Brings “Zero Effort” Inference

FPGA-based AI Acceleration Made Easy

Like the proverbial carrot-on-a-stick, FPGA-based acceleration has been right in front of our noses, just out of reach, for the better part of three decades. We move closer, and the prize moves farther away. Every few years, we feel some tangible progress, and perhaps cut the distance in half, but asymptotes can be unfriendly bedfellows. The old “reconfigurable computing” vision of FPGAs as replacements for CPUs has teased, taunted, and ultimately disappointed us. 

Because FPGAs are basically vast arrays of unconnected logic, they invite the notion that we can have optimal hardware custom-designed for our algorithm, with the appropriate hardware resources applied to deliver maximum parallelism and efficiency without all those pesky program counters and instructions. They conjure visions of data flowing smoothly into one side of our machine and results flowing out the other with minimal friction.

All we have to do is get our software into one.

Ah, and there’s the rub. The overwhelming challenge of FPGA acceleration has always been the programming model. And, try as we may, every approach that has been tried (and there have been many) has failed to come anywhere near what can be achieved with a conventional von Neumann CPU. 

At first, we simply thought we should re-train the world’s software engineers to use “different languages” and adopt hardware-description languages like Verilog to develop their code. After writing thousands of lines of almost-incomprehensible nonsense in VHDL or Verilog to do what would have required a few dozen lines of simple C, software engineers told us that plan was, uh, sub-optimal, and to please not ever call them back again.

With some success, we created high-level synthesis (HLS) tools that convert sequential, software-like descriptions written in languages such as C and C++ into parallel datapath machines that can then be synthesized and placed-and-routed. But that is not a software development flow. That is a hardware design flow, and the time and effort required for each iteration of that process still put it far behind modern software development and debug environments. While HLS can deliver FPGA-accelerated designs with a fraction of the effort of conventional HDL design, it still falls well short of the ease of running the same algorithm on a regular processor.

But we are at the dawn of a new era. Who needs a programming model if the thing we are accelerating is not even software (well, not exactly, anyway)? For a range of applications, the AI revolution has done away with the task of programming. Instead, we just feed our machines massive amounts of data, and they program themselves. And the value proposition for AI is so incredibly strong that we are determined to do whatever is required, even if it is somewhat less convenient than writing conventional software. We don’t need five-second iterations from code change to result, or fancy debuggers to see all the inner workings of our CNN… Heck, we don’t even KNOW the inner workings of our CNN.

Mipsology, a French startup, recognized this opportunity and came up with an elegant strategy to capitalize on it. The Mipsology team has mad skills and an enormous amount of experience doing computation with FPGAs in the emulation space, where several of them worked on ZeBu, an innovative FPGA prototyping/emulation platform developed by EVE and later acquired by Synopsys. Drawing on that FPGA mapping experience, the team developed Zebra, a software stack that takes any neural network from any of the popular frameworks (Caffe, Caffe2, MXNet, TensorFlow, etc.) and creates an efficient FPGA accelerator model that can be deployed on a wide range of FPGA-based platforms, from Amazon AWS F1 FPGA instances to a huge range of FPGA accelerator boards.

Partnering with companies like Xilinx, Avnet, Mellanox, Advantech, Western Digital, and Tul, Mipsology delivers Zebra across that range of FPGA platforms with zero changes or hardware expertise required from the AI engineer. Mipsology says Zebra is essentially pushbutton: network model in, FPGA implementation out. This lets AI engineers target inference deployment anywhere from data centers to edge devices to desktop applications. It’s like having a big “ACCELERATE” button. Mipsology claims, “Zebra users don’t have to learn new languages, new frameworks, or new tools. Not a single line of code must be changed in the application.”

Beyond not requiring code changes, Mipsology has solved another key problem: the FPGA does not have to be reconfigured (so no synthesis, place-and-route, or timing closure is required) in order to load an updated model. Performance-wise, Mipsology claims more than 5,000 images per second on ResNet-50, more than 2,500 images per second on Inception-v3, and 250 images per second on YOLOv3, all on Xilinx’s Alveo U250. That’s some serious throughput, and well beyond what the leading GPUs can accomplish. There is also the option to dial in more performance by scaling back the resolution at which the hardware processes your model.

Recently, Mipsology and Xilinx announced that Zebra software/IP has been integrated into the latest build of Xilinx’s Alveo U50 data center accelerator card, making the jump to acceleration for Alveo users that much easier. The company bills this as “Zero Effort IP” – and who doesn’t need more zero effort solutions in their life?

The AI inference world is certainly exciting right now, crowded with a plethora of solutions including new chip architectures, accelerator cards, development tools… the list goes on and on. It will be interesting to see how Mipsology’s “zero effort,” hardware-agnostic, framework-independent software stack approach fares as the market begins to shake out. The team certainly seems to be checking a lot of the key boxes: ease of use, platform portability, performance, and power efficiency.

4 thoughts on “Mipsology Brings “Zero Effort” Inference”

  1. Ah, and there’s the rub. The overwhelming challenge of FPGA acceleration has always been the programming model. And, try as we may, every approach that has been tried (and there have been many) has failed to come anywhere near what can be achieved with a conventional von Neumann CPU.

    Do you mean that FPGA accelerators do not exist (or that they do not accelerate algorithms)? Do you mean that the von Neumann CPU is the fastest thing known to man?

    Then came the GPU for graphic algorithms. Of course it was immediately obvious that was not enough and there was an immediate effort to extend to general purpose computing algorithms … Sometimes just let sleeping dogs lie.

    AI inferencing algorithms are not for general purpose computing and let us hope that AI is big enough to overcome the notion that the von Neumann CPU must be used for AI.

  2. @Karl – FPGA accelerators most certainly do exist, and deliver performance and energy efficiency orders of magnitude better than von Neumann CPUs. But, their adoption has always been severely limited by how difficult they are to program.

    1. @Kevin, thanks for your reply. One of my frustrations is the so-called “Tool Chain” which is more like a ball and chain. It is absurd to start with a Hardware Description Language for design entry. First there must be logic design, and that means the logical combinations of inputs and storage/state elements. Each must have a name that is used in and/or/not expressions. It is obvious that things that appear in expressions are inputs used to determine the true/false value of the output. Sure outputs must have names because they are inputs to other expressions. Lists of inputs and outputs are not necessary.
      Sensitivity lists, Always blocks, processes, blocking/non-blocking assignments only have meaning to synthesis, not the logic. Sorry, I needed to let off a little steam.
      HLS/SystemC or whatever you want to call it, boils down to evaluating numeric expressions but after many years it still has not fully “matured” and is not universally accepted. (Let’s skip the fact that there are 2 HDLs and that new C++ Classes may have to be designed)
      By now you have had enough of this … but there is hope. There is a compiler that will take an expression and identify the sequence of operators and operands to evaluate expressions/algorithms. This can be implemented using a few hundred LUTs and 3 embedded memory blocks on an FPGA. Here is the sweet part — it is programmable by simply loading the memories, not by re-designing the FPGA.

    2. I just saw an article about Altera OpenCL that was going to take care of all this.
      Have you checked lately? At the time it seemed that the GPU was handling the graphics fine, but a group of know-it-alls decided they could do better…

