feature article
Subscribe Now

Mipsology Brings “Zero Effort” Inference

FPGA-based AI Acceleration Made Easy

Like the proverbial carrot-on-a-stick, FPGA-based acceleration has been right in front of our noses, just out of reach, for the better part of three decades. We move closer, and the prize moves farther away. Every few years, we feel some tangible progress, and perhaps cut the distance in half, but asymptotes can be unfriendly bedfellows. The old “reconfigurable computing” vision of FPGAs as replacements for CPUs has teased, taunted, and ultimately disappointed us. 

The fact that FPGAs are basically vast arrays of unconnected logic gives the notion that we can have optimal hardware custom designed for our algorithm, where the appropriate hardware resources can be applied to generate maximum parallelism and efficiency without all those pesky program counters and instructions. They conjure visions of data flowing smoothly into one side of our machine and results flowing out the other with absolute minimal friction.

All we have to do is get our software into one.

Ah, and there’s the rub. The overwhelming challenge of FPGA acceleration has always been the programming model. And, try as we may, every approach that has been tried (and there have been many) has failed to come anywhere near what can be achieved with a conventional von Neumann CPU. 

At first, we simply thought we should re-train the world’s software engineers to use “different languages” and adopt hardware-description languages like Verilog to develop their code. After writing thousands of lines of almost-incomprehensible nonsense in VHDL or Verilog to do what would have required a few dozen lines of simple C, software engineers told us that plan was, uh, sub-optimal, and to please not ever call them back again.

With some success, we created high-level synthesis tools that can convert sequential software-like descriptions written in software-like languages such as C and C++ into parallel datapath machines that can then be synthesized and placed-and-routed, but that is not a software development flow. That is a “hardware design flow” and the amount of time and effort required for each iteration of that process still puts it far behind modern software development and debug environments. While HLS can give us FPGA-accelerated designs with a fraction of the effort required by conventional HDL design, it still falls well short of what we can do to put the same algorithm on a regular processor.

But we are at the dawn of a new era. Who needs a programming model if the thing we are accelerating is not even software? (Well, not exactly, anyway). For a range of applications, the AI revolution has done away with the task of programming. Instead, we just feed our machines massive amounts of data, and they program themselves. And, the value proposition for AI is so incredibly strong that we are determined to do whatever is required – even if it is somewhat less convenient than writing conventional software. We don’t need 5-second iterations from code changes to results, no fancy debuggers to see all the inner workings of our CNN… Heck, we don’t even KNOW the inner workings of our CNN.

Mipsology, a French startup, recognized this opportunity and came up with an elegant strategy to capitalize on it. The Mipsology team has mad skills and an enormous amount of experience doing computation with FPGAs in the emulation space, where several of them worked on ZeBu, an innovative FPGA prototyping/emulation platform developed by EVE and later acquired by Synopsys. Calling on their FPGA mapping experience from emulation, the Mipsology team developed Zebra, a software stack that takes any neural network from any of the popular frameworks – Caffe, Caffe2, MXNET, TensorFlow, etc. – and creates an efficient FPGA accelerator model that can be deployed on a wide range of FPGA-based platforms from Amazon AWS F1 FPGA instances to a huge range of FPGA accelerator boards. 

Partnering with companies like Xilinx, Avnet, Mellanox, Advantech, Western Digital, and Tul, Zebra can take any trained neural network from the popular frameworks and create an FPGA-based accelerator on a wide range of FPGA platforms, with zero changes or hardware expertise required from the AI engineer. Mipsology says Zebra is essentially pushbutton – network model in, FPGA implementation out. This allows AI engineers to target inference deployment in anything from data centers to edge devices to desktop applications. It’s like just having a big “ACCELERATE” button. Mipsology claims “Zebra users don’t have to learn new languages, new frameworks, or new tools. Not a single line of code must be changed in the application.”

Beyond not having to change code, though, Mipsology has solved another key problem. The FPGA does not have to be reconfigured (and therefore no synthesis, place-and-route, or timing closure required) in order to load an updated model. Performance wise, Mipsology claims to be able to achieve more than 5,000 images per second on ResNet 50, and more than 2,500 images per second on Inception-V3, and 250 fps on YoloV3 – on Xilinx’s Alveo U250. That’s some serious throughput, and well beyond what the leading GPUs can accomplish. There is also the option to dial in more performance by scaling back the resolution at which the hardware processes your model.

Recently, Mipsology and Xilinx announced that Zebra software/IP has been integrated into the latest build of Xilinx’s Alveo U50 data center accelerator card, making the jump to acceleration for Alveo users that much easier. The company bills this as “Zero Effort IP” – and who doesn’t need more zero effort solutions in their life?

The AI inference world is certainly exciting right now, crowded with a plethora of solutions including new chip architectures, accelerator cards, development tools… the list goes on and on. It will be interesting to see how Mipsology’s “zero effort,” hardware-agnostic, framework-independent, software stack approach fares as the market begins to shake out. The team certainly seems to be checking a lot of the key boxes – with ease-of-use, platform portability, performance, and power efficiency.

4 thoughts on “Mipsology Brings “Zero Effort” Inference”

  1. Ah, and there’s the rub. The overwhelming challenge of FPGA acceleration has always been the programming model. And, try as we may, every approach that has been tried (and there have been many) has failed to come anywhere near what can be achieved with a conventional von Neumann CPU.

    Do you mean that FPGA accelerators do not exist(or that they do not accelerate algorithms)? Do you mean that von Neumann CPU is the fastest thing known to man?

    Then came the GPU for graphic algorithms. Of course it was immediately obvious that was not enough and there was an immediate effort to extend to general purpose computing algorithms … Sometimes just let sleeping dogs lie.

    AI inferencing algorithms are not for general purpose computing and let us hope that AI is big enough to overcome the notion that the von Neumann CPU must be used for AI.

  2. @Karl – FPGA accelerators most certainly do exist, and deliver performance and energy efficiency orders of magnitude better than von Neumann CPUs. But, their adoption has always been severely limited by how difficult they are to program.

    1. @Kevin, thanks for your reply. One of my frustrations is the so-called “Tool Chain” which is more like a ball and chain. It is absurd to start with a Hardware Description Language for design entry. First there must be logic design, and that means the logical combinations of inputs and storage/state elements. Each must have a name that is used in and/or/not expressions. It is obvious that things that appear in expressions are inputs used to determine the true/false value of the output. Sure outputs must have names because they are inputs to other expressions. Lists of inputs and outputs are not necessary.
      Sensitivity lists, Always blocks, processes, blocking/non-blocking assignments only have meaning to synthesis, not the logic. Sorry, I needed to let off a little steam.
      HLS/SystemC or whatever you want to call it, boils down to evaluating numeric expressions but after many years it still has not fully “matured” and is not universally accepted. (Let’s skip the fact that there are 2 HDLs and that new C++ Classes may have to be designed)
      By now you have had enough of this … but there is hope. There is a compiler that will take an expression and identify the sequence of operators and operands to evaluate expressions/algorithms. This can be implemented using a few hundred LUTs and 3 embedded memory blocks on an FPGA. Here is the sweet part — it is programmable by simply loading the memories, not by re-designing the FPGA.

    2. I just saw an article about Altera OpenCL that was going to take care of all this.
      Have you checked lately? At the time it seemed that the GPU was handling the graphics fine, but a group of know it alls decided they could do better…

Leave a Reply

featured blogs
Apr 22, 2021
President George H. W. Bush famously said that he didn't do "the vision thing". Well, here at Cadence we definitely do the vision thing. In fact, the Tensilica Vision DSP product line... [[ Click on the title to access the full blog on the Cadence Community si...
Apr 21, 2021
Robotics are everywhere, or so it seems. Robot combat, STEM outreach, robotic arms and advanced surgical devices are but a few applications that touch our everyday life. As a student, I was the President of the Santa Barbara City College Robotics Club and a FIRST Robotics com...
Apr 21, 2021
Introducing PrimeLib, an SoC design tool that maps the latest chip technologies & enables correct-by-construction design for SoCs at advanced process nodes. The post Advanced Nodes Drive Demand for Advanced Library Characterization and Validation Solutions appeared firs...
Apr 20, 2021
By Sherif Hany and Abdellah Bakhali Regardless of which technology node they're using, design houses… The post Give me my space! Why high voltage and multiple power domain designs need automated context-aware spacing checks appeared first on Design with Calibre....

featured video

The Verification World We Know is About to be Revolutionized

Sponsored by Cadence Design Systems

Designs and software are growing in complexity. With verification, you need the right tool at the right time. Cadence® Palladium® Z2 emulation and Protium™ X2 prototyping dynamic duo address challenges of advanced applications from mobile to consumer and hyperscale computing. With a seamlessly integrated flow, unified debug, common interfaces, and testbench content across the systems, the dynamic duo offers rapid design migration and testing from emulation to prototyping. See them in action.

Click here for more information

featured paper

Understanding Functional Safety FIT Base Failure Rate Estimates per IEC 62380 and SN 29500

Sponsored by Texas Instruments

Functional safety standards such as IEC 61508 and ISO 26262 require semiconductor device manufacturers to address both systematic and random hardware failures. Base failure rates (BFR) quantify the intrinsic reliability of the semiconductor component while operating under normal environmental conditions. Download our white paper which focuses on two widely accepted techniques to estimate the BFR for semiconductor components; estimates per IEC Technical Report 62380 and SN 29500 respectively.

Click here to download the whitepaper

Featured Chalk Talk

Accelerate the Integration of Power Conversion with microBUCK® and microBRICK™

Sponsored by Mouser Electronics and Vishay

In the world of power conversion, multi-chip packaging, thermal performance, and power density can make all of the difference in the success of your next design. In this episode of Chalk Talk, Amelia Dalton chats with Raymond Jiang about the trends and challenges in power delivery and how you can leverage the unique combination of discrete MOSFET design, IC expertise, and packaging capability of Vishay’s microBRICK™and microBUCK® integrated voltage regulators.

Click here for more information about Vishay microBUCK® and microBRICK™ DC/DC Regulators