The mission of electronic design automation (EDA) has always been, primarily, to facilitate the design and verification of electronic circuits. EDA began, of course, with companies like Mentor, Daisy, and Valid providing specialized software for capturing and editing schematic drawings. These tools took the designer’s native, human-readable language, the schematic, and turned it into the fundamental machine-readable structure of EDA: the netlist.
In the four decades since, EDA has not strayed far from that path, conceptually. The job just got tougher. Moore’s Law sent complexity through the roof, particularly in logic design. As designs grew from handfuls of gates to billions, the human-readable side of that equation evolved. Schematics gave way to gate-level and then to register-transfer-level (RTL) hardware description languages (HDLs), and EDA responded with an entirely new class of tools: logic synthesis.
A generation of digital designers became experts in HDLs and synthesis. At the same time, the explosion of the synthesis market rocketed Synopsys to the number-one position in the EDA industry on the coattails of Design Compiler, a logic synthesis tool that dominated the industry for decades. The foresight (or luck, depending on your view) that Synopsys showed in grabbing onto a major methodology shift won the company a seat at the head of the table that it has enjoyed for more than twenty years.
Now, another disruptive change is hitting the industry. Artificial intelligence has progressed more in the past four years than in all of its long previous history, fueled primarily by breakthroughs in convolutional neural networks (CNNs). CNNs present a unique challenge to hardware design, because the dominant architecture of the past twenty years, the system on chip (SoC), is woefully inadequate for meeting their computational demands. That means designers need to come up with novel digital architectures to implement CNNs efficiently in hardware.
More specifically, executing a CNN model in software on a von Neumann machine is grossly inefficient. To meet the demands of applications such as machine vision, we need orders-of-magnitude improvements in latency, throughput, power consumption, and cost. Or, to put it another way, we need custom hardware designed specifically for the task.
Unfortunately, every CNN model is unique in its topology. So far, at least, there is no one-graph-fits-all approach to CNN design. That means that every model or algorithm requires a unique structure of logic, memory, and data flow to optimize it across the key metrics. So we have software-only implementations running on multi-core CPUs, GPUs, and some specialized processors, which are inadequate, and we have hardware implementations that require unique, specialized logic design for each application.
It’s time for a new tool flow.
The demands of CNN implementation are a significant departure from the direction that logic design flows have taken until now. Our current tool flow has moved to ever-higher levels of abstraction. We began with simple logic gates and progressed to ever-larger structures stitched together to create the desired function. Today, most systems are created by combining processors, peripherals, and specialized blocks or accelerators to meet the requirements of the system or application.
In that context, we could view CNNs as just another of those “specialized blocks/accelerators.” We already have design tools and flows for those. High-level synthesis (HLS), for example, is adept at taking a software-like description of an algorithm in C or C++ and synthesizing it into a highly optimized logic structure (usually a datapath with control, memory, and interfaces). And some approaches to CNN design are taking advantage of HLS today.
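The sketch below gives a rough idea of the software-like input an HLS flow consumes. It writes one convolutional layer as a plain C++ loop nest. The layer dimensions, the function name `conv2d`, and the fused ReLU are illustrative assumptions and do not come from any particular tool’s template. A real HLS flow would also add vendor-specific directives for pipelining, unrolling, and memory partitioning, which are deliberately left out here.

```cpp
#include <cstddef>

// Hypothetical layer dimensions, fixed at compile time because HLS tools
// generally need static loop bounds and array sizes to build a datapath.
constexpr std::size_t C_IN = 2;           // input channels
constexpr std::size_t C_OUT = 2;          // output channels (filters)
constexpr std::size_t H = 4;              // input height
constexpr std::size_t W = 4;              // input width
constexpr std::size_t K = 3;              // square kernel size
constexpr std::size_t H_OUT = H - K + 1;  // "valid" convolution, stride 1
constexpr std::size_t W_OUT = W - K + 1;

// Direct 2-D convolution (cross-correlation, as CNN frameworks compute it)
// with a fused ReLU, written as a plain loop nest over static arrays.
// Vendor pragmas (pipeline/unroll/partition) are intentionally omitted.
void conv2d(const float in[C_IN][H][W],
            const float weights[C_OUT][C_IN][K][K],
            const float bias[C_OUT],
            float out[C_OUT][H_OUT][W_OUT]) {
    for (std::size_t co = 0; co < C_OUT; ++co) {
        for (std::size_t y = 0; y < H_OUT; ++y) {
            for (std::size_t x = 0; x < W_OUT; ++x) {
                float acc = bias[co];
                // Multiply-accumulate over every input channel and kernel tap.
                for (std::size_t ci = 0; ci < C_IN; ++ci) {
                    for (std::size_t ky = 0; ky < K; ++ky) {
                        for (std::size_t kx = 0; kx < K; ++kx) {
                            acc += in[ci][y + ky][x + kx] * weights[co][ci][ky][kx];
                        }
                    }
                }
                out[co][y][x] = acc > 0.0f ? acc : 0.0f;  // ReLU activation
            }
        }
    }
}
```

Fixed bounds and statically sized arrays make the code synthesizable in principle. Whether the resulting hardware is efficient still depends heavily on how the tool schedules and parallelizes those six nested loops.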
There are three major problems with this approach, however. First, it appears that the optimal implementation of CNNs will often be a heterogeneous combination of conventional processors and custom hardware. That requires partitioning functionality between software and hardware, which is well beyond the scope of current HLS technology. Second, HLS is a very general tool for generating hardware architectures from software-like sequential algorithms, but CNN architectures tend to be much more structured and predictable. Throwing some code at HLS and hoping that it magically creates an optimized CNN is quite a roll of the dice. Finally, the very small number of folks who currently know how to develop CNNs tend not to have the hardware-design expertise required to use the latest HLS flows.
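Here is one concrete sense of “structured and predictable.” For an ordinary convolutional layer, the arithmetic workload and the weight storage follow directly from a few shape parameters. A CNN-specific flow could therefore size its hardware before generating any logic. The sketch assumes standard (non-grouped) convolution and counts work from the output feature-map size, so it holds for any stride or padding. The `ConvLayer` struct and the function names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical shape description of one standard (non-grouped) conv layer.
struct ConvLayer {
    std::uint64_t c_in, c_out;   // input / output channels
    std::uint64_t h_out, w_out;  // output feature-map size
    std::uint64_t k;             // square kernel size
};

// Multiply-accumulates: each of the c_out * h_out * w_out output elements
// needs c_in * k * k MACs.
constexpr std::uint64_t conv_macs(const ConvLayer& l) {
    return l.c_out * l.h_out * l.w_out * l.c_in * l.k * l.k;
}

// Number of weights (biases excluded) the layer must store.
constexpr std::uint64_t conv_weights(const ConvLayer& l) {
    return l.c_out * l.c_in * l.k * l.k;
}

// Total MACs for a model expressed as a sequence of conv layers.
std::uint64_t model_macs(const std::vector<ConvLayer>& layers) {
    std::uint64_t total = 0;
    for (const auto& l : layers) total += conv_macs(l);
    return total;
}
```

Take a layer with 3 input channels, 64 filters, 3×3 kernels, and a 224×224 output. It works out to 64 × 224 × 224 × 3 × 9 = 86,704,128 MACs and 1,728 weights, and both numbers are known before any hardware exists.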
So, there’s a disconnect between the state of the art in logic system design and the data science experts who design CNNs. This gap must be bridged with an automated tool flow that can understand the native language of CNN experts and can drive a process that results in customized CNN hardware. Sounds simple, right?
It’s not as if nobody has thought of this problem. A number of tool flows currently cobble together pieces of the puzzle in Rube Goldberg fashion, with varying degrees of success. In just the past couple of years, several (mostly academic) efforts have produced notable results. Most of these take a model from one of the current CNN modeling frameworks, such as Caffe or TensorFlow, as input and produce some kind of synthesizable RTL as output. These flows include fpgaConvNet, ALAMO, Angel-Eye, DeepBurning, Haddoc2, Caffeine, Finn, FP-DNN, Snowflake, FFTCodeGen, and perhaps others we’ve overlooked.
Most of these tool flows target FPGA hardware, although for some applications we might want ASIC implementations instead. Some are specific to Xilinx or Intel FPGA flows, while others attempt to produce portable results. In many situations, the optimal implementation might take advantage of new eFPGA IP blocks in an ASIC (such as those provided by Achronix, Flex Logix, and others), producing a custom chip whose CNN model can be reprogrammed or optimized in the field.
The EDA industry controls most of the underlying technology required to solve this problem, and it is a problem that will be front and center for at least the next couple of decades, as a large share of new system designs try to take advantage of the capabilities of CNNs. The infrastructure EDA already owns for implementing and verifying logic hardware, from high levels of abstraction down to optimized, verified, placed-and-routed gates, is essential to the solution, yet we see no evidence that EDA is tackling the top level of the problem. Instead, a plethora of competing academic efforts are trying to build clumsy structures around existing EDA (and FPGA vendor) tool flows.
Perhaps EDA is already working on this problem in secret. Or, perhaps there are startups quietly toiling away in hopes of becoming the “Synopsys” of the next era in electronic system design. Or, maybe we’re just too early in the evolution of this technology to start canonizing it with purpose-built tools. Our guess is that EDA has just overlooked the opportunity or doesn’t know where to start. One thing is certain, though. This problem is too important to be ignored for long. It will be interesting to watch.