According to tech folklore, Carver Mead actually coined the term “Moore’s Law” – some ten years or so after the publication of Gordon Moore’s landmark 1965 Electronics Magazine article “Cramming More Components Onto Integrated Circuits.” For the next five and a half decades, the world was reshaped by the self-fulfilling prophecy outlined in that article. Namely, that every two years or so, semiconductor companies would be able to double the number of transistors that could be fabricated on a single semiconductor chip.
That biennial doubling of transistors led most noticeably to an even faster exponential increase in computation power. Besides getting more transistors from Moore’s Law, we got faster, cheaper, and more power-efficient ones. All of those factors together enabled us to build faster, more complex, higher-performance computing devices. In 1974, Robert Dennard observed that power efficiency in computing would increase even faster than transistor count, because of the triple-exponential improvement in density, speed, and energy efficiency as process geometry scaled downward. This trend, known as “Dennard Scaling,” stayed with us for around three decades, and compute performance (and, more importantly as it turns out, power efficiency) rode an unprecedented exponential improvement rocket.
All of this compute power improvement was built on top of the Von Neumann processor architecture developed in 1945 by John Von Neumann (and others), documented in an unfinished report called “First Draft of a Report on the EDVAC.” So, ironically, the most impressive technological revolution in history was built on a half-century-old design from an unfinished paper. With all the remarkable advances in digital computation during the Moore’s Law era, the now 75-year-old fundamental compute architecture has remained largely unchanged.
Is the Von Neumann architecture simply the best possible way to do computation? Of course not. To mis-paraphrase Winston Churchill, Von Neumann is “the worst possible compute architecture – except for all the others.” On a more serious note, the good thing about Von Neumann is its flexibility and area efficiency. It can handle just about any arbitrarily complex application without requiring the processor to scale in transistor count with the size of the problem.
In the old days, before we could cram so many components onto integrated circuits, that architectural efficiency of Von Neumann was a big deal. We could build a 4-, 8-, or 16-bit Von Neumann processor out of a very small number of transistors and run giant programs at acceptable speed. But now, in the wake of Moore’s Law, transistors are asymptotically approaching zero cost. So, with almost infinite numbers of free transistors available, the value of building processors with comparatively small numbers of transistors has dropped significantly.
At the same time, even with Moore’s Law going full steam, the value extracted from each subsequent process node has decreased. Dennard Scaling came to an end around 2005, forcing us to switch from building bigger/faster Von Neumann processors to building “more” Von Neumann processors. The race became cramming more cores onto integrated circuits, and the scalability of Von Neumann to multi-core brought its own limitations.
Unfortunately, Moore’s Law didn’t continue going full steam. Each of the last several process nodes has cost exponentially more to realize and has yielded proportionally less in tangible benefits. The result is that, even though we should technically be able to build more dense chips for several more generations, the cost/benefit ratio of doing so makes it a less and less attractive enterprise. We now need a driver other than Moore’s Law to maintain the pace of technological progress.
Clearly, we are also reaching the end of the useful life of Von Neumann as the single, do-all compute architecture. The recent AI revolution has accelerated the development of alternatives to Von Neumann. AI, particularly done with convolutional neural networks, is an unbelievably compute-intensive problem that is uniquely unsuited to Von Neumann. Already, an industry-wide trend is afoot, shifting us away from large arrays of homogeneous compute elements to complex configurations of heterogeneous elements including Von Neumann and non-Von Neumann approaches.
One of the more promising non-Von Neumann approaches to AI is the Neuromorphic architecture. In the late 1980s, Carver Mead (yup, the same guy who supposedly coined the term “Moore’s Law”) observed that, on the current trajectory, Von Neumann processors would use millions of times more energy than the human brain uses for the same computation. He theorized that more efficient computational circuits could be built by emulating the neuron structure of the human brain. Mead made an analogy of neuron ion-flow with transistor current, and proposed what came to be known as Neuromorphic computing based on that idea.
At the time, Neuromorphic computing was visualized as an analog affair, with neurons triggering one another with continuously varying voltages or currents. But the world was firm on the path of optimizing the binary universe of digital design. Analog circuits were not scaling at anything like the digital exponential, so the evolution of Neuromorphic computing was outside the mainstream trajectory of Moore’s Law.
Now, however, things have changed.
In the longer term, we have seen most analog functions subsumed by digital approximations, and neuromorphic processors have been implemented with what are called “spiking neural networks” (SNNs), which rely on single-bit spikes from each neuron to activate neurons down the chain. These networks are completely asynchronous, and, rather than sending values, neurons convey information through the timing of their spikes. Using this technique, neuromorphic processors have been implemented in current leading-edge bulk CMOS digital technology. This means neuromorphic architectures can finally reap the rewards of Moore’s Law. As a result, several practical neuromorphic processors have been built and tested, and the results are impressive and encouraging.
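To make the idea of timing-dependent activation concrete, here is a minimal sketch of a single "leaky integrate-and-fire" neuron, a common textbook model for SNNs. It is not the neuron model of any particular chip. The class name, threshold, leak factor, and input weight are all illustrative assumptions, and the sketch steps through discrete time slots for simplicity, whereas the hardware described above is asynchronous.

```python
from dataclasses import dataclass


@dataclass
class LIFNeuron:
    """Hypothetical leaky integrate-and-fire neuron (illustrative values only)."""
    threshold: float = 1.0  # potential at which the neuron fires
    leak: float = 0.9       # fraction of potential retained per time step
    potential: float = 0.0  # current membrane potential

    def step(self, weighted_input: float) -> int:
        """Advance one time step; return 1 if the neuron spikes, else 0."""
        self.potential = self.potential * self.leak + weighted_input
        if self.potential >= self.threshold:
            self.potential = 0.0  # reset after firing
            return 1
        return 0


def run(spike_train, weight=0.6):
    """Feed a train of single-bit input spikes into a fresh neuron."""
    neuron = LIFNeuron()
    return [neuron.step(weight * spike) for spike in spike_train]


# Both inputs contain exactly two spikes; only their timing differs.
dense = run([1, 1, 0, 0, 0, 0])   # spikes arrive back to back
sparse = run([1, 0, 0, 0, 1, 0])  # spikes arrive four steps apart

print("dense :", dense)
print("sparse:", sparse)
```

With these assumed parameters, the two closely spaced input spikes push the neuron over its threshold, while the same two spikes spread further apart leak away before they can accumulate. The output depends on when the spikes arrive, not on any multi-bit value being sent.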
One example we wrote about two years ago is Brainchip’s Akida neuromorphic processor, for which development boards became available in December 2020. Brainchip claims their devices use 90 to 99 percent less power than conventional CNN-based solutions. As far as we know, this is one of the first neuromorphic technologies to enter the broad commercial market, and the potential applications are vast. Brainchip provides both IP versions of their technology and SoCs with full implementations in silicon. Just about any system that can take advantage of “edge” AI could benefit from those kinds of power savings, which will often make the difference between doing edge AI and not.
Also in December 2020, Intel gave an update on their neuromorphic research test chip, called Loihi, as well as their “Intel Neuromorphic Research Community (INRC),” both of which were also announced two years ago. Across a wide range of applications including voice-command recognition, gesture recognition, image retrieval, optimization and search, and robotics, Loihi has benchmarked 30 to 1,000 times more energy-efficient than CPUs and GPUs, and 100 times faster. Just as importantly, the architecture lends itself to rapid and ongoing learning, in sharp contrast to CNN-based systems, which tend to have an intense training phase that creates a static model for inference. Intel says they are seeking 1,000 times improvement in energy efficiency, 100 times improvement in performance, and “orders of magnitude” reductions in the amount of data needed for training.
Not all problems lend themselves to neuromorphic processing. Algorithms that are well-suited to today’s deep-learning technology are obvious wins. Intel is also evaluating algorithms “inspired by neuroscience” that emulate processes found in the brain. And, finally, they are looking at “mathematically formulated” problems.
In the first category, today’s deep neural networks (DNNs) can be converted to a form usable by a neuromorphic chip. Additionally, “directly-trained” networks can be created with the neuromorphic processor itself. Finally, “back propagation,” common in CNNs, can be emulated in neuromorphic processors, despite the fact that this requires global communication not inherent to the neuromorphic architecture.
Loihi is a research chip, not designed for production. It is a 2-billion-transistor chip, fabricated on Intel’s 14nm CMOS process. Loihi contains a fully asynchronous “neuromorphic many-core mesh that supports a wide range of sparse, hierarchical and recurrent neural network topologies with each neuron capable of communicating with thousands of other neurons.” The chip contains 130,000 neurons and 130 million synapses, divided into 128 neuromorphic cores. Each of these cores includes a microcode learning engine that adapts parameters during operation, allowing on-chip training of the SNN. Loihi chips have been integrated into boards and boxes containing as many as 100M total neurons in 768 chips.
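As a quick sanity check on those figures, the short sketch below works out the per-core and per-system numbers they imply. The inputs are the rounded values quoted above, so the results are approximations, and the variable names are purely illustrative.

```python
# Rounded figures as quoted in the text above (not exact hardware specs).
NEURONS_PER_CHIP = 130_000
SYNAPSES_PER_CHIP = 130_000_000
CORES_PER_CHIP = 128
CHIPS_IN_LARGEST_SYSTEM = 768

# Derived quantities
neurons_per_core = NEURONS_PER_CHIP / CORES_PER_CHIP
synapses_per_neuron = SYNAPSES_PER_CHIP / NEURONS_PER_CHIP
total_neurons = NEURONS_PER_CHIP * CHIPS_IN_LARGEST_SYSTEM

print(f"neurons per core:    {neurons_per_core:,.0f}")
print(f"synapses per neuron: {synapses_per_neuron:,.0f}")
print(f"neurons in 768-chip system: {total_neurons:,}")
```

On these rounded inputs, each core hosts roughly a thousand neurons, each neuron averages about a thousand synapses (consistent with the "thousands of other neurons" in the quote), and 768 chips come to just under 100 million neurons.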
We are now at the confluence of a number of trends that could form the perfect storm of a revolution in processor architecture. First, neuromorphic processors are at the knee of commercial viability, and they bring something like the equivalent of 10 Moore’s Law nodes (20 years) of forward progress on certain classes of problems. Second, conventional DNNs are rapidly progressing and generating architectural innovations similar to those found in neuromorphic processors, suggesting a possible convergence on a future “best of both worlds” architecture that combines traits from both architectural domains. Third, Moore’s Law is coming to a close, and that puts more emphasis, talent, and money into the development of architectural approaches to driving future technological progress. And fourth, power consumption has emerged as probably the dominant driving factor in computation – which is the single metric where these new architectural approaches excel most.
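One way to arrive at a figure like "10 nodes (20 years)" is to ask how many doublings a given efficiency gain represents. The sketch below assumes, purely for illustration, that each process node delivers one doubling and that nodes arrive every two years; the function name and both constants are assumptions, not anything stated by the vendors.

```python
import math

YEARS_PER_NODE = 2  # assumed cadence: one node every two years


def node_equivalents(gain: float) -> float:
    """Number of doublings (assumed one per node) needed to reach `gain`."""
    return math.log2(gain)


# The 30x and 1,000x endpoints are the energy-efficiency range quoted earlier.
for gain in (30, 1000):
    nodes = node_equivalents(gain)
    years = nodes * YEARS_PER_NODE
    print(f"{gain:>5}x gain ~ {nodes:.1f} nodes ~ {years:.0f} years")
```

Under those assumptions, a 1,000x gain is very close to ten doublings, which is where a "10 nodes, 20 years" estimate would land. The 30x end of the range corresponds to roughly five nodes.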
It will be interesting to watch as the first of these neuromorphic processors gains commercial traction and creates the virtuous cycle of investment, development, refinement, and deployment. It is likely that, within a few years, neuromorphic architectures (or similar derivative technologies) will have taken on a substantial role in our computing infrastructure and catapulted to the forefront new applications that can only be imagined today.