
Neuromorphic Revolution

Will Neuromorphic Architectures Replace Moore’s Law?

According to tech folklore, Carver Mead actually coined the term “Moore’s Law,” some ten years after the publication of Gordon Moore’s landmark 1965 article in Electronics magazine, “Cramming More Components Onto Integrated Circuits.” For the next five and a half decades, the world was reshaped by the self-fulfilling prophecy outlined in that article: namely, that every two years or so, semiconductor companies would double the number of transistors that could be fabricated on a single chip.

That biennial doubling of transistors led, most noticeably, to an even faster exponential increase in computation power. Besides getting more transistors from Moore’s Law, we got faster, cheaper, and more power-efficient ones, and together those factors enabled us to build faster, more complex, higher-performance computing devices. In 1974, Robert Dennard observed that power efficiency in computing would improve even faster than transistor count, thanks to the combined exponential gains in density, speed, and energy efficiency as process geometry scaled downward. This trend, known as “Dennard Scaling,” stayed with us for around three decades, and compute performance (and, it turns out, power) rode an unprecedented exponential improvement rocket.
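Dennard’s observation can be sketched with back-of-the-envelope arithmetic. The numbers below are idealized textbook scaling factors, not measured data from any real process:

```python
# Back-of-the-envelope Dennard scaling per process node. The 0.7x linear
# shrink and ideal scaling factors are textbook idealizations.
k = 0.7  # linear dimensions shrink by ~0.7x per node

density_gain = 1 / k**2  # ~2x more transistors per unit area
speed_gain = 1 / k       # ~1.4x faster switching (frequency up)
cap_scale = k            # gate capacitance shrinks with geometry
volt_scale = k           # supply voltage scales down with geometry

# Dynamic switching power per transistor ~ C * V^2 * f
power_per_transistor = cap_scale * volt_scale**2 * speed_gain  # = k^2

# Twice the transistors, each at roughly half the power: power density
# stays flat even as performance climbs. That is Dennard Scaling.
power_per_area = power_per_transistor * density_gain
print(round(power_per_transistor, 2), round(power_per_area, 2))  # 0.49 1.0
```

Each node delivered roughly twice the transistors, 1.4 times the speed, and half the power per transistor, so chips got faster without getting hotter. When voltage scaling hit physical limits, that free lunch ended.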

All of this compute power improvement was built on top of the Von Neumann processor architecture developed in 1945 by John Von Neumann (and others), documented in an unfinished report called “First Draft of a Report on the EDVAC.” So, ironically, the most impressive technological revolution in history was built on a half-century-old design from an unfinished paper. Through all the remarkable advances in digital computation during the Moore’s Law era, the now 75-year-old fundamental compute architecture has remained largely unchanged.

Is the Von Neumann architecture simply the best possible way to do computation? Of course not. To mis-paraphrase Winston Churchill, Von Neumann is “the worst possible compute architecture – except for all the others.” On a more serious note, the good thing about Von Neumann is its flexibility and area efficiency. It can handle just about any arbitrarily complex application without requiring the processor to scale in transistor count with the size of the problem. 

In the old days, before we could cram so many components onto integrated circuits, that architectural efficiency of Von Neumann was a big deal. We could build a 4-, 8-, or 16-bit Von Neumann processor out of a very small number of transistors and run giant programs at acceptable speed. But now, in the wake of Moore’s Law, transistors are asymptotically approaching zero cost. So, with almost infinite numbers of free transistors available, the value of building processors with comparatively small numbers of transistors has dropped significantly.

At the same time, even with Moore’s Law going full steam, the value extracted from each subsequent process node has decreased. Dennard Scaling came to an end around 2005, forcing us to switch from building bigger/faster Von Neumann processors to building “more” Von Neumann processors. The race became cramming more cores onto integrated circuits, and the scalability of Von Neumann to multi-core brought its own limitations.

Unfortunately, Moore’s Law didn’t continue going full steam. Each of the last several process nodes has cost exponentially more to realize and has yielded proportionally less in tangible benefits. The result is that, even though we should technically be able to build more dense chips for several more generations, the cost/benefit ratio of doing so makes it a less and less attractive enterprise. We now need a driver other than Moore’s Law to maintain the pace of technological progress.

Clearly, we are also reaching the end of the useful life of Von Neumann as the single, do-all compute architecture. The recent AI revolution has accelerated the development of alternatives. AI, particularly when done with convolutional neural networks, is an extraordinarily compute-intensive problem that is uniquely unsuited to Von Neumann. Already, an industry-wide shift is afoot, away from large arrays of homogeneous compute elements and toward complex configurations of heterogeneous elements that mix Von Neumann and non-Von Neumann approaches.

One of the more promising non-Von Neumann approaches to AI is the Neuromorphic architecture. In the late 1980s, Carver Mead (yup, the same guy who supposedly coined the term “Moore’s Law”) observed that, on the current trajectory, Von Neumann processors would use millions of times more energy than the human brain uses for the same computation. He theorized that more efficient computational circuits could be built by emulating the neuron structure of the human brain. Mead made an analogy of neuron ion-flow with transistor current, and proposed what came to be known as Neuromorphic computing based on that idea.

At the time, Neuromorphic computing was visualized as an analog affair, with neurons triggering one another with continuously varying voltages or currents. But the world was firm on the path of optimizing the binary universe of digital design. Analog circuits were not scaling at anything like the digital exponential, so the evolution of Neuromorphic computing was outside the mainstream trajectory of Moore’s Law. 

Now, however, things have changed. 

Over time, we have seen most analog functions subsumed by digital approximations, and neuromorphic processors have been implemented as “spiking neural networks” (SNNs), which rely on single-bit spikes from each neuron to activate neurons down the chain. These networks are completely asynchronous and, rather than sending values, the activation depends on the timing of the spikes. This technique lets neuromorphic processors take advantage of current leading-edge bulk CMOS digital technology, which means neuromorphic architectures can finally reap the rewards of Moore’s Law. As a result, several practical neuromorphic processors have been built and tested, and the results are impressive and encouraging.
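To make the spiking idea concrete, here is a toy leaky integrate-and-fire neuron, the basic element of most SNNs. This is a generic textbook sketch, not any particular vendor’s implementation, and the weight, leak, and threshold values are arbitrary:

```python
# Minimal leaky integrate-and-fire (LIF) neuron. Parameter values are
# illustrative only, not taken from any real neuromorphic chip.

def lif_neuron(spike_times, weight=1.0, leak=0.9, threshold=2.5):
    """Integrate weighted single-bit input spikes over discrete time steps.

    spike_times: list of 0/1 values, one per time step.
    Returns the time steps at which this neuron fires its own spike.
    """
    potential = 0.0
    output_spikes = []
    for t, spike in enumerate(spike_times):
        potential = potential * leak + weight * spike  # leaky integration
        if potential >= threshold:                     # fire and reset
            output_spikes.append(t)
            potential = 0.0
    return output_spikes

# A tight burst of input spikes fires the neuron at step 2;
# the same number of spikes spread out in time never would.
print(lif_neuron([1, 1, 1, 0, 0, 0, 1, 0]))  # [2]
```

A burst of closely spaced spikes pushes the potential over threshold, while spikes spread out in time simply leak away. That dependence on spike timing, rather than on transmitted values, is what makes SNNs fundamentally different from conventional neural networks.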

One example we wrote about two years ago is Brainchip’s Akida neuromorphic processor, for which development boards became available in December 2020. Brainchip claims their devices use 90 to 99 percent less power than conventional CNN-based solutions. As far as we know, this is one of the first neuromorphic technologies to enter the broad commercial market, and the potential applications are vast. Brainchip provides both IP versions of their technology and SoCs with full implementations in silicon. Just about any system that can take advantage of “edge” AI could benefit from those kinds of power savings, which will often make the difference between doing edge AI and not.

Also in December 2020, Intel gave an update on their neuromorphic research test chip, called Loihi, as well as their “Intel Neuromorphic Research Community (INRC),” both of which were also announced two years ago. Across a wide range of applications including voice-command recognition, gesture recognition, image retrieval, optimization and search, and robotics, Loihi has benchmarked at 30 to 1,000 times the energy efficiency of CPUs and GPUs, and 100 times their speed. Just as importantly, the architecture lends itself to rapid and ongoing learning, in sharp contrast to CNN-based systems, which tend to have an intense training phase that creates a static model for inference. Intel says they are seeking a 1,000-times improvement in energy efficiency, a 100-times improvement in performance, and “orders of magnitude” gains in the amount of data needed for training.

Not all problems lend themselves to neuromorphic processing. Among those that do, algorithms well-suited to today’s deep-learning technology are obvious wins. Intel is also evaluating algorithms “inspired by neuroscience” that emulate processes found in the brain. And, finally, they are looking at “mathematically formulated” problems.

In the first category, today’s deep neural networks (DNNs) can be converted to a form usable by a neuromorphic chip. Additionally, “directly-trained” networks can be created on the neuromorphic processor itself. Finally, “backpropagation,” common in CNN training, can be emulated in neuromorphic processors, despite the fact that it requires global communication not inherent to the neuromorphic architecture.
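As an illustration of the first category, one generic conversion technique is rate coding, in which each activation in a trained DNN is mapped to the density of a spike train. This is a common textbook approach, sketched below; it is not Intel’s specific conversion method:

```python
import random

def rate_encode(activation, num_steps=1000):
    """Map a DNN activation in [0, 1] to a single-bit spike train by
    emitting a spike each time step with probability = activation."""
    p = min(max(activation, 0.0), 1.0)
    return [1 if random.random() < p else 0 for _ in range(num_steps)]

def decode(spike_train):
    """Recover an approximate activation as the observed spike rate."""
    return sum(spike_train) / len(spike_train)

random.seed(0)                  # deterministic for the example
train = rate_encode(0.7)        # larger activation -> denser spikes
print(round(decode(train), 2))  # close to 0.7
```

The longer the spike train, the more precisely the original activation is recovered, which is one reason converted networks trade latency for accuracy on neuromorphic hardware.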

Loihi is a research chip, not designed for production. It is a 2-billion-transistor device, fabricated on Intel’s 14nm CMOS process. Loihi contains a fully asynchronous “neuromorphic many-core mesh that supports a wide range of sparse, hierarchical and recurrent neural network topologies with each neuron capable of communicating with thousands of other neurons.” The chip’s 130,000 neurons and 130 million synapses are divided into 128 neuromorphic cores, each of which includes a learning engine that adapts parameters during operation. The chip also includes a microcode learning engine for on-chip training of the SNN. Loihi chips have been integrated into boards and boxes containing as many as 100 million total neurons across 768 chips.

We are now at the confluence of a number of trends that could form the perfect storm of a revolution in processor architecture. First, neuromorphic processors are at the knee of commercial viability, and they bring something like the equivalent of 10 Moore’s Law nodes (20 years) of forward progress on certain classes of problems. Second, conventional DNNs are progressing rapidly and generating architectural innovations similar to those found in neuromorphic processors, suggesting a possible convergence on a future “best of both worlds” architecture that combines traits from both domains. Third, Moore’s Law is coming to a close, which puts more emphasis, talent, and money into architectural approaches to driving future technological progress. And fourth, power consumption has emerged as probably the dominant driving factor in computation, which is the single metric where these new architectural approaches excel most.

It will be interesting to watch as the first of these neuromorphic processors gains commercial traction and creates the virtuous cycle of investment, development, refinement, and deployment. It is likely that, within a few years, neuromorphic architectures (or similar derivative technologies) will have taken on a substantial role in our computing infrastructure and catapulted to the forefront new applications that can only be imagined today.

One thought on “Neuromorphic Revolution”

  1. No More “Times Lower Than”
    Hey Marketers and PR folks – this one’s for you. In researching this article I repeatedly ran into official, presumably copy-edited materials that made claims that power was “75X lower” or “30 times less than” other solutions. This is absolutely wrong.

    Your audience for these materials is engineers. Engineers are very good at math. When you use incorrect terminology like this, you damage your credibility.

    There are two distinct mistakes we see repeatedly in this area. I’ll explain both:

First is the “less than” or “lower than” mistake, which is the most egregious. Let’s agree that “1 times” means “100%.” If I say that we were able to “reduce power by 50%,” everyone would agree that means we now use half the power we did before. If I said we reduced power by 75%, we are using a quarter of the power, and so on.
    So a 1x reduction in power, or making power “1x lower” would mean we have reduced it by 100%. We now use zero power, which would be the best we could possibly do, unless our device now actually creates power – at which point the grammar police would turn the case over to the thermodynamics police.
    In Intel’s materials (sorry for picking on Intel, they are definitely not the only offenders), they say “…its Loihi solutions … require 75 times lower power than conventional mobile GPU implementations.”

They probably meant to say “1/75th the power of mobile GPU implementations.” 1/75th of the power equals 1.33% of the power, or 98.67% lower power. Obviously, 98.67% is almost (but not quite) 1x. So, if they really want to use “times lower power,” it should have read “require almost 1 times lower power than conventional…” If they truly used 75 times lower power, it would mean they are in the energy generation business, and one Loihi actually creates 74 times the power that a GPU burns. Hmmm…
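For anyone who wants to check the arithmetic, here it is as a few lines of Python:

```python
gpu_power = 1.0               # normalize the mobile GPU's power to 1x
loihi_power = gpu_power / 75  # "1/75th the power"

fraction = loihi_power / gpu_power
print(f"{fraction:.2%} of the power")     # 1.33% of the power
print(f"{1 - fraction:.2%} lower power")  # 98.67% lower power

# Read literally, "75 times lower" would subtract 75x the original
# power, leaving the chip generating 74 units of power:
print(gpu_power - 75 * gpu_power)  # -74.0
```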

    Notice I said “74 times” and not “75 times”? That brings us to the second common mistake.

    It is valid, when we are talking about things increasing, to use “times”. But there is an important difference between “times as much” and “times more”.

    Let’s start with the “one” example again. One is clearly “one times as much” as one. But, “one times more” than one would be two. Two times more than one would be three, and so on.

    You see where this is going?

If your old system ran at 100 MHz, and your press release says your new system has “3x more performance,” do you mean it now runs at 400 MHz (correct), or are you saying it runs at 300 MHz (nope)? Saying “three times the performance” would clearly mean 300 MHz, but when you switch the language to “more” you make it incorrect, or at best ambiguous.
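The same ambiguity, spelled out in code:

```python
old_mhz = 100

# "three times the performance": multiply
three_times_the = 3 * old_mhz             # 300 MHz

# "three times more performance," read literally: the original
# plus three times as much again
three_times_more = old_mhz + 3 * old_mhz  # 400 MHz

print(three_times_the, three_times_more)  # 300 400
```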

    Thus ends my rant. Marketers, please never again use “times more” or “times lower” for any figures of merit in your materials. Train your copy editors to flag it if you do. Send me a snarky note if you find me using one (I’m certain I have). If you must use “times” do it only when the figure of merit is increasing. And, if you do, avoid “more than.” Say “three times the” or similar.
