
Performing Extreme AI Analog Compute Sans Semiconductors

Generally speaking, I tend to duck when someone lobs a business-related press release in my direction. Call me a techno-weenie if you will, but I really don’t care to learn that Company A has been pondering the possibility of potentially signing a letter of intent with Company B to establish a framework within which to commence discussions about the prospect of collaborating on some vaguely defined project at some undefined time in the future.

I would say that I have an unwritten rule (the best kind, in my experience) about this sort of thing. On the other hand, “rules are meant to be broken,” as they say, although I personally prefer the adage, “rules are for the obedience of fools and the guidance of the wise.”

To cut a long story short (which is opposite to the way my dear old mother taught me to do things), I didn’t run for the hills when my old friend Kalyn Kolek emailed me to say, “I’m now on the venture capital (VC) side of things at Anzu Partners. I’m working with a new startup called EnCharge AI that my firm has invested in and that I think will be of interest to you.”

Kalyn then proceeded to tempt me with three teasing bullet points as follows:

  • EnCharge’s technology was developed at Princeton University and is led by a world-class, multi-disciplinary founding team from Meta, NVIDIA, Qualcomm, and IBM.
  • This charge-based in-memory computing technology has demonstrated AI performance with orders-of-magnitude higher compute efficiency and density than has been achieved by best-in-class digital accelerators, such as GPUs or TPUs, resulting in over 150 TOPS/W for 8-bit compute.
  • EnCharge will deliver platforms that provide over 20x higher performance per watt and over 14x higher performance per dollar, compared to the best-in-class digital AI accelerators.

I couldn’t help myself. I was lost when I saw “150 TOPS/W.” I simply had to look at the press release, EnCharge AI Launches With $21.7M Series A to Enable Edge AI at Scale, which—in addition to the information summarized above—informed me that “EnCharge offers a software stack that supports broad AI models and resolutions while integrating seamlessly into user frameworks and design flows” and “EnCharge seeks to unlock the immense potential of AI by making it accessible to power-, energy-, and space-constrained applications at the edge. These capabilities are valuable for market applications such as automotive sensing, advanced manufacturing, smart retail, smart warehouses and logistics, industrial robotics, and drones.”

I also bounced over to the EnCharge website to learn more about the company and its founders, Naveen Verma, PhD (CEO), Kailash Gopalakrishnan, PhD (CPO), and Echere Iroaga, PhD (COO). Among them, these three guys boast a cornucopia of business and technology know-how sufficient to make a success of anything they turn their hands to (says Max, BSc).

Kalyn had succeeded. I wanted to learn more, so we set up a video chat between your humble narrator (I pride myself on my humility), Naveen, and Echere, while poor old Kailash was left to run the company in their absence.

We got off to a great start:

Me: “Hmmm. Your company is called EnCharge and I’m informed you have developed a charge-based in-memory computing technology. Can I assume this is based on a floating-gate memory technology like Flash?”

Naveen: “No”

This wasn’t as bad as it sounds because Naveen smiled as he imparted this nugget of knowledge. Before we proceed, I should perhaps note that I’ve been introduced to several analog-based artificial intelligence (AI) and machine learning (ML) accelerators based on floating-gate memory technologies over the past year or so.

A racing summary of the concept behind those technologies in a crunchy nutshell is as follows. AI and ML are based on artificial neural networks (ANNs). These involve layer upon layer of artificial neurons. They also involve eye-watering amounts of matrix manipulations and computations. In fact, 95% to 99% of the computation in an AI/ML application is devoted to these matrix operations. At the core of these matrix operations are multiply-accumulates, where the results of a bunch of multiplications are accumulated.
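
To make the multiply-accumulate idea concrete, here's a minimal sketch in plain Python (the activations and weights are made-up illustrative values, not anything from EnCharge) showing that a single artificial neuron boils down to a string of MAC operations.

```python
# Minimal sketch: one artificial neuron is just a chain of multiply-accumulates (MACs).
# The activations and weights below are made-up illustrative values.

activations = [0.8, 0.1, 0.5, 0.9]   # outputs from the previous layer
weights     = [0.2, 0.7, 0.4, 0.3]   # this neuron's learned coefficients

accumulator = 0.0
for a, w in zip(activations, weights):
    accumulator += a * w             # one MAC operation

print(accumulator)                   # the neuron's pre-activation output
```

A full layer repeats this for every neuron, and a full network repeats it for every layer, which is where the eye-watering operation counts come from.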

We can do all this digitally using a regular microcontroller unit (MCU), but this is time-consuming and power-gobbling to say the least. Alternatively, we can do things in a massively parallel fashion using FPGAs, or we can employ a hybrid approach using arrays of small processing cores in the form of graphics processing units (GPUs). In fact, FPGA and GPU-based solutions largely dominate this space in data centers. However, as we move closer to the edge (where the internet meets the real world), size, cost, and power consumption considerations become increasingly important.

I always think of floating gates in the context of simple Flash memory devices, where each cell (gate) can essentially hold a 0 or a 1 value. However, some companies have developed the ability to control the number of electrons in the floating gate to represent 256 different values (equivalent to 8 bits of storage) (see also Meet Mythic AI’s Soon-to-be-Legendary Analog AI). If these values are used to represent a weight (coefficient) in an ANN, then each Flash cell can be used to perform an analog multiplication (voltage x conductance) between its activation voltage and its coefficient. The currents from all of the Flash cells in a column are additive (an analog addition), which means we can think of each Flash cell as acting like a synapse and each column of cells as representing a neuron.
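
As a mental model only (illustrative numbers, not any vendor's actual cell physics), the flash-column arithmetic described above can be simulated in a few lines: each cell contributes a current equal to its activation voltage times its programmed conductance, and the shared column wire sums those currents.

```python
# Toy model of an analog flash column:
# each cell's current = activation voltage x programmed conductance (Ohm's law),
# and the shared column wire sums the currents (Kirchhoff's current law).
# All values are illustrative placeholders.

voltages     = [0.8, 0.1, 0.5, 0.9]   # per-row activation voltages (V)
conductances = [0.2, 0.7, 0.4, 0.3]   # per-cell weights stored as conductances (S)

column_current = sum(v * g for v, g in zip(voltages, conductances))
print(column_current)                 # the column's accumulated "neuron" output, as a current
```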

On the one hand, analog computation is inherently less precise than its digital counterpart. On the other hand, AI and ML are inherently probabilistic in nature. What this means is that it really doesn’t matter if an AI reports a 96% probability it’s looking at a chicken or a 97% probability it’s looking at a chicken… Either way, we are pretty darned sure that we have a chicken-centric scenario on our hands.

NOW SIT UP AND PAY ATTENTION! This is the clever bit. Naveen noted that “things get messy” when you use semiconductor devices to perform your analog compute. This is because semiconductor-based analog elements are sensitive to environmental conditions like temperature fluctuations and process variations in the form of material parameters. As a result, Flash-based analog-centric technologies require constant monitoring, tweaking, and re-calibration. We can think of this as adding lots of analog-to-digital-to-analog micro-loops. Since much of this is performed using digital techniques, it sort of detracts from the advantages of using analog in the first place.

Naveen Verma, Co-Founder and CEO of EnCharge AI, holds a prototype of the company’s advanced computing technology in his laboratory at Princeton University.

The solution developed by EnCharge AI is to steer away from employing the chip’s semiconductor substrate to implement the analog elements used to perform the matrix multiply-accumulate operations, and instead create incredibly precise capacitors out of metal wires, which are essentially free when creating a chip in CMOS technology. These capacitors are not sensitive to temperature or semiconductor process variations; they are sensitive only to structure and geometry, which can be well-controlled in CMOS.

Imagine a column of these capacitor cells. Remember that Q = CV, which means the charge Q on each cell (wire capacitor) is the product (multiplication) of its capacitance C and applied voltage V. The total charge of the column is additive, which gives us our accumulation. Once again, we can think of each cell as acting like a synapse and each column of cells as representing a neuron.
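
Here's the same toy model recast in the charge domain, following the Q = CV description above (the capacitances and voltages are purely illustrative, not EnCharge's actual values): each cell stores Q = C × V, and the column's total charge is the accumulation.

```python
# Toy model of a charge-domain column, per the Q = C*V description above.
# Capacitances and voltages are illustrative placeholders.

voltages     = [0.8, 0.1, 0.5, 0.9]          # per-row activation voltages (V)
capacitances = [2e-15, 7e-15, 4e-15, 3e-15]  # per-cell "weights" as capacitances (F)

# Each cell holds Q = C * V; summing down the column gives the accumulation.
total_charge = sum(c * v for c, v in zip(capacitances, voltages))
print(total_charge)   # exactly the dot product a digital MAC unit would compute
```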

Let’s see how well I can explain what happens next. (Don’t think of this as me being too lazy to draw a diagram. Rather, think of it as an exercise in visualization to build your mental muscles. It’s just one more service I offer. You’re welcome.) We start with an array of these cells that’s thousands of rows and hundreds of columns in size. This is called a compute-in-memory array (CIMA). This array has associated digital computation and dataflow buffers, a programmable digital SIMD engine, and some control logic, all of which are collectively known as a compute-in-memory unit (CIMU).

The reason everything needs to be programmable is that different AI/ML models have different requirements. Some layers may be performing 3×3 convolutions, some may need to implement 9×9 convolutions, others may wish to realize dilated convolutions, and so forth.
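
To see why one fixed datapath won't do, here's a hedged PyTorch sketch (layer sizes are arbitrary, and this is my illustration rather than anything from EnCharge's flow) showing three layers that are all "convolutions" yet imply quite different compute patterns.

```python
import torch
import torch.nn as nn

# Three "convolution" layers with different compute patterns (arbitrary channel counts):
conv_3x3     = nn.Conv2d(16, 32, kernel_size=3, padding=1)              # standard 3x3
conv_9x9     = nn.Conv2d(16, 32, kernel_size=9, padding=4)              # large 9x9
conv_dilated = nn.Conv2d(16, 32, kernel_size=3, padding=2, dilation=2)  # dilated 3x3

x = torch.randn(1, 16, 64, 64)   # a dummy 16-channel, 64x64 feature map
for layer in (conv_3x3, conv_9x9, conv_dilated):
    print(layer(x).shape)        # same output size, very different MAC footprints
```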

Now imagine an array of these CIMUs all connected by an on-chip network (OCN). This super-array is what’s known as a scalable in-memory computing (IMC) neural-network (NN) accelerator. I think of this as an IMCNNA, but I don’t think this terminology is employed by the guys and gals at EnCharge.
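
If it helps the visualization exercise, here's a purely descriptive Python sketch of that hierarchy as I've understood it (the array sizes and CIMU count are placeholders of my own invention, not EnCharge's actual dimensions): capacitor cells form a CIMA, a CIMA plus its digital support logic forms a CIMU, and an on-chip network ties an array of CIMUs together.

```python
from dataclasses import dataclass, field

# Descriptive sketch only; all sizes below are hypothetical placeholders.

@dataclass
class CIMA:                          # compute-in-memory array of capacitor cells
    rows: int = 1000                 # "thousands of rows"
    cols: int = 100                  # "hundreds of columns"

@dataclass
class CIMU:                          # compute-in-memory unit
    array: CIMA = field(default_factory=CIMA)
    has_simd_engine: bool = True     # programmable digital SIMD engine
    has_dataflow_buffers: bool = True
    has_control_logic: bool = True

@dataclass
class Accelerator:                   # array of CIMUs connected by an on-chip network (OCN)
    cimus: list = field(default_factory=lambda: [CIMU() for _ in range(16)])

chip = Accelerator()
print(len(chip.cimus), chip.cimus[0].array.rows, chip.cimus[0].array.cols)
```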

We haven’t finished yet. What we are talking about is a CMOS chip that, in addition to the IMCNNA, also includes a cluster of standard Arm processor cores that are used to perform on-chip pre- and post-processing housekeeping tasks, a PCIe Gen 5 interface to communicate with a higher-level host processing system, and a bunch of other stuff.

One huge aspect of all this is the fact that EnCharge technology is based on standard CMOS manufacturing processes, which means it can be implemented at the latest and greatest process nodes (they started at 65nm, migrated to 28nm, and are currently at 16nm).

Do you remember earlier when we said, “EnCharge offers a software stack that supports broad AI models and resolutions while integrating seamlessly into user frameworks and design flows”? What this means is that users can take AI/ML models created using standard design frameworks (PyTorch, TensorFlow, ONNX…) running on conventional FPGA/GPU-based systems, and quickly and easily start accelerating these models using EnCharge’s IMCNNA (scalable in-memory computing neural-network accelerator… I know… this doesn’t roll off the tongue… we need to come up with a better name).
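
As a hedged illustration of the framework side of that story (EnCharge's actual toolflow isn't detailed here, so this only shows the standard export step), a PyTorch model can be saved to the framework-neutral ONNX format, which is the sort of artifact a vendor toolchain would then compile onto its accelerator.

```python
import torch
import torch.nn as nn

# A tiny stand-in model; any real PyTorch model would do here.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Flatten(),
                      nn.Linear(8 * 32 * 32, 10))
model.eval()

dummy_input = torch.randn(1, 3, 32, 32)

# Export to ONNX -- a framework-neutral graph that a vendor's tools could then
# map onto hardware (the EnCharge-specific compilation step is not shown).
torch.onnx.export(model, dummy_input, "model.onnx")
```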

In summary, the chaps and chapesses at EnCharge are pursuing an edge-to-cloud strategy. Their target is outside the data centers, close to the edge, but not at the extreme edge. They are aiming at big automation tasks that need to run state-of-the-art AI/ML models, including industrial, smart manufacturing, smart retail, warehouse logistics, robotics, etc.

The folks at EnCharge say that their charge-based in-memory computing technology has demonstrated AI performance with orders-of-magnitude higher compute efficiency and density than has been achieved by best-in-class digital accelerators, such as GPUs or TPUs, and that their current proven-in-silicon solution achieves 150 TOPS/watt.

I for one am tremendously excited. Once again, this is a technology implementation I didn’t see coming and I can’t wait to see where it ends up going. How about you? Do you have any thoughts you’d care to share?

