
Birth of the AI Machine

Digital Design is New Again

The dawn of artificial intelligence has ironically coincided with the dusk of Moore’s Law. Just as our collective engineering genius begins to falter at the task of delivering exponential improvements in the performance and efficiency of von Neumann processors, those processors reach the point where they can do a rudimentary job on artificial intelligence applications. Apparently the universe either has a sinister sense of irony or is warning us not to cause the rise of the machines.

Chip designers, however, welcome our new silicon overlords.

Let’s back up a bit. It’s been years since we hit the wall on processor-frequency scaling and went to multi-core architectures to get our speed. Now, with silicon process technology slowing down, power consumption (and therefore thermal issues) reaching a knee, and software struggling to take advantage of ever-increasing core counts, our progress in processor performance has slowed considerably. At the same time, AI technology – and convolutional neural networks in particular – has burst onto the scene, demanding more than our processors can deliver and offering enormous rewards in terms of applications that were simply not possible with conventional software approaches.

The energy of the industry has been thrown behind solving the AI challenge, and the contenders are almost too numerous, and their progress too fast, to track. At this point, there is widespread belief that accelerating feed-forward convolutional neural networks (CNNs) is a high-value bet. There are other approaches to AI, of course (and we’ll go into detail on some of those in a few weeks), but the bulk of the industry’s effort right now is around novel hardware architectures for training and inferencing on CNNs.

This gives rise to an opportunity for an entirely new category of compute machine. And, unfortunately, nobody yet knows what it should look like. It appears we actually need two different things – machines that can do rapid training of CNNs, and machines that can do blazingly fast, energy-efficient, low-latency inferencing with those trained networks. Logic designers, let the De Morgans commence!

CNN training is generally done once, on enormous data sets, and is most often accomplished in a data center environment where compute resources are plentiful and real-time metrics such as latency are not critical. Sure, faster training is a good thing, but there are no life-or-death situations hanging in the balance at training time. Today, the go-to solution seems to be massive clusters of GPUs crunching floating-point data at breakneck speed. Damn the power consumption. Full speed ahead!

Inferencing uses that trained network in the field to do real work. If you trained your system to recognize pedestrians stepping into the street, your inferencing system must be able to perform that task with a high degree of accuracy, in real time, with minimal latency, and probably with minimal power consumption. Nobody wants a system to tell them that the object they struck 200ms ago was actually a former pedestrian. Inferencing is currently done via a wide variety of approaches and hardware architectures, and it is clearly where the most market payoff will occur.

On the training side, we land squarely in the new battle for data center dominance. Sitting right in the center of that melee is Intel, who has long held a commanding share of the data-center hardware market, driven by their Xeon processor line. Intel is clearly taking a system-level approach to retaining that dominance, leaving no flank uncovered, attacking with a range of technologies. On the processing front, Intel is defending their perch with everything from AI-enhanced versions of Xeon processors, to FPGAs, to specialized chips for inferencing – both in the data center and at the edge. But Intel is also fortifying their offerings in memory, connectivity, storage, software tools, and (perhaps most strategically) something they call “Select Solutions,” which bundles lots of Intel stuff together into a proven, working combination that OEMs can easily slap a label on and sell. This creates a formidable barrier to entry for competitors trying to sell a single-chip architecture or technology into the data center. Nobody wants to buy a Volvo and then stick a Toyota engine in it. The weaker, less dominant technologies in Intel’s portfolio can cruise into the data center on the coattails of the Xeon.

Challenging Intel’s data-center position are the usual suspects like AMD, of course, but, in the AI domain in particular, NVidia has carved out a very strong business in the AI/neural-network-training space with their GPUs (and their CUDA programming framework). Xilinx, the long-time dominant player in the FPGA market, has declared themselves to be “data center first” and is posting some impressive results in winning data center business, including key wins among the “Super 7” hyperscalers. Xilinx is also touting an ambitious strategy to produce extremely powerful and efficient accelerator chips in an attempt both to cut into NVidia’s niche and to continue their longstanding FPGA feud with Intel PSG (formerly archrival Altera).

Look for Intel to continue bundling their solutions in order to block socket wins for the likes of NVidia and Xilinx, and to bet on a wide array of technologies and approaches to inoculate themselves against potentially disruptive technologies from new challengers.

Moving to the more interesting inferencing side of the AI equation, it pays to take a look at the actual computational challenges of neural networks – and they are numerous. CNNs are graphs arranged in layers – often tens or even hundreds of layers deep. The layers consist of nodes (neurons) interconnected by weighted edges (synapses). During training, the weights or coefficients on those edges are computed. During inference, the resulting graph, with its trained weights, is used to process and classify new inputs. Computationally, inferencing stresses every aspect of system resources, and the relative requirements for memory, arithmetic, and connectivity vary with each network’s unique topology. What CNNs always are, however, is massively parallel. That means building a “standard” processor for CNNs is a difficult challenge.
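To make that concrete, here is a minimal sketch – plain Python/NumPy, deliberately naive, and not tied to any vendor’s hardware or framework – of what a single convolutional layer does at inference time. Every output value is simply a long run of multiply-accumulate operations over a window of the input, using weights that were fixed during training; the function and variable names are illustrative only.

```python
import numpy as np

def conv2d_infer(feature_map, kernels, biases):
    """Naive single-channel convolutional layer at inference time:
    slide each trained kernel over the input and apply a ReLU."""
    h, w = feature_map.shape
    n_k, kh, kw = kernels.shape
    out = np.zeros((n_k, h - kh + 1, w - kw + 1))
    for k in range(n_k):                     # one output map per kernel
        for i in range(h - kh + 1):
            for j in range(w - kw + 1):
                window = feature_map[i:i + kh, j:j + kw]
                # The heart of CNN inference: a multiply-accumulate (MAC)
                out[k, i, j] = np.sum(window * kernels[k]) + biases[k]
    return np.maximum(out, 0.0)              # ReLU activation

# Toy 8x8 input and three 3x3 kernels standing in for trained weights
x = np.random.rand(8, 8)
w = 0.1 * np.random.randn(3, 3, 3)
b = np.zeros(3)
print(conv2d_infer(x, w, b).shape)           # (3, 6, 6)
```

Multiply that inner loop by millions of weights and hundreds of layers, and the appetite for parallel MAC hardware, nearby memory, and fast interconnect becomes obvious.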

The inference problem can be further divided between applications where latency is not critical and inferencing can be offloaded to the cloud, and those where latency is critical and inferencing must be done at the edge, largely with power-, cost-, and space-constrained hardware. On the data-center side, GPUs and CPUs still compete for the bulk of the business. NVidia has been stepping up their game in inferencing performance of late, and Intel just announced AI-specific extensions to the Xeon, called “Intel DL Boost,” which the company claims deliver roughly an 11x improvement in inferencing performance via special low-precision instructions. Other vendors are delivering GPU or FPGA acceleration solutions that run in PCIe slots.

Heavily configurable architectures, such as FPGAs, bring compelling value to the table with their massively parallel connectivity, enormous amounts of optimized arithmetic (particularly DSP-oriented multiply-accumulate (MAC) units), and large amounts of memory located very close to the computing resources. Further reinforcing the case for custom/configurable hardware, calculations can often be done at very low integer precision during inferencing. If the network is quantized to find the lowest precision that still delivers accurate results, massive amounts of hardware and power can be saved, and performance can be dramatically increased. Here again, FPGAs shine.
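As a rough illustration of that quantization idea – a minimal Python/NumPy sketch under simple assumptions (symmetric, per-tensor scaling), not a description of any particular vendor’s tool flow – the snippet below maps float weights and activations to 8-bit integers, performs the multiply-accumulates entirely in integer arithmetic, and applies a single floating-point rescale at the end:

```python
import numpy as np

def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to int8 plus one
    scale factor, so the MACs themselves can run on integer hardware."""
    scale = np.max(np.abs(values)) / 127.0
    q = np.clip(np.round(values / scale), -127, 127).astype(np.int8)
    return q, scale

w_fp32 = np.random.randn(3, 3).astype(np.float32)   # stand-in trained weights
x_fp32 = np.random.rand(3, 3).astype(np.float32)    # stand-in activations

w_int8, w_scale = quantize_int8(w_fp32)
x_int8, x_scale = quantize_int8(x_fp32)

# Accumulate in int32, as low-precision MAC hardware typically does,
# then rescale once to recover an approximate floating-point result.
acc_int32 = np.sum(w_int8.astype(np.int32) * x_int8.astype(np.int32))
approx = acc_int32 * w_scale * x_scale
exact = np.sum(w_fp32 * x_fp32)
print(approx, exact)   # close, despite 8-bit storage and integer math
```

The payoff on real hardware is that an 8-bit (or narrower) multiply-accumulate needs far less logic, memory bandwidth, and energy than a floating-point one, so many more of them fit into a given device and power budget – which is exactly where FPGA DSP blocks and purpose-built inference chips earn their keep.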

Of course, at the edge, even FPGAs may be too slow, too expensive, and too power hungry. In those situations, designers can turn to a number of purpose-built neural-network accelerator devices, or a trained network can be implemented directly as an ASIC or an ASSP. The latter approach delivers the highest performance, lowest power, and lowest unit cost, at the expense of flexibility and development cost. Here, Intel’s recent acquisition of eASIC comes into focus, as it facilitates the creation of an ASIC solution directly from a working FPGA design.

It is clear that the surface has barely been scratched in developing hardware platforms to support artificial intelligence. AI is such a novel and demanding workload, with such astronomical potential in terms of applications and economics, that it should drive a massive wave of engineering innovation. Today, we are in the most nascent stages of that wave. It will be interesting to watch.
