Ooh, Ooh that Smell! Can Intel’s IPUs Clean up the Cloud Data Center Mess?

Data center architecture must change because the applications running in data centers have changed. Several factors are forcing these architectural changes, including some key trends:

The migration from monolithic applications running on single CPUs to distributed applications running on multiple virtual machines (VMs) using containers and microservices.
The migration from single-owner enterprise data centers to data centers owned and operated by cloud service providers such as Amazon Web Services (AWS), Microsoft Azure, Google Cloud, Alibaba Cloud, IBM, Dell, and Hewlett Packard Enterprise (HPE).
The meteoric rise of hackers and other malicious actors who prod and probe public and private networks to find exploitable weaknesses for the purposes of stealing valuable, sellable data, selling that data on the Dark Web, and simply wreaking general havoc.

The death of Dennard Scaling two decades ago has been chiefly responsible for the migration from monolithic applications running on single CPUs to distributed applications. Let me explain that chain of thought before you leave a snide comment about blending hardware and software trends into an unpalatable mush.

If Dennard Scaling had not died, then we’d still be getting faster and faster processors with each new semiconductor process node. That hasn’t happened. Maximum processor clock speeds haven’t really changed a lot in 20 years. Instead, we’ve gotten more processor cores per CPU chip. Dennard Scaling perished, but Moore’s Law continued. The largest x86 server CPUs are currently up to 64 cores, and that number is sure to increase as new process nodes allow us to put more transistors on one die, and as the big guys (AMD and Intel) increasingly use multi-die packaging to build larger and larger CPU clusters in one package.

Because individual processors are not getting faster, as they once did, and because multiple CPU cores and multithreading make concurrent processing a reality, software developers have necessarily turned to distributed computing to exploit all of those CPU cores. This situation overlays the reality of today’s cloud data centers, which house hundreds of thousands of interconnected servers, all based on multicore CPUs.

The latest techniques for creating distributed programs involve the use of a microservice architecture, or just “microservices” for short. Microservices offer many advantages, but, for software developers, perhaps the main benefit is that distributed programs that employ microservices can be implemented in smaller, more easily debugged chunks, using different programming languages, multiple databases, and different hardware and software environments, depending on what fits best. However, you don’t get the advantages of distributed programming, virtual machines, containers, and microservices for free. There’s an overhead surcharge. The overhead can consume 80% or more of the CPU cycles, according to at least one recent paper published by Facebook. That’s a lot of overhead. Too much, in fact.
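To put that 80% figure in perspective, here is a back-of-the-envelope sketch in Python. It is not taken from the Facebook paper or from Intel. The function names and the model are my own assumptions, chiefly that overhead cycles grow in proportion to useful work, so offloading a share of the overhead shrinks the overhead paid per useful cycle.

```python
def useful_fraction(overhead_fraction: float, offloaded_share: float = 0.0) -> float:
    """Fraction of host CPU cycles left for application logic.

    Assumption (not from the article): overhead grows in proportion to
    useful work, so offloading a share of it reduces the overhead paid
    per useful cycle by that same share.
    """
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    if not 0.0 <= offloaded_share <= 1.0:
        raise ValueError("offloaded_share must be in [0, 1]")
    # Overhead cycles spent per useful cycle before any offload.
    overhead_per_useful = overhead_fraction / (1.0 - overhead_fraction)
    # Overhead cycles per useful cycle that still run on the host CPU.
    remaining = overhead_per_useful * (1.0 - offloaded_share)
    return 1.0 / (1.0 + remaining)


def throughput_gain(overhead_fraction: float, offloaded_share: float) -> float:
    """Application throughput relative to the no-offload baseline."""
    baseline = useful_fraction(overhead_fraction)
    return useful_fraction(overhead_fraction, offloaded_share) / baseline


if __name__ == "__main__":
    # 0.8 is the overhead figure quoted above; the offload shares are hypothetical.
    for share in (0.0, 0.5, 1.0):
        gain = throughput_gain(0.8, share)
        print(f"offloaded {share:.0%} of overhead: about {gain:.2f}x application throughput")
```

Under these assumptions, an 80% overhead leaves only one cycle in five for the application. Moving all of the overhead elsewhere would cap the gain at roughly 5x, and moving half of it would yield roughly 1.67x.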

At the same time, cloud data centers must run myriad unrelated programs from multiple customers. Most of these customers are running benign programs that pose no threat, but hackers posing as customers are constantly looking for ways to drill through security walls to access protected data for nefarious purposes. For example, T-Mobile announced in mid-August that hackers had stolen personal information from its customer database. A Dark Web forum post offering to sell 30 million user files, apparently from the T-Mobile data breach, says the data includes social security numbers, phone numbers, names, physical addresses, and driver’s license information. Some hack!

Intel has stepped into this fray with a whole new architectural idea: Infrastructure Processing Units (IPUs). From a hardware perspective, IPUs are going to look very familiar. They’re simply SmartNICs decked out in a more refined wardrobe, a.k.a. a new and improved mission. I suspect that’s going to take a bit of explaining, so let’s start the explanation by discussing the two SmartNICs, er IPUs, disclosed by Intel Data Platforms Group CTO Guido Appenzeller during Intel’s most recent Architecture Day, held on August 19. (For more information about Intel’s Architecture Day, see Max Maxfield’s EEJournal article “Will Intel’s New Architectural Advances Define the Next Decade of Computing?”)

First, some background on today’s data centers. During Architecture Day, Appenzeller compared the “old” enterprise data center architecture with today’s cloud-centric data center architecture. He said the old data center architecture resembles his house, with all of the different dedicated areas like the kitchen, dining room, and living room easily accessible from the other areas in the house. This is analogous to all tasks – user tasks, infrastructure tasks, and overhead – running on the same CPUs. This architecture came into existence to serve the needs of one tenant: the enterprise that owned the data center.

Cloud data center architecture more closely resembles a hotel, because there are multiple tenants (the guests) and a cloud service provider that owns the data center (the hotel). The hotel’s guest rooms, dining room, kitchen, and other service areas are walled off from each other and often require security keys to go from one area to another. It’s more expensive to build a structure using the hotel model. There’s overhead in terms of extra walls, more doors, and security devices such as badge readers.

The extra expense is incurred for a reason: you don’t want hotel guests in the kitchen and you don’t want guests or service personnel in guest rooms without the right permissions. Malicious actors could cause all sorts of problems if they had free access to all areas in a hotel. At the same time, certain costs are reduced. For example, it’s less expensive to have one central kitchen than to equip hundreds or thousands of hotel guest rooms with their own individual kitchens.

The old data center architecture resembles single-family housing, and new, cloud-centric data center architectures resemble hotels, says Intel Data Platforms Group CTO Guido Appenzeller. Image Credit: Intel.

Intel’s IPUs are designed to break the implicit link between user application programs running on server CPUs (tenant tasks) and infrastructure tasks, which will now run on IPUs. The IPUs are simply not in user space. In theory, no tenant programs running on data center CPUs can have access to the IPU programming environment, which belongs exclusively to the data center operator.

Thus IPUs provide three major benefits, according to Appenzeller’s Architecture Day presentation:

IPUs separate infrastructure functions from tenant workloads, which provides much better isolation between these functions and greatly enhances system security.
Tenant applications take full control of and get the full performance from the server CPU because the IPU has offloaded “all” of the infrastructure overhead. (Note: I don’t believe that it’s possible to eliminate 100 percent of this overhead, so I guess we’ll need to wait and see.)
IPUs enable a diskless server architecture, which means that all data center servers can now use a centrally managed storage subsystem. This reorganization would greatly reduce the tendency to overprovision storage on individual servers to accommodate the unknown needs of tenant programs. In turn, the reduction of storage overprovisioning will reduce data center capital expenditures. This benefit mirrors the hotel analogy: one central kitchen costs less than a kitchen in every guest room.
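The storage argument in the third benefit can be illustrated with a toy calculation. This is a sketch under loudly stated assumptions, not Intel’s sizing method: the demand numbers are invented, the 25% pool headroom is arbitrary, and the model assumes that an operator who cannot predict which server will host a storage-hungry tenant sizes every server for the largest demand.

```python
def local_capacity_tb(peak_demands_tb: list[float]) -> float:
    """Total storage when every server is sized for the largest demand seen.

    Assumption: the operator cannot know which server gets the hungriest
    tenant, so each server carries enough local storage for the worst case.
    """
    return len(peak_demands_tb) * max(peak_demands_tb)


def pooled_capacity_tb(peak_demands_tb: list[float], headroom: float = 0.25) -> float:
    """Total storage for one shared pool sized to aggregate demand plus headroom."""
    return sum(peak_demands_tb) * (1.0 + headroom)


def savings_fraction(peak_demands_tb: list[float], headroom: float = 0.25) -> float:
    """Share of storage capacity avoided by pooling instead of local provisioning."""
    local = local_capacity_tb(peak_demands_tb)
    pooled = pooled_capacity_tb(peak_demands_tb, headroom)
    return 1.0 - pooled / local


if __name__ == "__main__":
    # Hypothetical per-server tenant storage demands, in terabytes.
    demands = [1, 2, 8, 1, 4]
    print("local provisioning (TB): ", local_capacity_tb(demands))
    print("pooled provisioning (TB):", pooled_capacity_tb(demands))
    print("capacity avoided:        ", f"{savings_fraction(demands):.0%}")
```

With these made-up numbers, five servers each sized for the 8 TB worst case need 40 TB in total, while a shared pool sized for the 16 TB of aggregate demand plus 25% headroom needs 20 TB. Real savings depend entirely on how uneven and how unpredictable tenant demand is.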

Now let’s look at the two new IPUs Appenzeller discussed during the recent Architecture Day. The first IPU is called “Oak Springs Canyon.” It’s the successor to “Big Springs Canyon,” which is based on an Intel Xeon-D CPU and an Intel Stratix 10 DX FPGA. When it was announced just a year ago, Big Springs Canyon was called a SmartNIC. Look on the Intel Web site today and it’s now called an IPU, as is Intel’s “Oak Springs Canyon” follow-on board, which keeps the Xeon CPU and replaces the Stratix 10 FPGA with an Intel Agilex FPGA.

The Intel Oak Springs Canyon IPU is based on an Intel Xeon-D CPU and an Intel Agilex FPGA. Image Credit: Intel.

Intel’s second announced IPU proves that IPUs need not be based on FPGAs and suggests that IPUs need not even be board-level products. Intel calls the Mount Evans ASIC an “IPU on a chip,” developed in partnership with an unnamed “top cloud provider.” The Mount Evans ASIC combines an Arm Neoverse N1 multiprocessor core (yes, that’s right: an Arm CPU architecture) with various network function blocks including a packet-processing pipeline, cryptographic and compression engines, network traffic shapers, and an NVMe controller. (If you’re shocked that Intel would develop an Arm-based ASIC, just consider that all of Intel’s SoC FPGAs are similarly equipped with Arm processor cores.)

Intel’s Mount Evans IPU ASIC combines an Arm Neoverse N1 multiprocessor core with various Ethernet special function blocks including a packet-processing pipeline, cryptographic and compression engines, network traffic shapers, and an NVMe controller. Image Credit: Intel.

So what differentiates a SmartNIC from an IPU? Don’t know. Perhaps it’s the combination of CPU software processing with programmable hardware and hardened function units on one chip or board. However, given Intel’s reclassification of some SmartNICs as IPUs, the hardware represents only a portion of the story. Perhaps the dedication of IPUs to infrastructure processing as opposed to the more general “you program it to do what you want” nature of SmartNICs is the true differentiator.

Companion software and a robust ecosystem will be even more important than the hardware if IPUs are to succeed in winning cloud service providers. For now, Intel talks about building on its existing IPU software foundation and ecosystem vendors, but the whole Intel IPU concept is so new that there’s not much of a foundation yet and there are still a lot of older SmartNIC concepts mixed in with the new IPU ideas. No doubt, the waters will clear as Intel refines and clarifies this transition from SmartNIC accelerators to IPUs. We’ll be sitting on the edge of this pond to watch as the waters clear and to see if data center architects adopt IPUs more readily than they have adopted SmartNIC acceleration in the past.

For more details on Intel’s current IPU thinking, see “The IPU: A New, Strategic Resource for Cloud Service Providers.”