AI-Based Cybersecurity for Next-Generation Data Center Servers

Sometimes I envy the creators of early, unconnected computers. They could happily spend their days working on the fun stuff—both hardware and software—without worrying about bad actors, nefarious fellows, and naughty nation states trying to break in and steal their data or, worse, encrypting said data and then ransoming it.

These days, I pity the folks in charge of data centers involving tens of thousands of servers connected to each other and to the outside world. Take a moment to think about your own data. If ransomware somehow found its way onto my system and encrypted all my data, I’d be dead in the water, as it were. I can’t even imagine what it must be like to be an IT manager in charge of an entire company’s data. I don’t know how they sleep at night.

All of this leads us to something called “platform management” (also known as “hardware management”), which refers to overseeing and supervising all aspects of the hardware portion of an electronic system, including motherboards and add-in cards. In the case of the server motherboards employed in data centers, platform management embraces everything from low-end tasks such as power-sequencing the devices on the board when the system is powered on or powered off, all the way to high-end activities such as resource discovery, telemetry, configuration, control, update, security, and resiliency.

For the purposes of these discussions, let’s focus on the three main platform management functions, which are power sequencing, security and resiliency, and board management.

Power sequencing involves ensuring that the various power supplies have stabilized and then activating the other devices on the board in the correct order when the system is powered on, and deactivating them in the correct order when the system is powered off. Traditionally, power sequencing has been performed using a CPLD, FPGA, or microcontroller unit (MCU).
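To make the idea concrete, here is a minimal sketch of a power sequencer in Python. Everything in it is an assumption for illustration: the `Rail` class, the rail names, and the choice to power off in the reverse of the power-on order are hypothetical, and a real sequencer would sample power-good signals in hardware rather than read a flag.

```python
from dataclasses import dataclass


@dataclass
class Rail:
    """A hypothetical power rail; the names used below are illustrative only."""
    name: str
    enabled: bool = False

    def enable(self) -> None:
        self.enabled = True

    def disable(self) -> None:
        self.enabled = False

    def is_stable(self) -> bool:
        # Assumption: a real sequencer would sample a power-good pin here.
        return self.enabled


def power_off(rails: list[Rail], log: list[str]) -> None:
    """Disable rails in the reverse of the power-on order (an assumed policy)."""
    for rail in reversed(rails):
        rail.disable()
        log.append(f"off:{rail.name}")


def power_on(rails: list[Rail], log: list[str]) -> bool:
    """Enable rails in order, checking each is stable before the next one."""
    for i, rail in enumerate(rails):
        rail.enable()
        if not rail.is_stable():
            # Back out: disable everything enabled so far, in reverse order.
            power_off(rails[: i + 1], log)
            return False
        log.append(f"on:{rail.name}")
    return True


rails = [Rail("12V"), Rail("3V3"), Rail("1V8")]
events: list[str] = []
power_on(rails, events)
power_off(rails, events)
print(events)
```

The point of the sketch is simply that the order is data (the list), and that a failure partway through unwinds whatever has already been switched on.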

In addition to storing the keys required by any cryptographic functions, a hardware (HW) root of trust (RoT) facilitates a secure boot process, ensuring the integrity and authenticity of the system by performing tasks like verifying that any firmware images come from known and trusted sources. The HW RoT has evolved to provide Platform Firmware Resilience (PFR), which monitors and filters malicious traffic on the system buses and verifies the integrity of application software before code is executed. Traditionally, the HW RoT has been implemented as an ASIC, FPGA, or MCU.
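As a rough illustration of the "known and trusted sources" check, here is a minimal Python sketch. Note the simplification: a real RoT would typically verify an asymmetric signature using keys held in hardware, whereas this sketch substitutes a plain SHA-256 digest comparison against a trusted list. The function names and the stop-at-first-failure boot policy are my own assumptions, not anything taken from a specific product.

```python
import hashlib
import hmac


def image_digest(image: bytes) -> str:
    """Return the SHA-256 digest of a firmware image as a hex string."""
    return hashlib.sha256(image).hexdigest()


def verify_image(image: bytes, trusted_digests: set[str]) -> bool:
    """Check an image against a set of trusted digests (hypothetical manifest)."""
    digest = image_digest(image)
    # compare_digest performs a constant-time comparison of the two strings.
    return any(hmac.compare_digest(digest, t) for t in trusted_digests)


def secure_boot(images: dict[str, bytes], trusted: set[str]) -> list[str]:
    """Return the names of images approved to run, halting at the first failure."""
    approved = []
    for name, image in images.items():
        if not verify_image(image, trusted):
            break  # Assumed policy: refuse to continue past an untrusted image.
        approved.append(name)
    return approved


good = b"bmc-firmware-v1"
trusted = {image_digest(good)}
print(secure_boot({"stage1": good, "stage2": b"tampered"}, trusted))
```

Here only `stage1` is approved; the tampered `stage2` image stops the chain before it can execute.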

Higher-level board management functions are performed by the Baseboard Management Controller (BMC), which is used to implement tasks like resource discovery, telemetry, configuration, control, and update. The BMC “touches” everything else on the board, checking the status of the peripherals and other devices, and deciding when the main processor can commence operation. The BMC features a high-power processor capable of running the Linux operating system (OS). It also includes one or more network ports to facilitate remote monitoring and control. In addition to tens of I2C interfaces, the BMC employs a JTAG host controller to monitor and control other devices on the board. Due to their highly specialized nature, BMCs have traditionally been implemented as ASICs in the form of application-specific processors or as SoC FPGAs containing one or more hard processor cores.

Until recently, platform management has been implemented using multiple discrete devices running proprietary software, where these devices are mounted directly on the main server motherboard. The problem is that cybersecurity threats are evolving at an unprecedented rate. Because these devices are fixed to the motherboard, a server can end up facing cyberthreats against which it has no defense, and the only way to upgrade its protection is to replace the board.

Would you like to be the person who needs to tell the boss that the motherboards residing in tens of thousands of servers must be replaced? Me neither.

Thankfully, the folks at the Open Compute Project (OCP) have come up with a solution. The OCP is a collaborative community focused on developing hardware technology to efficiently support the growing demands on compute infrastructure. The OCP shares open-source designs of data center products and best practices among companies. In the case of platform management for servers in data centers, the OCP has proposed moving most of the functionality off the baseboard and onto a Datacenter-ready Secure Control Module (DC-SCM), which plugs into the server motherboard. The advantage of the DC-SCM approach is that as new BMC functionalities and devices come online, and as new security functionalities and devices are developed to support developments like Post-Quantum Cryptography (PQC), new DC-SCMs can be swapped into the system while leaving the baseboard functionality “as-is.”

So, why am I waffling about all of this here? Well, I was just chatting with Gopi Sirineni, who is the President and CEO of Axiado. We started with Gopi sharing the following graphic.

Moving platform management off the server motherboard into a DC-SCM (Source: Axiado)

On the left-hand side of this image we see a traditional server motherboard. In the middle of the image we see one of Axiado’s DC-SCM2.0 cards. This plugs into the Host Processor Module (HPM) on the right-hand side of the image. In the context of our discussions here, we can think of the HPM as being a next-generation server motherboard. From the OCP’s perspective, an HPM is any processing module that is managed by an SCM. As they say: “In simplest terms, this is similar to today’s motherboard with BMC and Security circuitry removed. However, this is not limited to standard processor architecture and can apply to any architecture utilizing management and security features.”

Now, let’s return to Axiado’s DC-SCM2.0 card in the previous image. The guys and gals at Axiado refer to this as a Smart Secure Control Module (Smart-SCM). As they say on their website: “To overcome the limitations of existing hardware security, Axiado reimagined the OCP’s trusted platform datacenter-ready secure control module (DC-SCM) and created the Smart-SCM card, powered by the Axiado TCU.”

They go on to say: “World’s first RoT, BMC, TPM, HSM, and firewall functions integrated into a single device” and “Dedicated AI hardware for cybersecurity, providing preemptive protection against network- and peripheral-based, and physical side-channel attacks” and “Enablement of virtual platforms with independent RoT, BMC, and trust agents on a single TCU” (phew!).

The TCU of which they speak is the big silver-colored device to the right of the Smart-SCM, where TCU stands for Trusted Control/Compute Unit (hmmm, “Why not TCCU?” I ask myself). This bodacious beauty is implemented in TSMC’s 12nm technology node, presented in a 23mm x 23mm BGA package, and consumes only ~5W of power.

Say hello to Axiado’s TCU (Source: Axiado)

Again, on their website we read: “Axiado’s single-chip TCU control plane innovation is a hardware-anchored solution rooted in real-time and pre-emptive AI with pre-emptive threat detection. It provides comprehensive protection through a dedicated coprocessor to enable manufacturers to build solutions that are safe, secure, and resilient by design and default.”

As we see, the TCU includes APP processors performing the BMC functions, a Trusted Platform Module (TPM) that provides Secure Vault and HW RoT, and programmable AI engines.

It’s the artificial intelligence (AI) portion of all this that allows Axiado to stand proud in the crowd. This includes Forensic AI (ransomware attack detection), Network AI (network attack detection), Sensor AI (side-channel attack detection), and Behavioral AI (behavior anomaly detection). In addition to notifying SecOps of any perceived threats, the TCU offers protection by taking immediate, appropriate, and local action, including shutting down anything from a single network port to the entire server while also communicating its findings to other servers.
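To picture what "anything from a single network port to the entire server" might mean in practice, here is a purely hypothetical Python sketch of an escalation policy. This is not Axiado's implementation: the `Detection` fields, the anomaly score, and both thresholds are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    NOTIFY_ONLY = "notify SecOps"
    DISABLE_PORT = "disable network port"
    SHUT_DOWN_SERVER = "shut down server"


@dataclass
class Detection:
    source: str                 # e.g. "network", "forensic", "sensor", "behavioral"
    score: float                # hypothetical anomaly score in the range 0 to 1
    port: Optional[int] = None  # affected network port, if any


def choose_action(
    d: Detection,
    port_threshold: float = 0.6,    # assumed value, not from any product
    server_threshold: float = 0.9,  # assumed value, not from any product
) -> Action:
    """Pick the narrowest local response that matches the detection's severity."""
    if d.score >= server_threshold:
        return Action.SHUT_DOWN_SERVER
    if d.score >= port_threshold and d.port is not None:
        return Action.DISABLE_PORT
    return Action.NOTIFY_ONLY


for det in (
    Detection("behavioral", 0.3),
    Detection("network", 0.7, port=2),
    Detection("forensic", 0.95),
):
    print(det.source, "->", choose_action(det).value)
```

The design choice worth noting is that the response stays as local as the evidence allows: a suspicious port gets closed without taking the whole server down, and only a high-severity detection triggers a full shutdown.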

Do you remember the 1988 American action film, Die Hard, which featured Bruce Willis as New York City Police Department (NYPD) Detective John McClane? A summary of the plot is that McClane happens to be in a skyscraper to visit his estranged wife, Holly, who is attending an office party, when the tower is seized by a German radical and his heavily armed team. Everyone in the tower is taken hostage except for McClane, who wasn’t expected to be there. Suffice it to say that McClane is not well pleased to find his wife held hostage, and he makes his displeasure felt on many levels, including killing all the bad guys.

The reason I mention this is that one of the things that Gopi said really stuck in my mind. In a crunchy nutshell, Gopi noted that the standard processors and other devices on the server motherboard are like the skyscraper in the movie—the bad guys know what’s inside and think they can roam around at will, doing whatever they wish in a wanton way. Meanwhile, the AI in Axiado’s TCU is like John McClane, stealthily surveilling the bad guys, collecting data (how many attackers, their locations, their hostages) and then acting with “extreme prejudice,” as it were.

All I can say is that I’m glad the chaps and chapesses at Axiado are on our side! What say you? Do you have any thoughts you’d care to share on any of this?
