feature article
Subscribe Now

A Bigger Packet Pipe

Cavium Announces OCTEON III

Multicore is familiar territory in the communications world. That application area is arguably where the most sophisticated multicore practitioners operate. Unlike other embedded areas, like cellphones – if, that is, you consider them embedded, and which are just starting to use more than one core in a real multicore way, communications infrastructure designers have been using several cores for a long time.

The reason is simple: speed. You’ve got packets flying at you at a bazillion gigaspams per second and you’ve got to deal with them before you have the entire world breathing down your neck asking why you’re clogging up the works.

The simplest job for such a piece of equipment is packet forwarding. Specifically, forwarding up through layer 4 with version 4 of the IP protocol. Five years ago, folks were futzing with 10 gigabits per second, trying to get core routers to work at 40 Gbps.

Then IPv6 came along, and forwarding equipment had to do both. And v6 is harder than v4, and you might have v4 packets tunneling under v6, and, well, it made life more complicated.

But that’s all child’s play compared to what’s being demanded now. These boxes now have to be application aware – that’s layer 7 – and, by the way, do it at 100 Gbps (according to Cavium). So you’re getting packets, figuring out where they need to go according to a couple of different protocols, keeping track of flows and sessions, peering deep inside to see what application they belong to so that you can manage and prioritize traffic and, oh, while you’re up, see if anything suspicious is going on inside. For starters, anyway.

Which creates an enormous demand for computing power and I/O bandwidth. Computing power, can, of course, be added by upping the number of cores, at least in theory. And at some point, multicore becomes many-core (although it’s not clear just where).

But many of the required compute-intensive functions are relatively well defined and can benefit from implementation in hardware. Encryption and decryption have been hardened into silicon for a long time. Other functions, like Cavium’s NEURON search chips, have been provided separately, to operate alongside the processor.

Cavium has already flirted with moving external accompaniments onto their processor chips with their OCTEON Fusion products for base stations. With their recent announcement of OCTEON III, the integration continues. We now enter many-core territory, but with an architecture firmly directed at communications infrastructure.

The computing side of things involves up to 48 cores per chip, each humming at up to 2.5 GHz. They compare this raw capacity as being 1.6 – 2 times higher than that of OCTEON II. They’ve tweaked some instructions and almost doubled the instruction cache size; they’ve maintained their ability to get data and put it to use in 3 cycles; and they’ve made various improvements in their ability to execute speculatively.

They’re being a bit cagier on the size of the L2 cache (“very large”), which is shared across all cores; they can access as much as 256 GB of external memory. But… they bring out their coherency infrastructure – their so-called OCI bus – so that you can gang up to 8 chips together (that’s 8×48 cores) and have them act as a single processor, with full coherency maintained across all eight. They claim the total compute power to be 960 GHz or 800 Gbps of application processing (that latter being a rough number, presumably, since not all bits are created – or processed – alike). Oh, and 2 TB of memory.

That’s the raw software side of things. Then come the accelerators. These are hardware implementations of frequently-used compute-intensive functions. Note that one of them, which they’ve had around for a long time, is particular to the general tendency of communications processing to use pipelines and schedule different operations on different pipelines (or even simply on dedicated cores). That scheduling itself becomes laborious, and it is the role of their Application Acceleration Manager.

In addition, packets are processed as they arrive (things like decapsulation) and as they leave through dedicated Packet Input and Packet Output hardware. DMA, RAID, and other things are accelerated as well.

But they’re touting in particular the integration of some much beefier functions. One of them is deep packet inspection (DPI). They’ve put up to 64 dedicated engines in there for DPI – so-called “Hyper-Finite-Automata” (HFA) engines, if you feel the need for a mouthful. They’ve had these in their OCTEON II family; now they say they can handle DPI at up to 100 Gbps with very flexible, complex rules.

They’ve also integrated their NEURON search engine on-chip. Essentially, this eliminates the need for an external TCAM for most applications, supporting access control lists and longest-prefix-match searches internally at – you guessed it – 100 Gbps.

Finally, they can handle over 100 Gbps of encryption and decryption and 50 Gbps of compression and decompression (for accessing the guts of zipped files, for example).

They’ve jumped all the way down to 28 nm with this family, so power becomes an important consideration. They say that they’ve implemented dynamic clock gating throughout as well as power gating to shut off unused circuits. They can also dynamically throttle performance. In fact, if a core is not in use, rather than putting it completely to sleep – which would require a relatively lengthy wake-up when needed – they simply ratchet down the clock frequency to something negligible so they can get it up and running again in 3 clock cycles.

There are a number of triggers that can affect the power management, and they can be accessed in software so that users can take their best shot at crafting power savings. Overall, they say that they can get four times the performance of OCTEON II within the same power envelope.

They also have dedicated hardware to manage secure booting and their “Authentik” feature, which, more or less, ensures that the chip you bought is Cavium-approved (rather than something that was, for example, over-built and then sold on the gray market).

And, of course, they have lots of I/O bandwidth. They claim 500 Gbps, with over double the connectivity of OCTEON II.

All in all, it’s a monster chip, operating in the realms that the many-core guys have been claiming. But the heavy focus on accelerators keeps the chip firmly rooted in communications, even if the cores could theoretically be put to use on more general computation. Samples are supposed to be available in the second half of this year; datasheets are already available to key customers (much Cavium detail is available only under NDA). So it will be some time before these beasts will be processing real traffic.

But, if they do their jobs right, you should expect them to show up in an enterprise, ISP, cellular, security, or cloud computing infrastructure box near you.

Image: Gwillhickers/Wikipedia

14 thoughts on “A Bigger Packet Pipe”

  1. Hi Byron,

    Very informative article. Do you know of anyone who’s built a card, blade or appliance utilizing the HFA engines?

    Regards,

    Drew

  2. Pingback: GVK Biosciences
  3. Pingback: Bdsm
  4. Pingback: DMPK Services
  5. Pingback: Boliden
  6. Pingback: scr888
  7. Pingback: iraqi coehuman

Leave a Reply

featured blogs
Apr 19, 2024
In today's rapidly evolving digital landscape, staying at the cutting edge is crucial to success. For MaxLinear, bridging the gap between firmware and hardware development has been pivotal. All of the company's products solve critical communication and high-frequency analysis...
Apr 18, 2024
Are you ready for a revolution in robotic technology (as opposed to a robotic revolution, of course)?...
Apr 18, 2024
See how Cisco accelerates library characterization and chip design with our cloud EDA tools, scaling access to SoC validation solutions and compute services.The post Cisco Accelerates Project Schedule by 66% Using Synopsys Cloud appeared first on Chip Design....

featured video

How MediaTek Optimizes SI Design with Cadence Optimality Explorer and Clarity 3D Solver

Sponsored by Cadence Design Systems

In the era of 5G/6G communication, signal integrity (SI) design considerations are important in high-speed interface design. MediaTek’s design process usually relies on human intuition, but with Cadence’s Optimality Intelligent System Explorer and Clarity 3D Solver, they’ve increased design productivity by 75X. The Optimality Explorer’s AI technology not only improves productivity, but also provides helpful insights and answers.

Learn how MediaTek uses Cadence tools in SI design

featured chalk talk

Achieving High Power Density with IGBT and SiC Power Modules
Sponsored by Mouser Electronics and Infineon
Recent trends in the inverter market have made high power density, scalability, and ease of assembly more important than ever before. In this episode of Chalk Talk, Amelia Dalton and Abraham Markose from Infineon examine how Easy & Econo power modules from Infineon can help solve common inverter design requirements. They explore the benefits and construction of these modules and how you can take advantage of them in your next design.
May 19, 2023
36,883 views