A Bigger Packet Pipe

Cavium Announces OCTEON III

Multicore is familiar territory in the communications world; it is arguably where the most sophisticated multicore practitioners operate. Other embedded areas, like cellphones (if, that is, you consider them embedded), are just starting to use more than one core in a real multicore way. Communications infrastructure designers, by contrast, have been using several cores for a long time.

The reason is simple: speed. You’ve got packets flying at you at a bazillion gigaspams per second and you’ve got to deal with them before you have the entire world breathing down your neck asking why you’re clogging up the works.

The simplest job for such a piece of equipment is packet forwarding. Specifically, forwarding up through layer 4 with version 4 of the Internet Protocol (IPv4). Five years ago, folks were futzing with 10 gigabits per second, trying to get core routers to work at 40 Gbps.

Then IPv6 came along, and forwarding equipment had to do both. And v6 is harder than v4, and you might have v4 packets tunneling under v6, and, well, it made life more complicated.
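To make the "do both" problem concrete, here is a minimal sketch of the first decision a forwarding path has to make: which IP version is this, and is a v4 packet riding inside a v6 one? The function name `classify` and the labels it returns are made up for illustration. The sketch assumes a raw IP packet with no link-layer header and ignores IPv6 extension headers, which real equipment cannot do.

```python
def classify(packet: bytes) -> str:
    """Return a coarse label for a raw IP packet (no link-layer header)."""
    if not packet:
        return "empty"
    version = packet[0] >> 4              # version is the top nibble of byte 0
    if version == 4:
        return "ipv4"
    if version == 6:
        if len(packet) < 40:              # the fixed IPv6 header is 40 bytes
            return "truncated"
        next_header = packet[6]           # Next Header field sits at offset 6
        if next_header == 4:              # protocol number 4 = IPv4 encapsulation
            return "ipv4-in-ipv6"
        return "ipv6"
    return "unknown"


# Hypothetical test packets: header bytes only, zero-filled addresses.
inner_v4 = bytes([0x45]) + bytes(19)                        # 20-byte IPv4 header
plain_v6 = bytes([0x60, 0, 0, 0, 0, 0, 6, 64]) + bytes(32)  # Next Header 6 = TCP
tunnel = bytes([0x60, 0, 0, 0, 0, 20, 4, 64]) + bytes(32) + inner_v4

for name, pkt in [("v4", inner_v4), ("v6", plain_v6), ("tunnel", tunnel)]:
    print(name, "->", classify(pkt))
```

Even this toy version shows why v6 adds work: the tunneled case means the box has to parse one header just to find out there is another one behind it.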

But that’s all child’s play compared to what’s being demanded now. These boxes now have to be application aware – that’s layer 7 – and, by the way, do it at 100 Gbps (according to Cavium). So you’re getting packets, figuring out where they need to go according to a couple of different protocols, keeping track of flows and sessions, peering deep inside to see what application they belong to so that you can manage and prioritize traffic and, oh, while you’re up, see if anything suspicious is going on inside. For starters, anyway.

All of which creates an enormous demand for computing power and I/O bandwidth. Computing power can, of course, be added by upping the number of cores, at least in theory. And at some point, multicore becomes many-core (although it’s not clear just where).

But many of the required compute-intensive functions are relatively well defined and can benefit from implementation in hardware. Encryption and decryption have been hardened into silicon for a long time. Other functions, like Cavium’s NEURON search chips, have been provided separately, to operate alongside the processor.

Cavium has already flirted with moving external accompaniments onto their processor chips with their OCTEON Fusion products for base stations. With their recent announcement of OCTEON III, the integration continues. We now enter many-core territory, but with an architecture firmly directed at communications infrastructure.

The computing side of things involves up to 48 cores per chip, each humming at up to 2.5 GHz. They put this raw capacity at 1.6 to 2 times that of OCTEON II. They’ve tweaked some instructions and almost doubled the instruction cache size; they’ve maintained their ability to get data and put it to use in 3 cycles; and they’ve made various improvements in their ability to execute speculatively.
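For a feel of what those cores are up against, here is a back-of-the-envelope budget at the 100 Gbps rate mentioned earlier. The assumptions are mine, not Cavium's: worst-case traffic of minimum-size 64-byte Ethernet frames, the standard 20 bytes of preamble and inter-frame gap per frame on the wire, and every one of the 48 cores running at the full 2.5 GHz and doing nothing but packet work.

```python
LINE_RATE_BPS = 100e9        # target line rate from the article
MIN_FRAME_BYTES = 64         # smallest Ethernet frame (assumption: worst case)
WIRE_OVERHEAD_BYTES = 20     # 8 bytes preamble/SFD + 12 bytes inter-frame gap
CORES = 48                   # cores per chip, from the article
CLOCK_HZ = 2.5e9             # top clock rate, from the article

bits_per_packet = (MIN_FRAME_BYTES + WIRE_OVERHEAD_BYTES) * 8
packets_per_second = LINE_RATE_BPS / bits_per_packet
ns_per_packet = 1e9 / packets_per_second
cycles_per_packet = CORES * CLOCK_HZ / packets_per_second

print(f"{packets_per_second / 1e6:.1f} Mpps")        # about 148.8 Mpps
print(f"{ns_per_packet:.2f} ns between packets")      # about 6.72 ns
print(f"{cycles_per_packet:.0f} core cycles/packet")  # about 806 cycles
```

Roughly 800 cycles, summed across the whole chip, to do everything a packet needs is not much once layer 7 enters the picture, which goes some way toward explaining the appetite for hardware accelerators.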

They’re being a bit cagier on the size of the L2 cache (“very large”), which is shared across all cores; each chip can access as much as 256 GB of external memory. But… they bring out their coherency infrastructure – their so-called OCI bus – so that you can gang up to 8 chips together (that’s 8×48 = 384 cores) and have them act as a single processor, with full coherency maintained across all eight. They claim the total compute power to be 960 GHz (384 cores at 2.5 GHz each) or 800 Gbps of application processing (the latter being a rough number, presumably, since not all bits are created – or processed – alike). Oh, and 2 TB of memory.
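The eight-chip totals follow directly from the per-chip numbers, as this quick check shows. The variable names are mine. The only interpretation involved is reading "960 GHz" as a plain sum of core clocks and treating the 2 TB as 8 × 256 GB.

```python
CHIPS = 8                 # maximum chips ganged together
CORES_PER_CHIP = 48
CLOCK_GHZ = 2.5           # top clock rate per core
MEMORY_GB_PER_CHIP = 256  # external memory per chip
CLAIMED_GBPS = 800        # claimed application throughput for all eight chips

total_cores = CHIPS * CORES_PER_CHIP
aggregate_ghz = total_cores * CLOCK_GHZ           # sum of all core clocks
total_memory_tb = CHIPS * MEMORY_GB_PER_CHIP / 1024
per_chip_gbps = CLAIMED_GBPS / CHIPS

print(total_cores)        # 384
print(aggregate_ghz)      # 960.0
print(total_memory_tb)    # 2.0
print(per_chip_gbps)      # 100.0
```

So the 800 Gbps figure is simply the 100 Gbps per-chip number scaled by eight, which assumes the work divides cleanly across chips.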

That’s the raw software side of things. Then come the accelerators. These are hardware implementations of frequently-used compute-intensive functions. One of them, which they’ve had around for a long time, addresses the general tendency of communications processing to use pipelines, scheduling different operations on different pipelines (or even simply on dedicated cores). That scheduling itself becomes laborious, and handling it is the role of their Application Acceleration Manager.

In addition, packets are processed as they arrive (things like decapsulation) and as they leave through dedicated Packet Input and Packet Output hardware. DMA, RAID, and other things are accelerated as well.

But they’re touting in particular the integration of some much beefier functions. One of them is deep packet inspection (DPI). They’ve put up to 64 dedicated engines in there for DPI – so-called “Hyper-Finite-Automata” (HFA) engines, if you feel the need for a mouthful. They’ve had these in their OCTEON II family; now they say they can handle DPI at up to 100 Gbps with very flexible, complex rules.
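Cavium hasn't said how the HFA engines work inside, so the following is not their design. It is a textbook Aho-Corasick automaton, a plainly swapped-in stand-in that illustrates the basic idea behind automaton-based DPI: walk the payload once, byte by byte, and report every rule that matches along the way. The function names and the sample signatures are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto/fail/output tables for a list of byte-string patterns."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:                       # 1. build a trie of the patterns
        state = 0
        for byte in pat:
            nxt = goto[state].get(byte)
            if nxt is None:
                nxt = len(goto)
                goto[state][byte] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(pat)
    queue = deque(goto[0].values())            # 2. breadth-first failure links
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]   # inherit shorter matches
    return goto, fail, out


def scan(payload, automaton):
    """Return (start_offset, pattern) for every match, in a single pass."""
    goto, fail, out = automaton
    state, hits = 0, []
    for offset, byte in enumerate(payload):
        while state and byte not in goto[state]:
            state = fail[state]                # fall back on a mismatch
        state = goto[state].get(byte, 0)
        for pat in out[state]:
            hits.append((offset - len(pat) + 1, pat))
    return hits


# Hypothetical signatures and payload, for illustration only.
signatures = [b"he", b"she", b"hers"]
print(scan(b"ushers", build_automaton(signatures)))
```

The point of the automaton is that scan time depends on the payload length, not on the number of rules; "very flexible, complex rules" at 100 Gbps presumably calls for something a good deal more elaborate than this.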

They’ve also integrated their NEURON search engine on-chip. Essentially, this eliminates the need for an external TCAM for most applications, supporting access control lists and longest-prefix-match searches internally at – you guessed it – 100 Gbps.
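For readers who haven't met longest-prefix match, here is a minimal software sketch using Python's standard `ipaddress` module. It is a linear scan over a tiny hypothetical route table (the prefixes and next-hop names are invented); a TCAM or an on-chip search engine earns its keep by comparing against all entries at once rather than one at a time.

```python
import ipaddress


def longest_prefix_match(table, address):
    """Return the next hop of the most specific prefix containing address."""
    addr = ipaddress.ip_address(address)
    best_net, best_hop = None, None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if net.version != addr.version or addr not in net:
            continue                          # wrong family or no match
        if best_net is None or net.prefixlen > best_net.prefixlen:
            best_net, best_hop = net, next_hop
    return best_hop


# Hypothetical route table: prefix -> next-hop label.
routes = {
    "0.0.0.0/0": "default",
    "10.0.0.0/8": "core",
    "10.1.0.0/16": "edge",
}

print(longest_prefix_match(routes, "10.1.2.3"))   # edge
print(longest_prefix_match(routes, "10.2.0.1"))   # core
print(longest_prefix_match(routes, "8.8.8.8"))    # default
```

All three prefixes match 10.1.2.3; the /16 wins because it is the most specific. Doing that comparison for every packet at line rate is exactly the job that has traditionally been handed to an external TCAM.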

Finally, they can handle over 100 Gbps of encryption and decryption and 50 Gbps of compression and decompression (for accessing the guts of zipped files, for example).

They’ve jumped all the way down to 28 nm with this family, so power becomes an important consideration. They say that they’ve implemented dynamic clock gating throughout as well as power gating to shut off unused circuits. They can also dynamically throttle performance. In fact, if a core is not in use, rather than putting it completely to sleep – which would require a relatively lengthy wake-up when needed – they simply ratchet down the clock frequency to something negligible so they can get it up and running again in 3 clock cycles.

There are a number of triggers that can affect the power management, and they can be accessed in software so that users can take their best shot at crafting power savings. Overall, they say that they can get four times the performance of OCTEON II within the same power envelope.

They also have dedicated hardware to manage secure booting and their “Authentik” feature, which, more or less, ensures that the chip you bought is Cavium-approved (rather than something that was, for example, over-built and then sold on the gray market).

And, of course, they have lots of I/O bandwidth. They claim 500 Gbps, with over double the connectivity of OCTEON II.

All in all, it’s a monster chip, operating in the realms that the many-core guys have been claiming. But the heavy focus on accelerators keeps the chip firmly rooted in communications, even if the cores could theoretically be put to use on more general computation. Samples are supposed to be available in the second half of this year; datasheets are already available to key customers (much Cavium detail is available only under NDA). So it will be some time before these beasts are processing real traffic.

But, if they do their jobs right, you should expect them to show up in an enterprise, ISP, cellular, security, or cloud computing infrastructure box near you.

Image: Gwillhickers/Wikipedia

14 thoughts on “A Bigger Packet Pipe”

  1. Hi Byron,

    Very informative article. Do you know of anyone who’s built a card, blade or appliance utilizing the HFA engines?

    Regards,

    Drew
