A Bigger Packet Pipe

Cavium Announces OCTEON III

Multicore is familiar territory in the communications world. That application area is arguably where the most sophisticated multicore practitioners operate. Other embedded areas, like cellphones (if, that is, you consider them embedded), are just starting to use more than one core in a real multicore way. Communications infrastructure designers, by contrast, have been using several cores for a long time.

The reason is simple: speed. You’ve got packets flying at you at a bazillion gigaspams per second and you’ve got to deal with them before you have the entire world breathing down your neck asking why you’re clogging up the works.

The simplest job for such a piece of equipment is packet forwarding. Specifically, forwarding up through layer 4 with version 4 of IP (IPv4). Five years ago, folks were futzing with 10 gigabits per second, trying to get core routers to work at 40 Gbps.

Then IPv6 came along, and forwarding equipment had to do both. And v6 is harder than v4, and you might have v4 packets tunneling under v6, and, well, it made life more complicated.

But that’s all child’s play compared to what’s being demanded now. These boxes now have to be application aware – that’s layer 7 – and, by the way, do it at 100 Gbps (according to Cavium). So you’re getting packets, figuring out where they need to go according to a couple of different protocols, keeping track of flows and sessions, peering deep inside to see what application they belong to so that you can manage and prioritize traffic and, oh, while you’re up, see if anything suspicious is going on inside. For starters, anyway.
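
To put that 100 Gbps in perspective, here is a rough back-of-the-envelope sketch of the time budget per packet. The frame size and wire overhead are assumptions on my part (worst-case minimum-size Ethernet frames), not figures from Cavium.

```python
# Rough per-packet time budget at a given line rate.
# Assumptions (not from the article): minimum-size 64-byte Ethernet frames,
# plus 8 bytes of preamble/start delimiter and a 12-byte inter-frame gap.
LINE_RATE_BPS = 100e9          # the 100 Gbps figure quoted by Cavium
FRAME_BYTES = 64               # assumed worst-case frame size
WIRE_OVERHEAD_BYTES = 8 + 12   # assumed preamble + inter-frame gap


def packets_per_second(line_rate_bps, frame_bytes, overhead_bytes=WIRE_OVERHEAD_BYTES):
    """Packets arriving per second if every frame has the given size."""
    bits_per_packet = (frame_bytes + overhead_bytes) * 8
    return line_rate_bps / bits_per_packet


def budget_ns(line_rate_bps, frame_bytes):
    """Nanoseconds available per packet before the next one arrives."""
    return 1e9 / packets_per_second(line_rate_bps, frame_bytes)


pps = packets_per_second(LINE_RATE_BPS, FRAME_BYTES)
print(f"{pps / 1e6:.1f} Mpps, {budget_ns(LINE_RATE_BPS, FRAME_BYTES):.2f} ns per packet")
```

Under those assumptions, a single stream of work would have well under 10 ns per packet, which is why the load has to be spread across many cores and hardware engines.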

All of which creates an enormous demand for computing power and I/O bandwidth. Computing power can, of course, be added by upping the number of cores, at least in theory. And at some point, multicore becomes many-core (although it’s not clear just where).

But many of the required compute-intensive functions are relatively well defined and can benefit from implementation in hardware. Encryption and decryption have been hardened into silicon for a long time. Other functions, like Cavium’s NEURON search chips, have been provided separately, to operate alongside the processor.

Cavium has already flirted with moving external accompaniments onto their processor chips with their OCTEON Fusion products for base stations. With their recent announcement of OCTEON III, the integration continues. We now enter many-core territory, but with an architecture firmly directed at communications infrastructure.

The computing side of things involves up to 48 cores per chip, each humming at up to 2.5 GHz. Cavium puts this raw capacity at 1.6 to 2 times that of OCTEON II. They’ve tweaked some instructions and almost doubled the instruction cache size; they’ve maintained their ability to get data and put it to use in 3 cycles; and they’ve made various improvements in their ability to execute speculatively.

They’re being a bit cagier on the size of the L2 cache (“very large”), which is shared across all cores; each chip can access as much as 256 GB of external memory. But… they bring out their coherency infrastructure – their so-called OCI bus – so that you can gang up to 8 chips together (that’s 8×48 cores) and have them act as a single processor, with full coherency maintained across all eight. They claim the total compute power to be 960 GHz or 800 Gbps of application processing (the latter being a rough number, presumably, since not all bits are created – or processed – alike). Oh, and 2 TB of memory.
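
Those aggregate numbers are just the per-chip maximums multiplied out. A quick sketch of the arithmetic follows; the 100 Gbps per-chip application-processing figure is my assumption, taken from the application-aware rate quoted earlier, and the function name is purely illustrative.

```python
# Multiply the per-chip maximums out across a ganged system.
CORES_PER_CHIP = 48        # from the announcement
CLOCK_GHZ = 2.5            # maximum per-core clock
MEMORY_PER_CHIP_GB = 256   # maximum external memory per chip
APP_GBPS_PER_CHIP = 100    # assumption: the 100 Gbps figure quoted earlier
MAX_CHIPS = 8              # chips that can be linked coherently


def aggregate(chips):
    """Return headline totals for a system of `chips` coherent chips."""
    cores = chips * CORES_PER_CHIP
    return {
        "cores": cores,
        "ghz": cores * CLOCK_GHZ,  # "GHz" summed across cores, as in the claim
        "memory_tb": chips * MEMORY_PER_CHIP_GB / 1024,
        "app_gbps": chips * APP_GBPS_PER_CHIP,
    }


print(aggregate(MAX_CHIPS))
```

Summing clock rates across cores is, of course, a marketing-style number; it says nothing about how well a given workload actually scales across 384 cores.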

That’s the raw software side of things. Then come the accelerators. These are hardware implementations of frequently-used compute-intensive functions. Note that one of them, which they’ve had around for a long time, addresses the general tendency of communications processing to use pipelines and to schedule different operations on different pipelines (or even simply on dedicated cores). That scheduling itself becomes laborious, and handling it is the role of their Application Acceleration Manager.

In addition, packets are processed as they arrive (things like decapsulation) and as they leave, using dedicated Packet Input and Packet Output hardware. DMA, RAID, and other things are accelerated as well.

But they’re touting in particular the integration of some much beefier functions. One of them is deep packet inspection (DPI). They’ve put up to 64 dedicated engines in there for DPI – so-called “Hyper-Finite-Automata” (HFA) engines, if you feel the need for a mouthful. They’ve had these in their OCTEON II family; now they say they can handle DPI at up to 100 Gbps with very flexible, complex rules.
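
For a feel of what automaton-based inspection involves, here is a minimal software sketch of the classic Aho-Corasick multi-pattern matcher. To be clear, this is a stand-in technique chosen for illustration, not Cavium’s HFA design (whose internals aren’t public here), and the signatures are made up.

```python
from collections import deque


def build_automaton(patterns):
    """Build Aho-Corasick tables: goto transitions, failure links, outputs."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    # Breadth-first pass: depth-1 states keep failure link 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]  # inherit matches that end at the same byte
    return goto, fail, out


def scan(payload, automaton):
    """Return (offset, pattern) for every signature found in one pass."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(payload):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


# Hypothetical signatures, for illustration only.
SIGNATURES = [b"GET ", b"cmd.exe", b"exe"]
automaton = build_automaton(SIGNATURES)
print(sorted(scan(b"GET /scripts/cmd.exe HTTP/1.0", automaton)))
```

The point of the automaton is that the payload is walked once, byte by byte, no matter how many signatures are loaded; that property is what makes this style of matching a natural candidate for dedicated hardware.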

They’ve also integrated their NEURON search engine on-chip. Essentially, this eliminates the need for an external TCAM for most applications, supporting access control lists and longest-prefix-match searches internally at – you guessed it – 100 Gbps.
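
If longest-prefix match is unfamiliar, here is a minimal software sketch of the lookup itself. It is a naive linear scan for clarity, nothing like what a TCAM or the NEURON engine does internally, and the route table entries and port names are hypothetical.

```python
import ipaddress

# Hypothetical forwarding table: prefix -> next hop (illustrative values only).
ROUTES = {
    "0.0.0.0/0": "default",
    "10.0.0.0/8": "port-1",
    "10.1.0.0/16": "port-2",
    "10.1.2.0/24": "port-3",
}
TABLE = [(ipaddress.ip_network(prefix), hop) for prefix, hop in ROUTES.items()]


def longest_prefix_match(addr, table=TABLE):
    """Return the next hop of the most specific prefix containing addr."""
    ip = ipaddress.ip_address(addr)
    best = None
    for net, hop in table:
        # Only compare like with like (IPv4 vs. IPv6), then test membership.
        if ip.version == net.version and ip in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, hop)
    return best[1] if best else None


for addr in ("10.1.2.3", "10.1.9.9", "10.9.9.9", "8.8.8.8"):
    print(addr, "->", longest_prefix_match(addr))
```

A linear scan like this costs time proportional to the table size per packet; the appeal of a TCAM, or of an on-chip search engine replacing one, is doing the same job at line rate.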

Finally, they can handle over 100 Gbps of encryption and decryption and 50 Gbps of compression and decompression (for accessing the guts of zipped files, for example).

They’ve jumped all the way down to 28 nm with this family, so power becomes an important consideration. They say that they’ve implemented dynamic clock gating throughout as well as power gating to shut off unused circuits. They can also dynamically throttle performance. In fact, if a core is not in use, rather than putting it completely to sleep – which would require a relatively lengthy wake-up when needed – they simply ratchet down the clock frequency to something negligible so they can get it up and running again in 3 clock cycles.

There are a number of triggers that can affect the power management, and they can be accessed in software so that users can take their best shot at crafting power savings. Overall, they say that they can get four times the performance of OCTEON II within the same power envelope.

They also have dedicated hardware to manage secure booting and their “Authentik” feature, which, more or less, ensures that the chip you bought is Cavium-approved (rather than something that was, for example, over-built and then sold on the gray market).

And, of course, they have lots of I/O bandwidth. They claim 500 Gbps, with over double the connectivity of OCTEON II.

All in all, it’s a monster chip, operating in the realms that the many-core guys have been claiming. But the heavy focus on accelerators keeps the chip firmly rooted in communications, even if the cores could theoretically be put to use on more general computation. Samples are supposed to be available in the second half of this year; datasheets are already available to key customers (much Cavium detail is available only under NDA). So it will be some time before these beasts are processing real traffic.

But, if they do their jobs right, you should expect them to show up in an enterprise, ISP, cellular, security, or cloud computing infrastructure box near you.

Image: Gwillhickers/Wikipedia

14 thoughts on “A Bigger Packet Pipe”

  1. Hi Byron,

    Very informative article. Do you know of anyone who’s built a card, blade or appliance utilizing the HFA engines?

    Regards,

    Drew
