
A Bigger Packet Pipe

Cavium Announces OCTEON III

Multicore is familiar territory in the communications world. That application area is arguably where the most sophisticated multicore practitioners operate. Other embedded areas, like cellphones (if, that is, you consider them embedded), are just starting to use more than one core in a real multicore way; communications infrastructure designers, by contrast, have been using several cores for a long time.

The reason is simple: speed. You’ve got packets flying at you at a bazillion gigaspams per second and you’ve got to deal with them before you have the entire world breathing down your neck asking why you’re clogging up the works.

The simplest job for such a piece of equipment is packet forwarding. Specifically, forwarding up through layer 4 using version 4 of the Internet Protocol (IPv4). Five years ago, folks were futzing with 10 gigabits per second, trying to get core routers to work at 40 Gbps.
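
To put those link rates in perspective, here’s a back-of-the-envelope calculation of the worst case: a link saturated with minimum-size (64-byte) Ethernet frames, each of which also occupies the wire for 8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap. These are standard Ethernet figures, not Cavium’s, and the function names are just for illustration.

```python
# Worst-case packet rates for a link full of minimum-size Ethernet frames.
MIN_FRAME_BYTES = 64          # smallest legal Ethernet frame
WIRE_OVERHEAD_BYTES = 8 + 12  # preamble/SFD + inter-frame gap, per frame


def max_packets_per_second(link_gbps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Packets per second when every frame on the wire is `frame_bytes` long."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire


def ns_per_packet(link_gbps: float) -> float:
    """Time budget per packet, in nanoseconds, at that worst-case rate."""
    return 1e9 / max_packets_per_second(link_gbps)


for gbps in (10, 40, 100):
    print(f"{gbps:>3} Gbps: {max_packets_per_second(gbps) / 1e6:6.2f} Mpps, "
          f"{ns_per_packet(gbps):5.2f} ns per packet")
```

That works out to roughly 14.88 million packets per second at 10 Gbps and about 148.8 million at 100 Gbps, or 6.72 ns per packet. At 2.5 GHz, that leaves a single core fewer than 17 clock cycles per packet, which helps explain the appetite for many cores and hardware accelerators.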

Then IPv6 came along, and forwarding equipment had to do both. And v6 is harder to handle than v4, and you might have v4 packets tunneled inside v6, and, well, it made life more complicated.
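
As a concrete example of that extra complexity, here is a minimal sketch of unwrapping an IPv4 packet tunneled inside IPv6. It relies on the fixed 40-byte IPv6 header, whose Next Header field carries protocol number 4 (IPv4 encapsulation) in this case. It assumes there are no IPv6 extension headers, which real equipment must also walk, and `decapsulate_4in6` is a hypothetical helper, not any vendor’s API.

```python
import struct

IPV6_HEADER_LEN = 40   # fixed IPv6 header size in bytes
NEXT_HEADER_IPV4 = 4   # IANA protocol number for IPv4 encapsulation (IP-in-IP)


def decapsulate_4in6(packet: bytes):
    """Return the inner IPv4 packet if `packet` is IPv6 carrying IPv4, else None.

    Simplification (assumption): no IPv6 extension headers sit between the
    fixed header and the tunneled IPv4 packet.
    """
    if len(packet) < IPV6_HEADER_LEN or packet[0] >> 4 != 6:
        return None  # too short, or not IPv6
    # Bytes 4-5: payload length; byte 6: next header.
    payload_len, next_header = struct.unpack_from("!HB", packet, 4)
    if next_header != NEXT_HEADER_IPV4:
        return None  # IPv6, but not carrying IPv4
    inner = packet[IPV6_HEADER_LEN:IPV6_HEADER_LEN + payload_len]
    if len(inner) < 20 or inner[0] >> 4 != 4:
        return None  # truncated, or not actually IPv4
    return inner
```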

But that’s all child’s play compared to what’s being demanded now. These boxes now have to be application aware – that’s layer 7 – and, by the way, do it at 100 Gbps (according to Cavium). So you’re getting packets, figuring out where they need to go according to a couple of different protocols, keeping track of flows and sessions, peering deep inside to see what application they belong to so that you can manage and prioritize traffic and, oh, while you’re up, see if anything suspicious is going on inside. For starters, anyway.
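
Keeping track of flows and sessions boils down to keying per-packet state by the layer-4 5-tuple. The sketch below is a toy, dictionary-based version; the class and field names are hypothetical, the application label is assumed to come from some DPI step, and a production implementation would also need entry aging and eviction.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Classic 5-tuple identifying one direction of a layer-4 flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # e.g. 6 = TCP, 17 = UDP


def session_key(key: FlowKey) -> FlowKey:
    """Map both directions of a conversation onto one canonical key."""
    a = (key.src_ip, key.src_port)
    b = (key.dst_ip, key.dst_port)
    if a <= b:
        return key
    return FlowKey(key.dst_ip, key.src_ip, key.dst_port, key.src_port, key.protocol)


class FlowTable:
    """Toy per-session state: packet/byte counters plus an application label."""

    def __init__(self):
        self.packets = defaultdict(int)
        self.bytes = defaultdict(int)
        self.app = {}

    def update(self, key: FlowKey, length: int) -> None:
        k = session_key(key)
        self.packets[k] += 1
        self.bytes[k] += length

    def classify(self, key: FlowKey, app_label: str) -> None:
        # Hypothetical: in a real box, the label would come from DPI.
        self.app[session_key(key)] = app_label
```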

Which creates an enormous demand for computing power and I/O bandwidth. Computing power can, of course, be added by upping the number of cores, at least in theory. And at some point, multicore becomes many-core (although it’s not clear just where).

But many of the required compute-intensive functions are relatively well defined and can benefit from implementation in hardware. Encryption and decryption have been hardened into silicon for a long time. Other functions, like the searching done by Cavium’s NEURON search chips, have been provided on separate chips that operate alongside the processor.

Cavium has already flirted with moving external accompaniments onto their processor chips with their OCTEON Fusion products for base stations. With their recent announcement of OCTEON III, the integration continues. We now enter many-core territory, but with an architecture firmly directed at communications infrastructure.

The computing side of things involves up to 48 cores per chip, each humming at up to 2.5 GHz. Cavium rates this raw capacity at 1.6 to 2 times that of OCTEON II. They’ve tweaked some instructions and almost doubled the instruction cache size; they’ve kept the ability to load data and put it to use within 3 cycles; and they’ve made various improvements to speculative execution.

They’re being a bit cagier about the size of the L2 cache (“very large”), which is shared across all cores; each chip can access as much as 256 GB of external memory. But… they bring out their coherency infrastructure, the so-called OCI bus, so that you can gang up to 8 chips together (that’s 8×48, or 384, cores) and have them act as a single processor, with full coherency maintained across all eight. They put the total compute power at 960 GHz (the sum of all the cores’ clock rates) or 800 Gbps of application processing (the latter being a rough number, presumably, since not all bits are created, or processed, alike). Oh, and 2 TB of memory.
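
The headline numbers follow directly from the per-chip figures. A quick check, assuming the 256 GB memory limit applies per chip (which is what makes the 2 TB total come out):

```python
# Per-chip figures quoted for OCTEON III.
CORES_PER_CHIP = 48
MAX_CLOCK_GHZ = 2.5
MAX_CHIPS = 8                # chips ganged together over the coherent interconnect
MAX_MEMORY_PER_CHIP_GB = 256  # assumption: this limit is per chip

total_cores = CORES_PER_CHIP * MAX_CHIPS              # 384 cores
aggregate_ghz = total_cores * MAX_CLOCK_GHZ           # summed clock rates
total_memory_gb = MAX_MEMORY_PER_CHIP_GB * MAX_CHIPS  # 2048 GB, i.e. 2 TB

print(f"{total_cores} cores, {aggregate_ghz:.0f} GHz aggregate, "
      f"{total_memory_gb} GB of memory")
```

Note that “960 GHz” is an aggregate of clock rates across 384 cores, not the speed of any single thing on the chip.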

That’s the raw software side of things. Then come the accelerators: hardware implementations of frequently used, compute-intensive functions. One of them, which they’ve had around for a long time, addresses the general tendency of communications processing to use pipelines and to schedule different operations on different pipelines (or even simply on dedicated cores). That scheduling itself becomes laborious, and handling it is the role of their Application Acceleration Manager.
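
The internals of the Application Acceleration Manager aren’t covered here, so the following is only a toy software model of the kind of bookkeeping such a scheduler takes off the cores: each pipeline stage owns a group of cores, and work belonging to the same flow is steered to the same core so that per-flow ordering holds without locks. All names, and the modulo-based steering, are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkItem:
    stage: str      # pipeline stage this item needs next, e.g. "parse"
    flow_tag: int   # items sharing a tag must be processed in order
    payload: bytes = b""


@dataclass
class ToyScheduler:
    """Software stand-in for a hardware work scheduler (hypothetical design)."""
    groups: dict                                 # stage name -> list of core ids
    queues: dict = field(default_factory=dict)   # core id -> deque of WorkItem

    def submit(self, item: WorkItem) -> int:
        """Queue an item on a core in its stage's group; return that core id."""
        cores = self.groups[item.stage]
        # Same flow tag -> same core, so per-flow order is preserved.
        core = cores[item.flow_tag % len(cores)]
        self.queues.setdefault(core, deque()).append(item)
        return core

    def next_for(self, core: int):
        """Hand a core its next item in FIFO order, or None if it has no work."""
        q = self.queues.get(core)
        return q.popleft() if q else None
```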

In addition, dedicated Packet Input and Packet Output hardware processes packets as they arrive (handling things like decapsulation) and as they leave. DMA, RAID, and other functions are accelerated as well.

But they’re touting in particular the integration of some much beefier functions. One of them is deep packet inspection (DPI). They’ve put up to 64 dedicated engines in there for DPI – so-called “Hyper-Finite-Automata” (HFA) engines, if you feel the need for a mouthful. They’ve had these in their OCTEON II family; now they say they can handle DPI at up to 100 Gbps with very flexible, complex rules.
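
Cavium’s HFA engines are its own design, and their internals aren’t described here. For a sense of what finite-automaton DPI involves, here is the textbook Aho-Corasick automaton, which matches many fixed byte patterns in a single pass over a payload. Real DPI rule sets also include regular expressions, which this sketch does not handle.

```python
from collections import deque


def build_automaton(patterns):
    """Aho-Corasick: a trie of byte patterns plus failure links."""
    goto = [{}]       # goto[state][byte] -> next state
    fail = [0]        # fallback state on a mismatch
    output = [set()]  # patterns recognized on reaching each state
    for pat in patterns:
        state = 0
        for b in pat:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append(set())
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        output[state].add(pat)
    # Breadth-first pass: children of the root keep fail = 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            # Inherit matches that end at the fallback state (suffix matches).
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output


def scan(data: bytes, automaton):
    """Yield (end_offset, pattern) for every match, in one pass over data."""
    goto, fail, output = automaton
    state = 0
    for i, b in enumerate(data):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for pat in output[state]:
            yield i, pat


ac = build_automaton([b"he", b"she", b"his", b"hers"])
print(sorted(scan(b"ushers", ac)))  # → [(3, b'he'), (3, b'she'), (5, b'hers')]
```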

They’ve also integrated their NEURON search engine on-chip. Essentially, this eliminates the need for an external TCAM (ternary content-addressable memory) for most applications, supporting access control lists and longest-prefix-match searches internally at, you guessed it, 100 Gbps.
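
Longest-prefix match is the core of IP route lookup: among all stored prefixes that cover an address, pick the most specific one. A TCAM does this in one parallel lookup; the sketch below does the same job in software with a binary trie, for IPv4 only. It is not how NEURON works internally (that isn’t described here), and it doesn’t cover ACL matching.

```python
import ipaddress


class PrefixTrie:
    """Binary trie for IPv4 longest-prefix match (a software stand-in for a TCAM)."""

    def __init__(self):
        # Each node maps bit (0 or 1) -> child node; key "hop" holds a next hop.
        self.root = {}

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def lookup(self, address: str):
        """Return the next hop of the longest matching prefix, or None."""
        bits = int(ipaddress.IPv4Address(address))
        node, best = self.root, None
        for i in range(32):
            if "hop" in node:
                best = node["hop"]  # remember the most specific match so far
            bit = (bits >> (31 - i)) & 1
            if bit not in node:
                return best
            node = node[bit]
        return node.get("hop", best)
```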

Finally, they can handle over 100 Gbps of encryption and decryption and 50 Gbps of compression and decompression (for accessing the guts of zipped files, for example).
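
Inspecting compressed content means decompressing it on the fly, usually as it arrives in pieces. Here is a small sketch using Python’s standard `zlib` module (where `wbits=31` selects the gzip container format) that scans a chunked gzip stream for a byte string, keeping a short tail so a match split across chunk boundaries isn’t missed. The function name is hypothetical, and the point is the streaming structure, not performance.

```python
import gzip
import zlib


def inspect_gzip_stream(chunks, needle: bytes) -> bool:
    """Decompress a gzip stream chunk by chunk; report whether `needle` appears."""
    d = zlib.decompressobj(wbits=31)  # 31 = expect gzip header and trailer
    tail = b""
    for chunk in chunks:
        text = tail + d.decompress(chunk)
        if needle in text:
            return True
        # Keep the last len(needle) - 1 bytes so straddling matches are found.
        tail = text[-(len(needle) - 1):] if len(needle) > 1 else b""
    return needle in tail + d.flush()


# Usage: a compressed body split into 64-byte "packets".
body = gzip.compress(b"<html>" + b"x" * 1000 + b"malware-signature</html>")
packets = [body[i:i + 64] for i in range(0, len(body), 64)]
print(inspect_gzip_stream(packets, b"malware-signature"))  # prints True
```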

They’ve jumped all the way down to 28 nm with this family, so power becomes an important consideration. They say that they’ve implemented dynamic clock gating throughout as well as power gating to shut off unused circuits. They can also dynamically throttle performance. In fact, if a core is not in use, rather than putting it completely to sleep – which would require a relatively lengthy wake-up when needed – they simply ratchet down the clock frequency to something negligible so they can get it up and running again in 3 clock cycles.
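
The usual first-order model of CMOS power (a general approximation, not a Cavium figure) shows why both techniques matter. Here $\alpha$ is the switching activity, $C$ the switched capacitance, $V_{DD}$ the supply voltage, $f$ the clock frequency, and $I_{\text{leak}}$ the leakage current:

```latex
P_{\text{total}}
  = \underbrace{\alpha\, C\, V_{DD}^{2}\, f}_{\text{dynamic}}
  + \underbrace{V_{DD}\, I_{\text{leak}}}_{\text{static}}
\qquad\Longrightarrow\qquad
\lim_{f \to 0} P_{\text{total}} = V_{DD}\, I_{\text{leak}}
```

Ratcheting a core’s clock down toward zero drives its dynamic term toward zero while keeping the core only a few cycles away from useful work, but the static leakage term remains; eliminating that requires power gating, at the cost of a longer wake-up.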

There are a number of triggers that can affect the power management, and they can be accessed in software so that users can take their best shot at crafting power savings. Overall, they say that they can get four times the performance of OCTEON II within the same power envelope.

They also have dedicated hardware to manage secure booting and their “Authentik” feature, which, more or less, ensures that the chip you bought is Cavium-approved (rather than something that was, for example, over-built and then sold on the gray market).

And, of course, they have lots of I/O bandwidth. They claim 500 Gbps, with more than double the connectivity of OCTEON II.

All in all, it’s a monster chip, operating in the realms that the many-core guys have been claiming. But the heavy focus on accelerators keeps the chip firmly rooted in communications, even if the cores could theoretically be put to use on more general computation. Samples are supposed to be available in the second half of this year; datasheets are already available to key customers (much Cavium detail is available only under NDA). So it will be some time before these beasts are processing real traffic.

But, if they do their jobs right, you should expect them to show up in an enterprise, ISP, cellular, security, or cloud computing infrastructure box near you.

Image: Gwillhickers/Wikipedia

