
The Match Game

Netlogic Speeds DPI by Accelerating Text Pattern Matching

Chester was a real stickler for grammar. It started innocently enough: he would review his own memos a couple of extra times to make sure they were right. Then he started cracking down on his staff: he wanted them all to be as careful about their prose as he was about his. And he was reviewing their stuff. And that worked, more or less.

But the problem was, he was kind of OCD about everything he read. A memo might come in describing an incredible new bonus program that was going to net him thousands of dollars, but misplaced commas would distract him, and he would completely miss the main message. He seemed to be able to comprehend only grammatically pure materials. He just couldn’t let it go.

So one day he decided he’d had enough. It was one thing for him to clean up the work of his staff, but now he decided he was done reading other people’s shoddy stuff. He directed that all memos and all emails be reviewed by his admin before being sent to him. He compiled, off the top of his head, a list of rules. And he would accompany some of them with a rant to illustrate why the rule mattered.

A typical example might be, “The word ‘only’ should be placed only in front of the thing that it modifies. I just read an email where it says, ‘It will only take a minute.’ That is incorrect usage of the word ‘only.’ If I read that right, it’s saying it will only take – not give or borrow or donate or fricassee – a minute. That makes no sense! What it should say is, ‘It will take only a minute.’ Not an hour, not a picosecond; a minute. THIS STUFF MATTERS, PEOPLE!!”*

Of course, there were only so many rules that he could come up with at a time. So, as emails got to him with problems not covered in the rules, he would make a note and add them to the rule set. He figured that, at some point, it would be water-tight, and he’d never again have to stumble over anything that broke his rules. If an email or memo came in and failed the test, his admin sent it back for correction: it would never get to his desk until clean.

This had an immediate effect on his workload: because he was no longer burdened with reviewing documents, he had far more time on his hands. In fact, things got easier and easier, and he was feeling rather pleased with his newfound liberation. Until he noticed his admin’s desk and computer desktop: piles of memos and hundreds of emails were stacked up awaiting review. It wasn’t that he no longer had as much work to do; it was that everything was stuck in grammar review, and his admin couldn’t keep up.

It was only when he missed a mandatory corporate strategic offsite meeting (the invitation had asked attendees “… to please be prompt”, and Chester had decided that, grammarians’ disagreements on the point notwithstanding, split infinitives were an evil up with which he would not put) that he decided that he needed to accelerate the grammar rule-checking process.


Deep packet inspection (DPI) is the unglamorous process of peering into packets public and private to make sure that there’s nothing problematic lurking in there. “Problematic” typically refers to evil things like viruses and malware and Trojan horses (although there’s nothing to say that it couldn’t be extended to include pejorative comments about a government or company).

We recently looked at one aspect of DPI: Netronome’s notion of flow processing. But that was really about accelerating DPI by managing the rest of the process – flow processing, in this case – and we didn’t deal with the actual deep inspection of packets.

We also took a brief look at Snort, an open-source rule-processing engine. But that’s only one particular engine, and its complexity is limited by constraining the kinds of rules that can be expressed. One can formulate more complex search patterns than Snort can handle, but then the pattern-matching engine must also be more sophisticated.

You may recall that rules tend to consist of two parts: a pattern to match and then an action to take based on a match. You search for text having a particular characteristic and, if you find it, then you do something – and that something will depend on what’s being searched.
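To make the two-part structure concrete, here is a minimal sketch in Python. It is not Snort's rule syntax or anyone's actual engine; the `Rule` class, the `inspect` function, the rule names, and the action strings are all made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """Hypothetical rule: a pattern to match plus an action to take on a match."""
    name: str
    pattern: re.Pattern
    action: Callable[[str], str]


def inspect(payload: str, rules: List[Rule]) -> List[str]:
    """Run every rule against the payload; fire the action only on a match."""
    results = []
    for rule in rules:
        if rule.pattern.search(payload):
            results.append(rule.action(payload))
    return results


# Illustrative rules only: one security-flavored, one Chester-flavored.
RULES = [
    Rule("suspicious-shell", re.compile(r"cmd\.exe"), lambda p: "drop"),
    Rule("misplaced-only", re.compile(r"\bonly take\b"), lambda p: "return-to-sender"),
]

print(inspect("It will only take a minute.", RULES))
```

Note where the time goes: the action is a single cheap call that runs rarely, while the pattern search runs for every rule on every payload.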

You take the action only if a match happens, which, hopefully, isn’t too often. But there are thousands and tens of thousands of possible things to look for to decide if a packet is good. Having to run all those rules takes time, but the performance issue isn’t with the action – a host processor can probably handle that; the problem is with all the string matching patterns from all the rules.

String matching is probably one of the exercises you used for your first software state machines in your undergrad programming course. There’s actually an intimidating name for the kind of state machine that parses “regular languages” or “regular expressions” – including, in particular, the apparently somewhat misnamed “Perl-compatible regular expressions (PCREs)”: a deterministic finite automaton (DFA). Every home should have one.

There’s a whole body of mathematics behind this regular expression thing that I won’t even attempt to plumb here. (Because I’d have to understand it first.) Put simply, they are a way of expressing strings in a search – they’re the “re” in “grep,” one of Unix’s typically opaque commands, this one meaning, more or less, “find.” (Why use a simple common word when a made-up one will do?) The bottom line is that you can compile a whole set of string search patterns into a single tree that can be processed by a DFA, so the text is scanned once for all the patterns rather than once per pattern.
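For a feel of what “compiling a set of patterns into a tree” can look like, here is a sketch using the classic Aho-Corasick construction for literal strings. This is a swapped-in textbook technique, not Netlogic's method, and it handles only fixed strings rather than full regular expressions. The function names are invented for the example. Strictly speaking it uses failure links rather than a fully expanded DFA transition table, although those links can be precomputed into one.

```python
from collections import deque


def compile_patterns(patterns):
    """Build a trie over all patterns, then add failure links (Aho-Corasick)."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    # Breadth-first pass: each state's failure link points to the longest
    # proper suffix of its path that is also a path from the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def scan(tables, text):
    """Walk the text once, reporting (start_index, pattern) for every match."""
    goto, fail, out = tables
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in sorted(out[state]):
            hits.append((i - len(pat) + 1, pat))
    return hits


tables = compile_patterns(["he", "she", "hers"])
print(scan(tables, "ushers"))
```

The point of the exercise is that the scan loop touches each character once no matter how many patterns were compiled in, which is what makes the approach attractive when there are tens of thousands of rules.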

Netlogic has taken this approach one step further with their NETL7 family of what they call “knowledge-based processors (KBPs).” (They also have a Sahasra family of KBPs, but they’re very different.) They’ve integrated their own enhanced DFA, which they call their Intelligent Fabric for Automata (IFA), into a dedicated chip. Actually, they’ve integrated around 10 per chip (their Mike Ichiriu, VP of Systems and Applications Engineering, kept the exact number somewhat vague).

The “fabric for automata” name makes sense: it’s not like there’s one set of hard and fast rules that can be cast into hardware via a dedicated state machine. The rules are forever changing, so any attempt to deal with this must allow for any state machines – or automata – within the defined scope to be implemented in the KBP fabric.

The KBP consists of logic and memory, including some packet buffering memory. The engine itself is very tightly coupled with internal memory for the stored patterns. The KBPs store only the pattern-matching part of the rule, not the action portion.

You can stream a packet through the engine either by passing a pointer to the packet or by actually encapsulating the packet in an “instruction” that gets sent to the KBP. You can also check content across packet boundaries, since most long messages end up being fractured into multiple packets. This avoids the chance that something untoward might sneak in with its head in one packet and its tail in the next.
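The cross-packet trick amounts to remembering the state machine's state between packets of the same flow instead of starting over with each one. Here is a sketch of that idea in software, using a single literal pattern and the Knuth-Morris-Pratt failure function. The `FlowMatcher` class and its `feed` method are hypothetical names, and nothing here reflects how the KBP does it internally.

```python
def build_failure(pattern: bytes):
    """KMP failure table: longest proper prefix that is also a suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


class FlowMatcher:
    """Hypothetical per-flow matcher that keeps its state across packets."""

    def __init__(self, pattern: bytes):
        if not pattern:
            raise ValueError("pattern must not be empty")
        self.pattern = pattern
        self.fail = build_failure(pattern)
        self.state = 0   # pattern bytes matched so far
        self.offset = 0  # bytes of the flow consumed so far

    def feed(self, packet: bytes):
        """Consume one packet; return flow offsets where a match starts."""
        hits = []
        for byte in packet:
            while self.state and byte != self.pattern[self.state]:
                self.state = self.fail[self.state - 1]
            if byte == self.pattern[self.state]:
                self.state += 1
            self.offset += 1
            if self.state == len(self.pattern):
                hits.append(self.offset - len(self.pattern))
                self.state = self.fail[self.state - 1]
        return hits


flow = FlowMatcher(b"malware")
print(flow.feed(b"xx mal"))   # head of the pattern arrives, no match yet
print(flow.feed(b"ware yy"))  # tail arrives, match reported
```

A matcher that was reset for each packet would see “mal” and “ware” separately and report nothing, which is exactly the hole that cross-boundary checking closes.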

Companies can add their own rules – and, in this case, it’s not typically going to be the system builder that adds rules, but the service provider using the system. So the mechanism has to be particularly straightforward, because it’s several degrees removed from anyone familiar with the dirty details of DPI.

The idea is that these chips can accompany the packet-processing chips that manage the actual network traffic. DPI would normally be done by host processors in the “slow path,” since the dedicated packet or flow processors in the “fast path” can’t do it, given their laser-like focus on packet routing. But if practically every packet must be scanned, one could almost argue that DPI needs to be added to the fast path.

Typically, however, packets are sent out of the traditional fast path for checking, since there’s no host-style processor in the fast path. Offloading is intended to make that portion of the slow path faster. The dedicated engine can process the rules at rates ranging from 250 Mbps to 20 Gbps (depending on the device), much more quickly than a straight software implementation would be able to.
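To put those rates in perspective, a bit of back-of-the-envelope arithmetic gives the time budget per packet. The 1500-byte packet size is an assumption (a common Ethernet payload size), not a figure from Netlogic, and the function name is made up.

```python
def scan_time_us(packet_bytes: int, rate_bps: float) -> float:
    """Microseconds needed to stream one packet through at a given bit rate."""
    return packet_bytes * 8 / rate_bps * 1e6


# Assumed packet size of 1500 bytes, i.e. 12,000 bits.
for label, rate in (("250 Mbps", 250e6), ("20 Gbps", 20e9)):
    print(f"{label}: {scan_time_us(1500, rate):.2f} microseconds per packet")
```

That works out to roughly 48 microseconds per packet at the low end and 0.6 microseconds at the high end, and that is the time available to check the packet against every pattern in the rule set.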

Given a set of rules for English grammar, that might even be fast enough to process all of Chester’s incoming emails and memos. Except for one problem: the rules for the English language are anything but regular…


*Full disclosure: I violate this rule in my drafts all the time. Our editor apprised me of the rule, and I kept failing it so often that I started doing my own “only” scans before submitting…
