feature article
Subscribe Now

The Match Game

Netlogic Speeds DPI by Accelerating Text Pattern Matching

Chester was a real stickler for grammar. It started innocently enough: he would review his own memos a couple of extra times to make sure they were right. Then he started cracking down on his staff: he wanted them all to be as careful about their prose as he was about his. And he was reviewing their stuff. And that worked, more or less.

But the problem was, he was kind of OCD about everything he read. A memo might come in describing an incredible new bonus program that was going to net him thousands of dollars, but misplaced commas would distract him, and he would completely miss the main message. He seemed to be able to comprehend only grammatically pure materials. He just couldn’t let it go.

So one day he decided he’d had enough. It was one thing for him to clean up the work of his staff, but now he decided he was done reading other people’s shoddy stuff. He directed that all memos and all emails be reviewed by his admin before being sent to him. He compiled, off the top of his head, a list of rules. And he would accompany some of them with a rant to illustrate why the rule mattered.

A typical example might be, “The word ‘only’ should be placed only in front of the thing that it modifies. I just read an email where it says, ‘It will only take a minute.’ That is incorrect usage of the word ‘only.’ If I read that right, it’s saying it will only take – not give or borrow or donate or fricassee – a minute. That makes no sense! What it should say is, ‘It will take only a minute.’ Not an hour, not a picosecond; a minute. THIS STUFF MATTERS, PEOPLE!!”*

Of course, there were only so many rules that he could come up with at a time. So, as emails got to him with problems not covered in the rules, he would make a note and add them to the rule set. He figured that, at some point, it would be water-tight, and he’d never again have to stumble over anything that broke his rules. If an email or memo came in and failed the test, his admin sent it back for correction: it would never get to his desk until clean.

This had an immediate effect on his workload: because he was no longer burdened with reviewing documents, he had far more time on his hands. In fact, things got easier and easier, and he was feeling rather pleased with his newfound liberation. Until he noticed his admin’s desk and computer desktop: piles of memos and hundreds of emails were stacked up awaiting review. It wasn’t that he no longer had as much work to do; it was that everything was stuck in grammar review, and his admin couldn’t keep up.

It was only when he missed a mandatory corporate strategic offsite meeting (the invitation had asked attendees “… to please be prompt”, and Chester had decided that, grammarians’ disagreements on the point notwithstanding, split infinitives were an evil up with which he would not put) that he decided that he needed to accelerate the grammar rule-checking process.


Deep packet inspection (DPI) is the unglamorous process of peering into packets public and private to make sure that there’s nothing problematic lurking in there. “Problematic” typically refers to evil things like viruses and malware and Trojan horses (although there’s nothing to say that it couldn’t be extended to include pejorative comments about a government or company).

We recently looked at one aspect of the process of DPI, Netronome’s notion of flow processing. However, we really looked at acceleration of DPI by managing the rest of the process – flow processing, in this case. But we didn’t deal with the actual deep inspection of packets.

We also took a brief look at Snort, an open-source rule-processing engine. But that’s only one particular engine, and its complexity is limited by constraining the kinds of rules that can be expressed. One can formulate more complex search patterns than Snort can handle, but then the pattern-matching engine must also be more sophisticated.

You may recall that rules tend to consist of two parts: a pattern to match and then an action to take based on a match. You search for text having a particular characteristic and, if you find it, then you do something – and that something will depend on what’s being searched.

You take the action only if a match happens, which, hopefully, isn’t too often. But there are thousands and tens of thousands of possible things to look for to decide if a packet is good. Having to run all those rules takes time, but the performance issue isn’t with the action – a host processor can probably handle that; the problem is with all the string matching patterns from all the rules.

String matching is probably one of the exercises you used for your first software state machines in your undergrad programming course. There’s actually an intimidating name for the kind of state machine that parses “regular languages” or “regular expressions” – including, in particular, the apparently somewhat misnamed “Perl-compatible regular expressions (PCREs)”: a deterministic finite automaton (DFA). Every home should have one.

There’s a whole body of mathematics behind this regular expression thing that I won’t even attempt to plumb here. (Because I’d have to understand it first.) Put simply, they are a way of expressing strings in a search – they’re the “re” in “grep,” one of Unix’s typically opaque commands, this one meaning, more or less, “find.” (Why use a simple common word when a made-up one will do?) The bottom line of this is that you can compile a set of string search patterns into a tree that can be processed by a DFA.

Netlogic has taken this approach one step further with their NETL7 family of what they call “knowledge-based processors (KBPs).” (They also have a Sahasra family of KBPs, but they’re very different.) They’ve integrated their own enhanced DFA, which they call their Intelligent Fabric for Automata (IFA), into a dedicated chip. Actually, they’ve integrated around 10 per chip (their Mike Ichiriu, VP of Systems and Applications Engineering, kept the exact number somewhat vague).

The “fabric for automata” nomer makes sense: it’s not like there’s one set of hard and fast rules that can be cast into hardware via a dedicated state machine. The rules are forever changing, so any attempt to deal with this must allow for any state machines – or automata – within the defined scope to be implemented in the KBP fabric.

The KBP consists of logic and memory, including some packet buffering memory. The engine itself is very tightly coupled with internal memory for the stored patterns. The KBPs store only the pattern-matching part of the rule, not the action portion.

You can stream a packet through the engine either by passing a pointer to the packet or by actually encapsulating the packet in an “instruction” that gets sent to the KBP. You can also check content across packet boundaries, since most long messages end up being fractured into multiple packets. This avoids the chance that something untoward sneak in with its head in one packet and its tail in the next.

Companies can add their own rules – and, in this case, it’s not typically going to be the system builder that adds rules, but the service provider using the system. So the mechanism has to be particularly straightforward, because it’s several degrees removed from anyone familiar with the dirty details of DPI.

The idea is that these chips can accompany the packet-processing chips that manage the actual network traffic. DPI would normally be done by host processors in the “slow path,” since the dedicated packet or flow processors in the “fast path” can’t do it, given their laser-like focus on packet routing. But if practically every packet must be scanned, one could almost argue that DPI needs to be added to the fast path.

Typically, however, packets are sent out of the traditional fast path for checking, since there’s no host-style processor in the fast path. Offloading is intended to make that portion of the slow path faster. The dedicated engine can process the rules at rates ranging from 250 Mbps to 20 Gbps (depending on the device), much more quickly than a straight software implementation would be able to.

Given a set of rules for English grammar, that might even be even fast enough to process all of Chester’s incoming emails and memos. Except for one problem: the rules for the English language are anything but regular…


*Full disclosure: I violate this rule in my drafts all the time. Our editor apprised me of the rule, and I kept failing it so often that I started doing my own “only” scans before submitting…

Leave a Reply

featured blogs
Nov 30, 2023
No one wants to waste unnecessary time in the model creation phase when using a modeling software. Rather than expect users to spend time trawling for published data and tediously model equipment items one by one from scratch, modeling software tends to include pre-configured...
Nov 27, 2023
See how we're harnessing generative AI throughout our suite of EDA tools with Synopsys.AI Copilot, the world's first GenAI capability for chip design.The post Meet Synopsys.ai Copilot, Industry's First GenAI Capability for Chip Design appeared first on Chip Design....
Nov 6, 2023
Suffice it to say that everyone and everything in these images was shot in-camera underwater, and that the results truly are haunting....

featured video

TDK CLT32 power inductors for ADAS and AD power management

Sponsored by TDK

Review the top 3 FAQs (Frequently Asked Questions) regarding TDK’s CLT32 power inductors. Learn why these tiny power inductors address the most demanding reliability challenges of ADAS and AD power management.

Click here for more information

featured paper

Power and Performance Analysis of FIR Filters and FFTs on Intel Agilex® 7 FPGAs

Sponsored by Intel

Learn about the Future of Intel Programmable Solutions Group at intel.com/leap. The power and performance efficiency of digital signal processing (DSP) workloads play a significant role in the evolution of modern-day technology. Compare benchmarks of finite impulse response (FIR) filters and fast Fourier transform (FFT) designs on Intel Agilex® 7 FPGAs to publicly available results from AMD’s Versal* FPGAs and artificial intelligence engines.

Read more

featured chalk talk

PIC32CX-BZ2 and WBZ451 Multi-Protocol Wireless MCU Family
Sponsored by Mouser Electronics and Microchip
In this episode of Chalk Talk, Amelia Dalton and Shishir Malav from Microchip explore the benefits of the PIC32CX-BZ2 and WBZ45 Multi-protocol Wireless MCU Family and how it can make IoT design easier than ever before. They investigate the components included in this multi-protocol wireless MCU family, the details of the software architecture included in this solution, and how you can utilize these MCUs in your next design.
May 4, 2023