
The Match Game

Netlogic Speeds DPI by Accelerating Text Pattern Matching

Chester was a real stickler for grammar. It started innocently enough: he would review his own memos a couple of extra times to make sure they were right. Then he started cracking down on his staff: he wanted them all to be as careful about their prose as he was about his. And he was reviewing their stuff. And that worked, more or less.

But the problem was, he was kind of OCD about everything he read. A memo might come in describing an incredible new bonus program that was going to net him thousands of dollars, but misplaced commas would distract him, and he would completely miss the main message. He seemed to be able to comprehend only grammatically pure materials. He just couldn’t let it go.

So one day he decided he’d had enough. It was one thing for him to clean up the work of his staff, but now he decided he was done reading other people’s shoddy stuff. He directed that all memos and all emails be reviewed by his admin before being sent to him. He compiled, off the top of his head, a list of rules. And he would accompany some of them with a rant to illustrate why the rule mattered.

A typical example might be, “The word ‘only’ should be placed only in front of the thing that it modifies. I just read an email where it says, ‘It will only take a minute.’ That is incorrect usage of the word ‘only.’ If I read that right, it’s saying it will only take – not give or borrow or donate or fricassee – a minute. That makes no sense! What it should say is, ‘It will take only a minute.’ Not an hour, not a picosecond; a minute. THIS STUFF MATTERS, PEOPLE!!”*

Of course, there were only so many rules that he could come up with at a time. So, as emails got to him with problems not covered in the rules, he would make a note and add them to the rule set. He figured that, at some point, it would be watertight, and he'd never again have to stumble over anything that broke his rules. If an email or memo came in and failed the test, his admin sent it back for correction: it would never get to his desk until clean.

This had an immediate effect on his workload: because he was no longer burdened with reviewing documents, he had far more time on his hands. In fact, things got easier and easier, and he was feeling rather pleased with his newfound liberation. Until he noticed his admin’s desk and computer desktop: piles of memos and hundreds of emails were stacked up awaiting review. It wasn’t that he no longer had as much work to do; it was that everything was stuck in grammar review, and his admin couldn’t keep up.

It was only when he missed a mandatory corporate strategic offsite meeting (the invitation had asked attendees “… to please be prompt”, and Chester had decided that, grammarians’ disagreements on the point notwithstanding, split infinitives were an evil up with which he would not put) that he decided that he needed to accelerate the grammar rule-checking process.


Deep packet inspection (DPI) is the unglamorous process of peering into packets public and private to make sure that there’s nothing problematic lurking in there. “Problematic” typically refers to evil things like viruses and malware and Trojan horses (although there’s nothing to say that it couldn’t be extended to include pejorative comments about a government or company).

We recently looked at one aspect of the process of DPI, Netronome's notion of flow processing. But that was really about accelerating DPI by managing everything around the inspection – the flow processing – rather than the actual deep inspection of packets.

We also took a brief look at Snort, an open-source rule-processing engine. But that’s only one particular engine, and its complexity is limited by constraining the kinds of rules that can be expressed. One can formulate more complex search patterns than Snort can handle, but then the pattern-matching engine must also be more sophisticated.

You may recall that rules tend to consist of two parts: a pattern to match and then an action to take based on a match. You search for text having a particular characteristic and, if you find it, then you do something – and that something will depend on what’s being searched.
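The pattern-plus-action shape of a rule can be sketched in a few lines of Python. (The patterns and actions here are invented for illustration; a real rule set like Snort's is far larger and far more nuanced.)

```python
import re

# Hypothetical rules: each one pairs a search pattern with an action.
rules = [
    (re.compile(rb"EICAR-STANDARD-ANTIVIRUS-TEST"), "drop"),
    (re.compile(rb"\x90{16,}"), "flag"),  # a long NOP sled, say
]

def inspect(payload: bytes) -> str:
    """Return the action for the first matching rule, or 'pass'."""
    for pattern, action in rules:
        if pattern.search(payload):
            return action
    return "pass"
```

The loop makes the performance problem visible: every payload is run against every pattern, so the matching dominates long before any action fires.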

You take the action only if a match happens, which, hopefully, isn't too often. But there are thousands and tens of thousands of possible things to look for to decide if a packet is good. Having to run all those rules takes time, but the performance issue isn't with the action – a host processor can probably handle that; the problem is matching the string patterns from all of the rules.

String matching is probably one of the exercises you used for your first software state machines in your undergrad programming course. There’s actually an intimidating name for the kind of state machine that parses “regular languages” or “regular expressions” – including, in particular, the apparently somewhat misnamed “Perl-compatible regular expressions (PCREs)”: a deterministic finite automaton (DFA). Every home should have one.

There’s a whole body of mathematics behind this regular expression thing that I won’t even attempt to plumb here. (Because I’d have to understand it first.) Put simply, they are a way of expressing strings in a search – they’re the “re” in “grep,” one of Unix’s typically opaque commands, this one meaning, more or less, “find.” (Why use a simple common word when a made-up one will do?) The bottom line of this is that you can compile a set of string search patterns into a tree that can be processed by a DFA.
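As a toy illustration of the idea – the textbook construction, not Netlogic's hardware – here is a DFA for a single literal pattern, built as a transition table and then run over a character stream one symbol at a time:

```python
def build_dfa(pattern: str):
    """Compile a literal pattern into a DFA transition table:
    table[state][char] -> next state, where state = chars matched so far."""
    table = [dict() for _ in range(len(pattern) + 1)]
    for state in range(len(pattern) + 1):
        prefix = pattern[:state]
        for ch in set(pattern):
            # Next state = length of the longest pattern prefix that is
            # a suffix of (what we've matched so far + this character).
            s = prefix + ch
            k = min(len(s), len(pattern))
            while k > 0 and not s.endswith(pattern[:k]):
                k -= 1
            table[state][ch] = k
    return table

def scan(table, pattern, text) -> bool:
    """Run the DFA over text; return True on the first full match."""
    state = 0
    for ch in text:
        state = table[state].get(ch, 0)  # unknown chars reset to start
        if state == len(pattern):
            return True
    return False
```

The payoff is that the scan does constant work per input character no matter how it backtracks logically – and a multi-pattern compiler does the same trick for thousands of patterns at once.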

Netlogic has taken this approach one step further with their NETL7 family of what they call “knowledge-based processors (KBPs).” (They also have a Sahasra family of KBPs, but they’re very different.) They’ve integrated their own enhanced DFA, which they call their Intelligent Fabric for Automata (IFA), into a dedicated chip. Actually, they’ve integrated around 10 per chip (their Mike Ichiriu, VP of Systems and Applications Engineering, kept the exact number somewhat vague).

The “fabric for automata” moniker makes sense: it’s not like there’s one set of hard-and-fast rules that can be cast into hardware via a dedicated state machine. The rules are forever changing, so any attempt to deal with this must allow for any state machines – or automata – within the defined scope to be implemented in the KBP fabric.

The KBP consists of logic and memory, including some packet buffering memory. The engine itself is very tightly coupled with internal memory for the stored patterns. The KBPs store only the pattern-matching part of the rule, not the action portion.

You can stream a packet through the engine either by passing a pointer to the packet or by actually encapsulating the packet in an “instruction” that gets sent to the KBP. You can also check content across packet boundaries, since most long messages end up being fractured into multiple packets. This avoids the chance that something untoward sneaks in with its head in one packet and its tail in the next.
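One software analogue of that cross-packet capability – an assumed design for illustration, not Netlogic's actual interface – is to keep per-flow scan state between packets, so a pattern split across a boundary is still caught:

```python
class FlowScanner:
    """Scan one flow's payload for a literal pattern, packet by packet."""

    def __init__(self, pattern: bytes):
        self.pattern = pattern
        self.tail = b""  # trailing bytes that could begin a split match

    def feed(self, packet: bytes) -> bool:
        data = self.tail + packet
        if self.pattern in data:
            return True
        # Keep only as many trailing bytes as could still start a match.
        keep = len(self.pattern) - 1
        self.tail = data[-keep:] if keep > 0 else b""
        return False
```

A hardware engine would carry DFA state per flow rather than buffering bytes, but the effect is the same: the match survives the packet boundary.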

Companies can add their own rules – and, in this case, it’s not typically going to be the system builder that adds rules, but the service provider using the system. So the mechanism has to be particularly straightforward, because it’s several degrees removed from anyone familiar with the dirty details of DPI.

The idea is that these chips can accompany the packet-processing chips that manage the actual network traffic. DPI would normally be done by host processors in the “slow path,” since the dedicated packet or flow processors in the “fast path” can’t do it, given their laser-like focus on packet routing. But if practically every packet must be scanned, one could almost argue that DPI needs to be added to the fast path.

Typically, however, packets are sent out of the traditional fast path for checking, since there’s no host-style processor in the fast path. Offloading is intended to make that portion of the slow path faster. The dedicated engine can process the rules at rates ranging from 250 Mbps to 20 Gbps (depending on the device), much more quickly than a straight software implementation would be able to.

Given a set of rules for English grammar, that might even be fast enough to process all of Chester’s incoming emails and memos. Except for one problem: the rules for the English language are anything but regular…


*Full disclosure: I violate this rule in my drafts all the time. Our editor apprised me of the rule, and I kept failing it so often that I started doing my own “only” scans before submitting…
