The Persistence of Memory

Performance-IP’s MRO Speeds up Slow Memories

“If you optimize everything, you will always be unhappy.” — Donald Knuth

Q: When is a cache not a cache?
A: When it’s a Memory Request Optimizer.

If that sounds tautological (aren’t all caches memory-request optimizers?), then you haven’t talked to Performance-IP, a small startup in the Boston area. P-IP has a patent-pending way to speed up your system’s slow accesses to external memory by interposing some clever logic of its own.

The company’s MRO (memory request optimizer) sits between your system bus and your memory controller – like a cache. But it’s not a cache. It monitors requests for external memory reads and supplies data from its own internal storage. But it isn’t a cache. It’s smart about how, when, and where your system is accessing external memory, so it can cut latency by huge amounts, but without being a cache. Its benefits are measurable but also somewhat unpredictable. But it’s still not a cache.

The MRO logic doesn’t have traditional cache tags, so it’s not technically a cache. Instead, it has “trackers,” which serve a similar purpose but in a different manner. You can configure the number of trackers in your implementation of the MRO (it’s supplied as Verilog), so you can tune the number of trackers to balance performance against area and power. As a rule of thumb, you’ll want about 10–20 trackers, although some benchmarks show marked improvement with only four.

The MRO does store data locally, like a cache, and that’s one source of its performance-enhancement capabilities. Its local storage (P-IP calls these response buffers) is undoubtedly faster than your external RAM, so any read “hit” is a performance win.

But its trackers are also proactive, and they will prefetch data based on what they observe about your code’s locality of reference. If its internal statistic-gathering mechanism suggests that you’re accessing a certain range of addresses linearly, it’ll prefetch the upcoming data for you and store it in its response buffer. If all goes according to plan, you’ll be able to skip a couple of external memory reads entirely.
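To make the idea concrete, here is a toy software model of one tracker watching for linear accesses. It is only a sketch under loud assumptions: the 64-byte line size, the prefetch depth of two, and every name in it are hypothetical, and P-IP's actual detection logic is not public in this article.

```python
LINE_BYTES = 64  # assumed line size; not a documented MRO parameter


def run_single_tracker(addresses, depth=2):
    """Toy model: one tracker, prefetching `depth` lines ahead of a linear stream.

    Returns (buffer_hits, external_reads, prefetches_issued).
    """
    buffer = set()    # line numbers currently held in the response buffer
    last_line = None  # line touched by the previous request
    hits = external_reads = prefetches = 0

    for addr in addresses:
        line = addr // LINE_BYTES
        if line in buffer:
            hits += 1              # served from the response buffer
            buffer.discard(line)
        else:
            external_reads += 1    # had to go out to external memory

        # Linear pattern: this line directly follows the previous one.
        if last_line is not None and line == last_line + 1:
            for ahead in range(1, depth + 1):
                if line + ahead not in buffer:
                    buffer.add(line + ahead)
                    prefetches += 1
        last_line = line

    return hits, external_reads, prefetches


# Eight consecutive lines: the first two reads miss, the rest hit the buffer.
linear = [n * LINE_BYTES for n in range(8)]
print(run_single_tracker(linear))  # (6, 2, 8)
```

The model ignores timing entirely, so it shows only which reads could be served locally, not how many cycles that saves.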

It’s this proactive prefetching that is the other source of MRO’s performance. Unlike a memory scheduler, the MRO doesn’t ever rearrange or reorganize memory accesses. Nothing ever gets delayed, or hoisted up to the front of the queue. Instead, it attempts to apply some rationality to your system’s scattered memory accesses, looking for locality where the compiler couldn’t find any. This is particularly fruitful in multicore and multi-threaded systems where each thread might be perfectly linear, but the combination of all threads/cores together makes for a haphazard melee for memory. MRO tries to stand above the fray, looking for overall patterns that can be exploited for gain.
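The multi-threaded case is where the number of trackers starts to matter. The sketch below is a toy model, not the MRO's real algorithm: it assumes each tracker follows one sequential stream, that the least recently used tracker is recycled when a new stream shows up, and that lines are 64 bytes. All names are invented for illustration.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed line size; not a documented MRO parameter


def run_trackers(addresses, num_trackers, depth=2):
    """Toy model: several trackers, each following one sequential stream.

    Returns (buffer_hits, external_reads).
    """
    trackers = OrderedDict()  # last line seen by each stream, oldest first
    buffer = set()            # prefetched line numbers
    hits = external_reads = 0

    for addr in addresses:
        line = addr // LINE_BYTES
        if line in buffer:
            hits += 1
            buffer.discard(line)
        else:
            external_reads += 1

        if line - 1 in trackers:
            # Continues a known stream: advance its tracker and prefetch ahead.
            del trackers[line - 1]
            trackers[line] = True
            buffer.update(range(line + 1, line + depth + 1))
        elif line in trackers:
            trackers.move_to_end(line)  # repeat access to the same line
        else:
            # Possible new stream: recycle the oldest tracker if none is free.
            if len(trackers) >= num_trackers:
                trackers.popitem(last=False)
            trackers[line] = True

    return hits, external_reads


def interleave(*streams):
    """Round-robin merge, as if the threads took turns on the bus."""
    return [addr for group in zip(*streams) for addr in group]


thread_a = [n * LINE_BYTES for n in range(0, 8)]
thread_b = [n * LINE_BYTES for n in range(1000, 1008)]
mixed = interleave(thread_a, thread_b)

print(run_trackers(mixed, num_trackers=1))  # (0, 16)
print(run_trackers(mixed, num_trackers=2))  # (12, 4)
```

With a single tracker the two threads keep evicting each other and nothing is ever prefetched. Give the model one tracker per stream and each thread looks linear again, which is the effect the paragraph above describes.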

Naturally, the slower your memory is, the better the MRO works. Or, more accurately, the greater the disparity between your processors’ performance and your memory’s performance, the greater the benefit. Not unlike a cache.
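A little arithmetic shows why. The sketch below uses the standard weighted-average latency formula; the cycle counts and the 50% hit rate are made-up illustrative inputs, not Performance-IP measurements.

```python
def average_read_latency(hit_rate, buffer_cycles, memory_cycles):
    """Weighted average of buffer hits and external-memory misses."""
    return hit_rate * buffer_cycles + (1.0 - hit_rate) * memory_cycles


def cycles_saved(hit_rate, buffer_cycles, memory_cycles):
    """Average cycles saved per read compared with always going to memory."""
    return memory_cycles - average_read_latency(hit_rate, buffer_cycles, memory_cycles)


# Same hypothetical hit rate and buffer speed, two hypothetical memories.
print(cycles_saved(0.5, 10, 50))   # 20.0 cycles saved per read
print(cycles_saved(0.5, 10, 200))  # 95.0 cycles saved per read
```

The saving per read works out to hit rate times the gap between memory latency and buffer latency, so the same hit rate is worth more the slower the memory gets.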

Once you’ve simulated, configured, and installed your MRO, you still have some run-time options available to you. It has three speeds: low, medium, and high (as well as “off”). The distinction is how aggressively the MRO will prefetch data that it thinks you might want. Set the mode too aggressively and you might generate more false fetches than you would see at a lower setting. It’s hard to predict which setting will work best with what software – which is why it’s programmable. Apart from these configuration settings, the MRO is entirely invisible to software. Sort of like a cache.
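Why would a more aggressive setting ever hurt? Here is a toy illustration. The mapping from mode to prefetch depth is purely an assumption (the article does not say what each mode does internally), and the workload of short four-line bursts is contrived to make the point.

```python
# Assumed mapping from run-time mode to lines prefetched ahead (hypothetical).
PREFETCH_DEPTH = {"off": 0, "low": 1, "medium": 2, "high": 4}


def count_prefetches(lines, mode):
    """Toy model: return (useful, false) prefetch counts for a list of line numbers.

    A prefetch is "false" if the line is never read afterwards.
    """
    depth = PREFETCH_DEPTH[mode]
    buffer = set()
    used = issued = 0
    last = None

    for line in lines:
        if line in buffer:
            used += 1
            buffer.discard(line)
        # Prefetch only when the access looks linear.
        if last is not None and line == last + 1:
            for ahead in range(1, depth + 1):
                if line + ahead not in buffer:
                    buffer.add(line + ahead)
                    issued += 1
        last = line

    return used, issued - used


# Three short bursts of four consecutive lines each, then a jump elsewhere.
bursts = [base + n for base in (0, 100, 200) for n in range(4)]

for mode in ("off", "low", "medium", "high"):
    print(mode, count_prefetches(bursts, mode))
# off (0, 0)
# low (6, 3)
# medium (6, 6)
# high (6, 12)
```

On these short bursts every mode captures the same six useful prefetches, but the deeper settings pull in more lines that nobody reads. The model has no notion of time, so it cannot show the flip side: on long streams with slow memory, fetching further ahead is what hides the latency.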

Performance-IP has lots of benchmark results on its website to show how MRO performs in various modes, with various test suites and various memory speeds. With things configured just right, the company has seen 88% reductions in memory latency and 50% improvements in CPU performance.

The company doesn’t charge royalties for licensing MRO – just a single up-front licensing fee, with free support. It’s a pretty good deal, if you’ve got the cash.
