
What Is a Cache, Anyway?

Looking at the Mysteries That Make Computers Go Faster

“If [a small business’s] cashflow dries up, the impact is fatal.” – Theo Paphitis, entrepreneur

What if FedEx replaced its delivery vans with a fleet of Ferraris? That’d be cool. They could deliver packages far faster than they do now. On the other hand, they’d be able to deliver only small packages; shipping a big bookcase would be a problem. Sports cars don’t have big trunks, because speed demands low mass and a low center of gravity, and cargo capacity works against both.

It’s the same with computer memory chips. You can have fast ones, or you can have big ones, but you can’t have big, fast ones. It’s just the nature of chip-making science that speed and capacity are at opposite ends of a sliding scale. And this causes problems for computer designers. 

All computers, from Amazon’s massive cloud-computing datacenters down to the smartphone in your pocket or the thermostat on your wall, have both processors and memories. The processor chip(s) runs the software and the memory chip(s) stores the software and all its data. The processor reads software instructions and data out of the memory, and it stores the results back into the memory. This back-and-forth conversation happens nonstop, millions, even billions, of times per second.

Trouble is, the processor is faster than the memory – by a lot. That slows things down. It’s like a hyperactive poodle* straining at the leash held by its 300-pound owner. It wants to race, but the owner isn’t built for speed. So, what do you do? 

You hire a dog walker who can keep up with the dog. Sure, it adds a third person to the party, but everybody wins. The dog gets to run at full speed, the dog walker gets some aerobic exercise and makes a few extra bucks, while the owner gets to plod along at a comfortable and familiar pace, trading money for convenience.  

In the computer world, we fix this problem by adding a third chip to the mix, in between the processor and the memory. It’s called a cache memory and it’s about the only thing that FedEx trucks and dog walkers have in common. 

Memory chips can be either big or fast, but not both. Since we all want lots of storage in our phones, computers, cameras, and other gizmos, gadget makers almost always opt for the big, slow kind. That boosts the number of photos you can store, but it also means that new cutting-edge processor chip you paid extra for is running dog-slow because the memory can’t keep up with it. 

Enter cache memory. It’s a small amount of the fast type of memory, and it alleviates some (but not all) of the problem of speed mismatch between the processor and the big main memory. (Oh, and the word is pronounced “cash,” not “cash-ay,” which is a completely different French word meaning charm, elegance, or prestige.)

Like Robinson Crusoe, your processor stores a few of its favorite and most often used items in a cache – that is, a small local stash away from the main storehouse. The cache is nearby – it’s physically closer to the processor silicon than the memory is – and it’s made of the fast kind of memory, so it’s just as quick as the processor. 

There are only two downsides. First, cache memory is expensive, so adding a cache adds cost to the product. Second, caches are small. They can’t possibly hold as much stuff (software, photos, music, documents, passwords, etc.) as the main memory can. So, you’ve got to make some tradeoffs. 

You and I haven’t got a clue what should go into the cache and what should go into the main memory, so nobody asks us. Instead, the cache makes that decision all on its own. And this is where the magic happens. 

Everything going in and out of the processor passes through the cache (as the diagram shows), so it gets to snoop on every transaction. When the processor requests new data from the memory (from right to left in the diagram), it passes through the cache, which keeps a copy. If the processor happens to request that same data again later on, the cache springs into action and supplies the processor with its own local copy, not the one stored all the way back in the main memory. This speeds things up considerably. The cache is like an attentive butler who always knows what you want before you want it. 

It also works in reverse. When the processor stores something into memory (data moving left to right in the diagram), the cache snags a copy of the data as it passes through on its way to main memory. The benefit here is that the cache stores the data much sooner than the main memory could, so it can tell the processor to get back to its business a bit sooner. Later, the data will make its way from the cache into the main memory.
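
If you think better in code than in dog-walking metaphors, here is a minimal sketch of both directions in Python. Everything in it is invented for illustration (the SlowMemory and SimpleCache names, the dictionary standing in for silicon); a real cache is hardware that works on fixed-size lines of bytes, not Python objects.

```python
# A toy model of the read and write paths described above.

class SlowMemory:
    """Stands in for the big, slow main memory."""
    def __init__(self):
        self.cells = {}

    def read(self, address):
        return self.cells.get(address, 0)     # imagine this taking many cycles

    def write(self, address, value):
        self.cells[address] = value           # ...and this, too


class SimpleCache:
    """Sits between the processor and main memory, keeping local copies."""
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}                       # address -> local copy

    def read(self, address):
        if address in self.lines:             # hit: answer from the fast local copy
            return self.lines[address]
        value = self.memory.read(address)     # miss: fetch from slow memory...
        self.lines[address] = value           # ...and keep a copy for next time
        return value

    def write(self, address, value):
        self.lines[address] = value           # captured locally; the processor moves on

    def flush(self):
        for address, value in self.lines.items():
            self.memory.write(address, value) # copies drift back to main memory later


memory = SlowMemory()
cache = SimpleCache(memory)
cache.write(0x10, 42)          # fast: lands in the cache first
print(cache.read(0x10))        # fast: served from the cache, not main memory
cache.flush()                  # eventually the data reaches main memory too
```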

So… a win-win, right? Caches make everything better? Well, yes and no. There are some real tricks to make a cache work right, but on the whole, yes, caches are good things and nearly every modern gizmo has one or more cache memories inside.

What are the tricky parts? Caches work on the basis of probability, meaning that they play the odds and take small risks with our data in exchange for faster speed. Because the cache is small and can’t possibly hold as much as the main memory, it has to guess which bits to store and which ones to ignore. The methods for this are abstruse and arcane, and sometimes carefully guarded company secrets, but most of them rely on some variation of a least-recently-used (LRU) algorithm.

The LRU algorithm basically says that the cache should always keep a copy of any new data that passes through it, until it gets full. Then, it should find the oldest – the least-recently used – piece of data, and erase that to make room for the new data. Sounds reasonable. 

The problem is that the cache can’t simply erase data the processor has changed but that hasn’t yet reached main memory. If it did, we’d lose our photos, documents, and sanity all the time. The cache needs to copy that data back into the main memory first so that it doesn’t get lost, and as we know, that takes time. But it’s the cache’s responsibility to make sure every bit of data finds its way safely into the main memory – no exceptions.
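
Putting those two ideas together, here is a hedged sketch of LRU eviction with the “nothing gets lost” rule. The LRUCache name and the dirty set are made up for illustration, and it reuses the SlowMemory stand-in from the earlier sketch; real hardware tracks recency and modified data with a few state bits per cache line, not Python collections.

```python
from collections import OrderedDict

class LRUCache:
    """Toy cache: evicts the least recently used line, but never loses data."""
    def __init__(self, memory, capacity):
        self.memory = memory                 # the SlowMemory stand-in from earlier
        self.capacity = capacity             # how many lines fit in the cache
        self.lines = OrderedDict()           # oldest entries first
        self.dirty = set()                   # changed here, not yet in main memory

    def _evict_if_full(self):
        if len(self.lines) < self.capacity:
            return
        victim, value = self.lines.popitem(last=False)   # least recently used
        if victim in self.dirty:                         # changed since it was loaded?
            self.memory.write(victim, value)             # write it back first --
            self.dirty.discard(victim)                   # no data gets lost

    def read(self, address):
        if address in self.lines:
            self.lines.move_to_end(address)              # mark as recently used
            return self.lines[address]
        self._evict_if_full()
        value = self.memory.read(address)
        self.lines[address] = value
        return value

    def write(self, address, value):
        if address not in self.lines:
            self._evict_if_full()
        self.lines[address] = value
        self.lines.move_to_end(address)                  # recently used again
        self.dirty.add(address)                          # main memory copy is now stale
```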

The small size of the cache means that there’s lots of competition for space. This can lead to cache “thrashing,” as old data continually gets evicted to make way for new data. Making the cache bigger can reduce the thrashing, but it also increases cost and reduces speed. 
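
You can actually watch thrashing happen with the toy LRUCache sketched above (again, purely illustrative numbers): give it room for four lines and then loop over five addresses, and every single access misses, because the line you are about to need is always the one that just got evicted.

```python
memory = SlowMemory()
cache = LRUCache(memory, capacity=4)

# Cycling through 5 addresses with room for only 4 means every access misses:
# the data we need next is always the data that was just thrown out.
for _ in range(3):
    for address in range(5):
        cache.read(address)
```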

Some processors have more than one cache: one just for software and a separate cache for data. This reduces thrashing even more but adds still more to the cost. Note that instructions always go from memory to the processor; processors don’t write software, they read it. Data can go both ways.

If you want to get really fancy, you can have multiple levels of cache memory: caches on top of caches. The cache closest to the processor is called the level-one cache (L1), with the level-two (L2) cache behind it. The L1 cache is extremely fast (and extremely small), while the L2 cache is a bit larger and a bit slower, but still faster than the main memory. High-end Intel Xeon and AMD Epyc processors even have three cache levels: L1, L2, and L3.

The idea behind these multilevel caches is, as always, to keep up with the high-speed processor. A single cache would be either too small to be useful or too big to keep up, so the L1 cache gets its own cache (the L2), which might get its own cache (L3) for good measure. 
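
As a rough sketch of the lookup order, using the same toy pieces as before (none of these names or sizes come from real hardware): the processor asks L1, L1 asks L2 only on a miss, and L2 asks main memory only when it misses too.

```python
# Each level only bothers the slower level behind it when it misses.

class CacheLevel:
    def __init__(self, backing, capacity):
        self.backing = backing                   # the next, slower level behind this one
        self.capacity = capacity
        self.lines = {}

    def read(self, address):
        if address in self.lines:                # hit at this level
            return self.lines[address]
        value = self.backing.read(address)       # miss: ask the level behind us
        if len(self.lines) >= self.capacity:     # make room (eviction policy omitted;
            self.lines.pop(next(iter(self.lines)))  # see the LRU sketch above)
        self.lines[address] = value
        return value

memory = SlowMemory()                      # big and slow
l2 = CacheLevel(memory, capacity=1024)     # bigger and slower than L1
l1 = CacheLevel(l2, capacity=64)           # tiny and very fast
print(l1.read(0x2000))                     # L1 miss -> L2 miss -> main memory
print(l1.read(0x2000))                     # now an L1 hit: nobody else is bothered
```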

Which brings us to a new problem: cache coherence. All caches store data (and software), but it’s important that they all store the same data. Whenever the processor writes something new into the cache, that update will eventually make its way back into the main memory. Until it does, the data in main memory is old and out of date. It’s stale data, and we have to make sure stale data is never used again. There’s a time delay between when the new data goes into the cache and when it’s propagated out to main memory. That time delay presents a risk. 

With a single cache, you’ve got two places the data could be stored: in the cache or in main memory. With a two-level cache there are three possibilities, and with a gonzo three-level cache you’ve got four possible locations for any piece of data. Somehow the computer has to keep track of which copy is current and which one(s) are stale. This involves a lot of back-channel communication between the caches to keep everything coherent, meaning they’re all in sync with each other. Remarkably, this all works automatically without you, me, or the processor knowing anything about it. 
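
To make that bookkeeping concrete, here is a drastically simplified sketch of the idea: something has to remember, for every address, which copy is the current one, so that a read never returns stale data. Real hardware does this with per-line state bits and snooping protocols (MESI and friends), not a central Python dictionary; everything below is invented for illustration.

```python
# For each address, track which level holds the newest copy, so reads never
# return stale data. Purely illustrative.

levels = {"L1": {}, "L2": {}, "memory": {}}
newest = {}                       # address -> name of the level with the current copy

def write(level, address, value):
    levels[level][address] = value
    newest[address] = level       # every other copy of this address is now stale

def read(address):
    holder = newest.get(address, "memory")
    return levels[holder].get(address)

write("memory", 0x80, 7)          # the old value sits in main memory
write("L1", 0x80, 8)              # the processor updates its L1 copy
print(read(0x80))                 # 8 -- the stale copy in main memory is ignored
```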

As complex as this all is, we haven’t even started on the exquisite complexities of cache-allocation algorithms, line eviction, dirty bits, multiprocessor systems, cache snooping, or… well, let’s save those gory details for another day. 

*Redundant, I know.

 
