
Hardware IP Cuts Memory Size, Bandwidth, and Power

Startup ZeroPoint Compresses Memory on the Fly

Memory. What’cha gonna do with it, amirite? It’s too slow, too expensive, it takes up too much space on your board, and the supply is more volatile than a honey badger on acid. But every system needs memory, and the more processors you have, the more memory you need. Too bad there isn’t a way to, y’know, somehow make half of those problems go away. 

Wish granted. A Gothenburg, Sweden–based startup called ZeroPoint Technologies (ZPT) thinks it has come up with a solution: an all-hardware IP block that nestles in between your chip’s processor(s) and memory controller(s), cutting memory usage by more than half, slashing bandwidth on the external bus, and trimming the power consumed by all those read/write cycles.

What wizardry is this? What arcane secret have ZPT’s 15 employees uncovered that the rest of us somehow missed? The trick is all in compression. ZPT’s founders have been studying and implementing memory-compression algorithms for years, and they ultimately came up with Ziptilion, the company’s one and only product. It compresses data on the fly as it’s written to memory and decompresses the data on its way back in. Unlike some earlier approaches to memory compression, Ziptilion is (almost) completely invisible to software, including operating systems and drivers. It’s a hardware-only fix that squeezes more data into the same amount of RAM. 

The company claims a 2:1 average compression ratio, meaning you can get by with half the physical memory you thought you needed. Compression also means bus transactions are shorter and less frequent, reducing bus traffic. And that, in turn, saves power. It’s a three-way win.
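If you want to sanity-check that claim against your own design, the arithmetic is simple enough to sketch. The numbers below are purely illustrative; the only figure that comes from ZPT is the 2:1 average ratio.

```python
# Back-of-the-envelope numbers, assuming ZPT's claimed 2:1 average ratio
# actually holds for your workload (it may not).
required_dram_gb = 8            # what an uncompressed design would provision
claimed_ratio    = 2.0          # ZPT's quoted average compression ratio

print(f"DRAM to provision: {required_dram_gb / claimed_ratio:.1f} GB")

# Bus traffic scales roughly with the compressed footprint, so reads and
# writes move about half as many bytes (ignoring metadata overhead).
raw_traffic_gbps = 12.8         # example memory stream, not a ZPT figure
print(f"Approximate bus traffic: {raw_traffic_gbps / claimed_ratio:.1f} GB/s")
```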

The catch? ZPT’s data compression takes time. Ziptilion adds 7 cycles to every memory write transaction, and another 7 cycles to every read. That affects latency, not throughput, so your system has the same memory bandwidth as before; it just takes a bit longer to get started. This may or may not affect performance, depending on the speed of your existing memory subsystem and the locality of your data references. 
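Here’s a rough way to think about the impact. The DRAM latency and cache hit rate below are assumptions picked for illustration; only the 7-cycle figure comes from ZPT.

```python
# Crude latency model: only the accesses that actually reach DRAM pay the
# extra cycles, so on-chip cache hit rate dominates the outcome.
dram_cycles       = 100    # assumed baseline DRAM access latency
ziptilion_penalty = 7      # ZPT's quoted added cycles per transaction
cache_hit_rate    = 0.95   # assumed fraction of accesses served on-chip

baseline = (1 - cache_hit_rate) * dram_cycles
with_zpt = (1 - cache_hit_rate) * (dram_cycles + ziptilion_penalty)
print(f"Average off-chip stall per access: {baseline:.2f} -> {with_zpt:.2f} cycles")
```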

Memory compression isn’t a new idea – plenty of companies have either tried it or actually implemented it – but making it functionally transparent to the system is a new trick. IBM’s CodePack for PowerPC, for example, squeezed down the size of executable code (but not data) by compressing and decompressing instructions on the fly in hardware. This was different from other “compressed” instruction sets like ARM’s Thumb that simply used shorter and less flexible opcodes. IBM’s system really did reorganize and compress the bits of standard PowerPC binaries, like running WinZip on program files. The downside was that compressed code wasn’t transportable, and it was inscrutable to debuggers. It also required cooperation from the host operating system to know whether code blocks were compressed or not. It worked, but it wasn’t pretty.

In contrast, ZPT’s Ziptilion requires no software intervention. It sits beside the memory controller and silently compresses and decompresses everything – code and data – as it passes in or out of the chip. The rest of the system is blissfully unaware that anything has happened, so there are no operating system tweaks or funny drivers to integrate. The only observable difference is that the frequency of memory transactions has declined. 

Compression seems like a simple idea – obvious, even – but the devil is in the details. For starters, different data sets compress differently based on the compression algorithm used. Some data is not compressible at all. A JPEG image, for example, is already compressed; you can’t compress it any further. The size of the data also affects compressibility. Generally speaking, the larger the data sample, the more effectively you can compress it. 
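You can see the effect with any off-the-shelf compressor. ZPT’s algorithms are proprietary, so the sketch below leans on Python’s zlib purely to show how the same compressor fares against redundant data versus random-looking (that is, already-compressed) data.

```python
import os
import zlib

redundant  = b"temperature=23.5;status=OK;" * 1000   # highly repetitive data
random_ish = os.urandom(len(redundant))              # JPEGs and ciphertext look like this

for name, buf in (("redundant", redundant), ("random", random_ish)):
    packed = zlib.compress(buf)
    print(f"{name:9}: {len(buf)} -> {len(packed)} bytes "
          f"({len(buf) / len(packed):.1f}:1)")
```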

Then there’s the weird effect on memory addresses. Compressing data on its way out to memory necessarily changes where that data is written. Your software might think it’s writing 4096 bytes to address 0x8000–0x9000, but the whole point of Ziptilion is to squeeze that down. It won’t be 4096 bytes when it’s done, and it won’t be located at address 0x8000. 
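Some bookkeeping therefore has to remember where everything really went. The sketch below is a hypothetical, minimal version of that record; the field names and numbers are made up for illustration, not taken from ZPT.

```python
from dataclasses import dataclass

# Hypothetical bookkeeping for one compressed block: software still believes
# its 4,096 bytes live at 0x8000, but the hardware must record where the
# squeezed-down version really landed and how big it turned out to be.
@dataclass
class BlockMapping:
    logical_addr:  int   # the address software asked for
    physical_addr: int   # where the compressed bytes actually sit in DRAM
    stored_bytes:  int   # size after compression (not 4,096 anymore)

entry = BlockMapping(logical_addr=0x8000, physical_addr=0x24C00, stored_bytes=1910)
```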

Ziptilion tackles the “best algorithm” problem by implementing several different ones, all proprietary to ZPT. (All compression is lossless, obviously.) It constantly monitors the data flowing in and out and chooses the most effective algorithm to produce the smallest result. It even changes algorithms on the fly, constantly iterating as data types change. Alternatively, you can force the choice of algorithm if you already know the characteristics of your data. 
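Conceptually, the selection step looks something like the sketch below. Stock Python codecs stand in for ZPT’s proprietary algorithms, and a brute-force “try them all” loop stands in for the hardware’s ongoing traffic monitoring; the keep-the-smallest-result principle is the same.

```python
import bz2
import lzma
import zlib

CANDIDATES = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}

def pick_best(block: bytes) -> tuple[str, bytes]:
    """Compress with every candidate and keep whichever output is smallest."""
    results = {name: fn(block) for name, fn in CANDIDATES.items()}
    winner = min(results, key=lambda name: len(results[name]))
    return winner, results[winner]

algo, packed = pick_best(b"sensor,12.5,ok\n" * 500)
print(algo, len(packed))
```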

This self-training feature is automatic and invisible to the user or the system. However, it takes time to collect enough data samples to decide to change algorithms. Switching time is on the order of seconds, not milliseconds, according to CEO Klas Moreau. Ziptilion isn’t designed to change with every task switch or process thread. 

Dynamically switching among different compression schemes adds yet another wrinkle. Any data written to memory obviously needs to be decompressed using the same algorithm that was used to compress it, even after the passage of arbitrary amounts of time, so Ziptilion tags memory with an algorithm ID.
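A minimal software sketch of that tagging idea, assuming a one-byte ID and stock codecs in place of ZPT’s own:

```python
import bz2
import zlib

# Hypothetical tagging scheme: a one-byte algorithm ID travels with each
# compressed block so the read path can undo the write path, even if the
# active algorithm has changed many times in the meantime.
ALGOS = {0: (zlib.compress, zlib.decompress),
         1: (bz2.compress,  bz2.decompress)}

def store(block: bytes, algo_id: int) -> bytes:
    compress, _ = ALGOS[algo_id]
    return bytes([algo_id]) + compress(block)

def load(stored: bytes) -> bytes:
    _, decompress = ALGOS[stored[0]]
    return decompress(stored[1:])

assert load(store(b"hello world " * 64, algo_id=1)) == b"hello world " * 64
```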

If compressing data on writes uses less memory, then read cycles must return more data than absolutely necessary. When Ziptilion receives a processor request (or more likely, a cache line request) to read 64 bytes from external memory, it doesn’t yet know how much space that data actually occupies, so it fulfills the entire 64-byte request. In all likelihood, the data occupies only about 32 bytes of RAM, so the additional 32 bytes are superfluous. Rather than discard the excess data, however, Ziptilion keeps it in a local buffer on the assumption that it will be requested next. In effect, it’s a read-ahead buffer or a very simple last-level cache. If the surplus data isn’t needed, no harm done. But if it is, Ziptilion saves an entire memory transaction.
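A toy model of that read-ahead behavior might look like the following; the 64-byte burst, the 2:1 split, and the buffer structure are all illustrative assumptions rather than ZPT’s actual design.

```python
prefetch_buffer: dict[int, bytes] = {}    # physical address -> surplus bytes

def read_block(phys_addr: int, dram: bytes) -> bytes:
    """Fetch a full 64-byte burst; return the wanted half, stash the rest."""
    if phys_addr in prefetch_buffer:            # surplus from an earlier burst
        return prefetch_buffer.pop(phys_addr)   # -> no DRAM transaction at all
    burst = dram[phys_addr : phys_addr + 64]    # full 64-byte fetch regardless
    wanted, surplus = burst[:32], burst[32:]    # assume the block compressed 2:1
    prefetch_buffer[phys_addr + 32] = surplus   # keep it in case it's asked for next
    return wanted

dram   = bytes(range(256)) * 16     # 4 KB of fake DRAM contents
first  = read_block(0, dram)        # one real memory transaction
second = read_block(32, dram)       # served from the buffer, no transaction
```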

ZPT offers an optional encryption module to go with Ziptilion, which many customers requested. Encryption happens after compression, partly because there’s less data to encrypt that way, and partly because encrypted data tends to be stochastic and therefore incompressible. Doing it the other way wouldn’t be very effective. 
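The ordering argument is easy to verify for yourself. A real design would use a proper cipher such as AES; the keyed XOR below is just a stand-in that, like genuine ciphertext, looks statistically random and therefore refuses to compress.

```python
import os
import zlib

plain  = b"GET /index.html HTTP/1.1\r\n" * 400
key    = os.urandom(len(plain))
cipher = bytes(p ^ k for p, k in zip(plain, key))   # toy "encryption" only

print("compress, then encrypt:", len(zlib.compress(plain)),  "bytes end up in memory")
print("encrypt, then compress:", len(zlib.compress(cipher)), "bytes end up in memory")
```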

Although Ziptilion’s operation is virtually transparent to both hardware and software, there is one scenario where it may complicate life, and that’s during debugging. The actual physical memory addresses that Ziptilion uses don’t correspond to what the processor(s) think they’re using. In effect, Ziptilion adds a level of MMU-style address translation above and beyond whatever the processor might be doing, and that extra layer is hidden from the processor. That might confuse third-party debuggers or systems that use shared memory.

But even that’s not insurmountable. Ziptilion maintains its own MMU translation table in external memory, and when the device is in debug mode that table is not compressed or encrypted in any way, so debug tools can examine or interrogate it to see what goes where. 

Ziptilion’s size, speed, and cost depend on a lot of factors, of course, but CEO Moreau says that 1.5 million gates is a good baseline number to plan for. Encryption adds more to that, the size of the prefetch buffer is configurable, and multicore implementations are more complex. Licensing terms are typical for semiconductor IP, with an upfront fee plus royalties. 

Moreau points out that memory performance used to be an engineer’s biggest concern. Now it’s performance-per-watt, both at the low end (battery-powered mobile devices) and the high end (datacenter servers). Saving power by saving memory is an excellent tradeoff for these designs. The cost savings and optional encryption are bonuses. Unlike the stuff advertised on late-night TV, Ziptilion might be the memory supplement that actually works.
