feature article
Subscribe Now

Hardware IP Cuts Memory Size, Bandwidth, and Power

Startup ZeroPoint Compresses Memory on the Fly

Memory. What’cha gonna do with it, amirite? It’s too slow, too expensive, it takes up too much space on your board, and the supply is more volatile than a honey badger on acid. But every system needs memory, and the more processors you have, the more memory you need. Too bad there isn’t a way to, y’know, somehow make half of those problems go away. 

Wish granted. A Gothenburg Sweden–based startup called ZeroPoint Technologies (ZPT) thinks it has come up with a solution: an all-hardware IP block that nestles in between your chip’s processor(s) and memory controller(s), cutting memory usage by more than half, slashing bandwidth on the external bus, and trimming the power consumed by all those read/write cycles. 

What wizardry is this? What arcane secret have ZPT’s 15 employees uncovered that the rest of us somehow missed? The trick is all in compression. ZPT’s founders have been studying and implementing memory-compression algorithms for years, and they ultimately came up with Ziptilion, the company’s one and only product. It compresses data on the fly as it’s written to memory and decompresses the data on its way back in. Unlike some earlier approaches to memory compression, Ziptilion is (almost) completely invisible to software, including operating systems and drivers. It’s a hardware-only fix that squeezes more data into the same amount of RAM. 

The company claims a 2:1 average compression ratio, meaning you can get by with half the physical memory you thought you needed. Compression also means bus transactions are shorter and less frequent, reducing bus traffic. And that, in turn, saves power. It’s a three-way win.

The catch? ZPT’s data compression takes time. Ziptilion adds 7 cycles to every memory write transaction, and another 7 cycles to every read. That affects latency, not throughput, so your system has the same memory bandwidth as before; it just takes a bit longer to get started. This may or may not affect performance, depending on the speed of your existing memory subsystem and the locality of your data references. 

Memory compression isn’t a new idea – plenty of companies have either tried it or actually implemented it – but making it functionally transparent to the system is a new trick. IBM’s CodePack for PowerPC, for example, squeezed down the size of executable code (but not data) by compressing and decompressing instructions on the fly in hardware. This was different from other “compressed” instruction sets like ARM’s Thumb that simply used shorter and less flexible opcodes. IBM’s system really did reorganize and compress the bits of standard PowerPC binaries, like running WinZIP on program files. The downside was that compressed code wasn’t transportable, and it was inscrutable to debuggers. It also required cooperation from the host operating system to know whether code blocks were compressed or not. It worked, but it wasn’t pretty. 

In contrast, ZPT’s Ziptilion requires no software intervention. It sits beside the memory controller and silently compresses and decompresses everything – code and data – as it passes in or out of the chip. The rest of the system is blissfully unaware that anything has happened, so there are no operating system tweaks or funny drivers to integrate. The only observable difference is that the frequency of memory transactions has declined. 

Compression seems like a simple idea – obvious, even – but the devil is in the details. For starters, different data sets compress differently based on the compression algorithm used. Some data is not compressible at all. A JPEG image, for example, is already compressed; you can’t compress it any further. The size of the data also affects compressibility. Generally speaking, the larger the data sample, the more effectively you can compress it. 

Then there’s the weird effect on memory addresses. Compressing data on its way out to memory necessarily changes where that data is written. Your software might think it’s writing 4096 bytes to address 0x8000–0x9000, but the whole point of Ziptilion is to squeeze that down. It won’t be 4096 bytes when it’s done, and it won’t be located at address 0x8000. 

Ziptilion tackles the “best algorithm” problem by implementing several different ones, all proprietary to ZPT. (All compression is lossless, obviously.) It constantly monitors the data flowing in and out and chooses the most effective algorithm to produce the smallest result. It even changes algorithms on the fly, constantly iterating as data types change. Alternatively, you can force the choice of algorithm if you already know the characteristics of your data. 

This self-training feature is automatic and invisible to the user or the system. However, it takes time to collect enough data samples to decide to change algorithms. Switching time is on the order of seconds, not milliseconds, according to CEO Klas Moreau. Ziptilion isn’t designed to change with every task switch or process thread. 

Dynamically switching among different compression schemes adds yet another wrinkle. Any data written to memory obviously needs to be decompressed using the same algorithm that was used to compress it, even after the passage of arbitrary amounts of time, so Ziptilion tags memory with an algorithm ID.

If compressing data on writes uses less memory, then read cycles must return more data than absolutely necessary. When Ziptilion receives a processor request (or more likely, a cache line request) to read 64 bytes from external memory, it doesn’t yet know how much space that data actually occupies, so it fulfills the entire 64-byte request. In all likelihood, the data occupies only about 32 bytes of RAM, so the additional 32 bytes is superfluous. Rather than discard the excess data, however, Ziptilion keeps it in a local buffer on the assumption that it will be requested next. In effect, it’s a read-ahead buffer or a very simple last-level cache. If the surplus data isn’t needed, no harm done. But if it is, Ziptilion saves an entire memory transaction. 

ZPT offers an optional encryption module to go with Ziptilion, which many customers requested. Encryption happens after compression, partly because there’s less data to encrypt that way, and partly because encrypted data tends to be stochastic and therefore incompressible. Doing it the other way wouldn’t be very effective. 

Although Ziptilion’s operation is virtually transparent to both hardware and software, there is one scenario where it may complicate life, and that’s during debugging. The actual physical memory addresses that Ziptilion uses don’t correspond to what the processor(s) think they’re using. In effect, it adds a level of MMU address translation above and beyond whatever the processor might be doing, but that’s hidden from it. That might confuse third-party debuggers or systems that use shared memory. 

But even that’s not insurmountable. Ziptilion maintains its own MMU translation table in external memory, and when the device is in debug mode that table is not compressed or encrypted in any way, so debug tools can examine or interrogate it to see what goes where. 

Ziptilion’s size, speed, and cost depend on a lot of factors, of course, but CEO Moreau says that 1.5 million gates is a good baseline number to plan for. Encryption adds more to that, the size of the prefetch buffer is configurable, and multicore implementations are more complex. Licensing terms are typical for semiconductor IP, with an upfront fee plus royalties. 

Moreau points out that memory performance used to be an engineer’s biggest concern. Now it’s performance-per-watt, both at the low end (battery-powered mobile devices) and the high end (datacenter servers). Saving power by saving memory is an excellent tradeoff for these designs. The cost savings and optional encryption are bonuses. Unlike the stuff advertised on late-night TV, Ziptilion might be the memory supplement that actually works.

One thought on “Hardware IP Cuts Memory Size, Bandwidth, and Power”

Leave a Reply

featured blogs
Jul 5, 2022
The 30th edition of SMM , the leading international maritime trade fair, is coming soon. The world of shipbuilders, naval architects, offshore experts and maritime suppliers will be gathering in... ...
Jul 5, 2022
By Editorial Team The post Q&A with Luca Amaru, Logic Synthesis Guru and DAC Under-40 Innovators Honoree appeared first on From Silicon To Software....
Jun 28, 2022
Watching this video caused me to wander off into the weeds looking at a weird and wonderful collection of wheeled implementations....

featured video

Demo: Achronix Speedster7t 2D NoC vs. Traditional FPGA Routing

Sponsored by Achronix

This demonstration compares an FPGA design utilizing Achronix Speedster7t 2D Network on Chip (NoC) for routing signals with the FPGA device, versus using traditional FPGA routing. The 2D NoC provides a 40% reduction in logic resources required with 40% less compile time needed versus using traditional FPGA routing. Speedster7t FPGAs are optimized for high-bandwidth workloads and eliminate the performance bottlenecks associated with traditional FPGAs.

Subscribe to Achronix's YouTube channel for the latest videos on how to accelerate your data using FPGAs and eFPGA IP

featured paper

3 key considerations for your next-generation HMI design

Sponsored by Texas Instruments

Human-Machine Interface (HMI) designs are evolving. Learn about three key design considerations for next-generation HMI and find out how low-cost edge AI, power-efficient processing and advanced display capabilities are paving the way for new human-machine interfaces that are smart, easily deployable, and interactive.

Click to read more

featured chalk talk

Powering Servers and AI with Ultra-Efficient IPOL Voltage Regulators

Sponsored by Infineon

For today’s networking, telecom, server, and enterprise storage applications, power efficiency and power density are crucial components to the success of their power management. In this episode of Chalk Talk, Amelia Dalton and Dr. Davood Yazdani from Infineon chat about the details of Infineon’s ultra-efficient integrated point of load voltage regulators. Davood and Amelia take a closer look at the operation of these integrated point of load voltage regulators and why using the Infineon OptiMOS 5 FETs combined with the Infineon Fast Constant On Time controller engine make them a great solution for your next design.

Click here for more information about Integrated POL Voltage Regulators