It’s not often you see memory chips with heat sinks. But these are no ordinary memory chips.
No, indeed. These are “bandwidth engines,” a new class of chips from MoSys. Don’t know MoSys? Then you’re probably not an SoC designer. The company has been around for more than 20 years, but it made its name in the IP-licensing business, not the chip business. MoSys created the single-transistor (1T) SRAM cell, which you could license for inclusion in your own chip. 1T SRAMs are a lot smaller than traditional six-transistor SRAM cells, so the MoSys technology was good for packing a lot of memory into a little space.
The bloom has come off the IP-licensing rose, however, so, a few years ago, MoSys started making a corporate about-face, converting itself into a chip company. And Exhibit A in that transformation is the bandwidth engine.
A bandwidth engine (BE for short) is basically a smart SRAM chip. That is, it’s a big memory with some onboard intelligence to make the memory go faster. I know: the term “smart memory” seems like an oxymoron. Or at the very least, an engineering solution in search of a problem. It’s also somewhat tainted. The path to smart memory chips is littered with the bones of failed projects. You’re right to be cautious.
But MoSys seems to have this figured out. For starters, the new BE is designed to solve a specific set of problems often encountered by network line cards and other Interweb appliances. The basic purpose of network plumbing is to get data packets from Point A to Point B as expeditiously as possible, while also massaging that data as intelligently as possible. Thus, line cards tend to be populated with high-end packet processors and/or FPGAs connected to a whole lot of memory. They shuttle data packets in and out of that memory as quickly as they can before moving on to the next packet. There’s a lot of data traffic and a lot of transfer back and forth between the memory and the packet processors.
In that kind of environment, a smart memory chip can help, and this is where MoSys stepped in. Each BE chip is basically a big (576 Mbit) SRAM with a limited RISC processor at the front end. The processor is there to intercept read/write requests from the packet processor and to look for ways to simplify, speed up, combine, or even avoid memory accesses. Any memory access saved is time saved, so the processor pays for itself in terms of faster access speed and lower bus usage.
You might think there are scant opportunities for a mere memory chip to second-guess a big million-gate packet processor, and you’d be right. We’re used to our processors being smart and our memories being dumb. All the thinking goes on in the processor; the memory is just there to do what it’s told. The pitfall that other smart memory designers fell into was in making their memories too smart. You had to program around them, or at least program with an awareness of them. The smart memories became, in a sense, another program thread that was far from transparent to software.
MoSys doesn’t do that. Its bandwidth engine is smart, yes, but also obedient. You treat it just like any other memory, but one that’s frequently faster than it ought to be. A good example is a read-modify-write operation, which network line cards spend a lot of time doing. In a typical processor/memory pair, the processor would read a word out of memory into a register (one bus transaction), modify it is some way (add, subtract, exclusive-OR, etc.), and write the result back to the same address (a second bus transaction). The bandwidth engine recognizes this type of transaction and performs the entire read-modify-write operation internally, saving time and reducing bus activity. The processor doesn’t know the difference, but the memory is much less battered. Do enough of these RMW cycles back-to-back and you start saving real energy. The chip can perform most simple logic and arithmetic operations on its own.
The third component of the bandwidth engine is its bus interface. MoSys designed a special bus protocol specifically for bandwidth engines that has very little overhead compared to, say, Serial RapidIO. The interface is designed for short transfers of small packets: exactly the kind of traffic packet processors typically do. It’s not efficient for long table walks or other computer-esque transactions, but that’s not what it’s intended for. The low pin count and small packet overhead mean fewer, shorter PCB traces on the line card and a bit less EMI.
The downside? Bandwidth engines are not a drop-in replacement for normal memories. You’ve got to design them in from the outset. The new bus interface, of course, requires a memory controller (or a packet engine with its own memory controller) that speaks the new interface language. Right now there are exactly zero such chips, but Altera and Xilinx both support it in their high-end FPGAs. MoSys will happily license the interface IP to you for inclusion in your own chips, too.
Previous stabs at creating smart memory chips were usually too ambitious, and they generally failed the “transparency” test. Nobody wanted to design a computer system around its memory; that just seemed backward. But MoSys has taken on a well-bounded and tractable problem, yet one with a big enough market to be commercially successful. There are a lot of designers out there creating black boxes that speed the ’Net along, and at least of few of them are likely to give the bandwidth engine a shot.