feature article
Subscribe Now

Memories With Heat Sinks

MoSys “Bandwidth Engine” Accelerates Da Net

It’s not often you see memory chips with heat sinks. But these are no ordinary memory chips.

No, indeed. These are “bandwidth engines,” a new class of chips from MoSys. Don’t know MoSys? Then you’re probably not an SoC designer. The company has been around for more than 20 years, but it made its name in the IP-licensing business, not the chip business. MoSys created the single-transistor (1T) SRAM cell, which you could license for inclusion in your own chip. 1T SRAMs are a lot smaller than traditional six-transistor SRAM cells, so the MoSys technology was good for packing a lot of memory into a little space.

The bloom has come off the IP-licensing rose, however, so, a few years ago, MoSys started making a corporate about-face, converting itself into a chip company. And Exhibit A in that transformation is the bandwidth engine.

A bandwidth engine (BE for short) is basically a smart SRAM chip. That is, it’s a big memory with some onboard intelligence to make the memory go faster. I know: the term “smart memory” seems like an oxymoron. Or at the very least, an engineering solution in search of a problem. It’s also somewhat tainted. The path to smart memory chips is littered with the bones of failed projects. You’re right to be cautious.

But MoSys seems to have this figured out. For starters, the new BE is designed to solve a specific set of problems often encountered by network line cards and other Interweb appliances. The basic purpose of network plumbing is to get data packets from Point A to Point B as expeditiously as possible, while also massaging that data as intelligently as possible. Thus, line cards tend to be populated with high-end packet processors and/or FPGAs connected to a whole lot of memory. They shuttle data packets in and out of that memory as quickly as they can before moving on to the next packet. There’s a lot of data traffic and a lot of transfer back and forth between the memory and the packet processors.

In that kind of environment, a smart memory chip can help, and this is where MoSys stepped in. Each BE chip is basically a big (576 Mbit) SRAM with a limited RISC processor at the front end. The processor is there to intercept read/write requests from the packet processor and to look for ways to simplify, speed up, combine, or even avoid memory accesses. Any memory access saved is time saved, so the processor pays for itself in terms of faster access speed and lower bus usage.

You might think there are scant opportunities for a mere memory chip to second-guess a big million-gate packet processor, and you’d be right. We’re used to our processors being smart and our memories being dumb. All the thinking goes on in the processor; the memory is just there to do what it’s told. The pitfall that other smart memory designers fell into was in making their memories too smart. You had to program around them, or at least program with an awareness of them. The smart memories became, in a sense, another program thread that was far from transparent to software.

MoSys doesn’t do that. Its bandwidth engine is smart, yes, but also obedient. You treat it just like any other memory, but one that’s frequently faster than it ought to be. A good example is a read-modify-write operation, which network line cards spend a lot of time doing. In a typical processor/memory pair, the processor would read a word out of memory into a register (one bus transaction), modify it is some way (add, subtract, exclusive-OR, etc.), and write the result back to the same address (a second bus transaction). The bandwidth engine recognizes this type of transaction and performs the entire read-modify-write operation internally, saving time and reducing bus activity. The processor doesn’t know the difference, but the memory is much less battered. Do enough of these RMW cycles back-to-back and you start saving real energy. The chip can perform most simple logic and arithmetic operations on its own.

The third component of the bandwidth engine is its bus interface. MoSys designed a special bus protocol specifically for bandwidth engines that has very little overhead compared to, say, Serial RapidIO. The interface is designed for short transfers of small packets: exactly the kind of traffic packet processors typically do. It’s not efficient for long table walks or other computer-esque transactions, but that’s not what it’s intended for. The low pin count and small packet overhead mean fewer, shorter PCB traces on the line card and a bit less EMI.

The downside? Bandwidth engines are not a drop-in replacement for normal memories. You’ve got to design them in from the outset. The new bus interface, of course, requires a memory controller (or a packet engine with its own memory controller) that speaks the new interface language. Right now there are exactly zero such chips, but Altera and Xilinx both support it in their high-end FPGAs. MoSys will happily license the interface IP to you for inclusion in your own chips, too.

Previous stabs at creating smart memory chips were usually too ambitious, and they generally failed the “transparency” test. Nobody wanted to design a computer system around its memory; that just seemed backward. But MoSys has taken on a well-bounded and tractable problem, yet one with a big enough market to be commercially successful. There are a lot of designers out there creating black boxes that speed the ’Net along, and at least of few of them are likely to give the bandwidth engine a shot. 

4 thoughts on “Memories With Heat Sinks”

  1. Very interesting. But… feeling a bit dense here… with your RMW example, that means that you not only access with the address of what you want, but you also pass the operation to it so that it can do it internally? Actually, I guess you would do a single request for both operands and include the operation? Which is very different from a standard memory access. How does that happen? Does the compiler have to know about it? Is there something magic in the memory controller?

  2. Right. To use the RMW example, the processor’s bus interface basically fires off a command that says, “add 5 to whatever you find at address 0x0123.” The BE takes it from there. The read from location 0x0123, the add, and the resulting writeback all happen inside the BE chip, not inside the processor. Obviously, this requires a very different kind of memory controller on both the processor chip and the memory chip. It’s not just the usual read/write/refresh transactions we’re used to.

    Currently, the BE can do addition, subtraction, and logical operations all by itself.

    It requires a lot more intelligence from the bus interface (on both sides) than “normal” processors or memories do. But high-end microprocessors (Core i7, Athlon, SPARC, et al) have had similar features for about ten years. A big Intel chip, for example, will queue up bus transactions internally before sending them out to the memory bus. If it sees two (or more) requests destined for nearby addresses it will combine them into one larger transaction, even though the CPU never asked to do that. Similarly, if the bus interface sees a write to a certain address, followed by some other unrelated read/write cycles to other addresses, followed by a read back from the first address, it will skip the second read and just return the data it was asked to write earlier. (This only works if you assume there is no second bus master sharing the memory that could change the data in the meantime.) They’ll also “hoist” read cycles in front of write cycles in the transaction queue, on the theory that the software stalls waiting for the results of a read, whereas writes are not performance-critical. Bus interfaces like this do all sorts of tricks to reorder, or even eliminate, memory transactions, and it’s all invisible to software.

  3. But what’s different here is not just the rearrangement of accesses that, to the memory controller, may be completely unrelated to each other. This requires two more things:
    – The notion that two operands aren’t just accesses that happen to be near each other (they may, in fact, not be near each other) but are part of the same operation – unless this only works by having one of the operands come from a register and get sent with the access
    – The inclusion of an operation.
    This basically mandates an entirely new bus protocol. Right now, you just have address and data for one access. Now you need address and data + operation + other operand (or other operand address). It’s almost like you need new machine code instructions to do that, since the CPU won’t have any way to communicate that directly to the memory. I still can’t figure out how this happens at a low level without some CPU and compiler magic. Perhaps the memory manager is itself memory mapped so you can do a “write” to it with the necessary data (other operand, operation) and then the store location. But the compiler would still need to create all of those writes instead of what would normally be a fetch, fetch, operate, store sequence.

    And I can’t even start to think about how cache would interact with this. Ouch!

  4. A “normal” memory interface has an address, a direction (read vs. write), and the data. The interface on the bandwidth engine is very different. It includes an address, an operation (such as add or exclusive-OR), and potentially an operand. That allows the BE chip itself to perform simple arithmetic and logical operations on its contents, which is much quicker than the traditional RMW method.

    The bus controller on the host side can identify these operations opportunistically without explicit software commands from the CPU. It’s true that new low-level CPU instructions and/or compiler tricks could make this work even better, but then it wouldn’t be invisible. The BE can’t exploit every opportunity to collapse memory ops, but it can do enough to be worthwhile. The only cost is the new host-side bus controller itself.

Leave a Reply

featured blogs
Sep 21, 2023
Wireless communication in workplace wearables protects and boosts the occupational safety and productivity of industrial workers and front-line teams....
Sep 26, 2023
5G coverage from space has the potential to make connectivity to the Internet truly ubiquitous for a broad range of use cases....
Sep 26, 2023
Explore the LPDDR5X specification and learn how to leverage speed and efficiency improvements over LPDDR5 for ADAS, smartphones, AI accelerators, and beyond.The post How LPDDR5X Delivers the Speed Your Designs Need appeared first on Chip Design....
Sep 26, 2023
The eighth edition of the Women in CFD series features Mary Alarcon Herrera , a product engineer for the Cadence Computational Fluid Dynamics (CFD) team. Mary's unwavering passion and dedication toward a career in CFD has been instrumental in her success and has led her ...
Sep 21, 2023
Not knowing all the stuff I don't know didn't come easy. I've had to read a lot of books to get where I am....

featured video

TDK PowerHap Piezo Actuators for Ideal Haptic Feedback

Sponsored by TDK

The PowerHap product line features high acceleration and large forces in a very compact design, coupled with a short response time. TDK’s piezo actuators also offers good sensing functionality by using the inverse piezo effect. Typical applications for the include automotive displays, smartphones and tablet.

Click here for more information about PowerHap Piezo Actuators

featured paper

Accelerating Monte Carlo Simulations for Faster Statistical Variation Analysis, Debugging, and Signoff of Circuit Functionality

Sponsored by Cadence Design Systems

Predicting the probability of failed ICs has become difficult with aggressive process scaling and large-volume manufacturing. Learn how key EDA simulator technologies and methodologies enable fast (minimum number of simulations) and accurate high-sigma analysis.

Click to read more

featured chalk talk

EV Charging: Understanding the Basics
Sponsored by Mouser Electronics and Bel
Have you ever considered what the widespread adoption of electric vehicles will look like? What infrastructure requirements will need to be met? In this episode of Chalk Talk, I chat about all of this and more with Bruce Rose from Bel. We review the basics of EV charging, investigate the charging requirements for both AC and DC chargers, and examine the role that on-board inverters play in electric vehicle charging.
Mar 27, 2023