
A Peek at the Next Standard?

Rambus presents concepts for beyond DDR3 at Memcon

If you ever want to strike up an argument at a party – OK, a very nerdy party and, admittedly, a very dull party if this were actually to add life to it – just start talking about standards. (Yeah, I know, an even better idea would be to leave and go to a fun party, but work with me here…) You see, you can generally divide the world into three camps: those that sit on standards committees, those that used to sit on standards committees and ultimately got frustrated and ran screaming out of the room, and the other 95% of the engineering population, who couldn’t care less about standards.

So, setting aside the latter apathetic majority, the other two camps will go tooth and nail over whether standards are useful or giant exercises in mental mas- … er… thumb-twiddling. And it could probably be demonstrated (by someone other than me) that there are more standards that have never taken hold than there are that have.

But there is one area that has been unceasingly active with standards that have been followed: memories. Everyone needs memories, and there are too many different kinds of players with vested interests. Some big memory guy can’t just say, “here’s what my memory interface looks like and I’m sticking to it.” OK, they can, but they have to have such an advantage that other companies will play along, and even then they may eventually try to find ways to sneak their patented technology into a standard.

But other than that, you’ve got processor guys and memory controller guys and module guys and FPGA guys and interface guys and all kinds of other guys in whatever guise that need to be able to talk to memories. And everyone loses if someone tries to game the system or if the standard is set too late.

And that’s particularly cool: the standards have to be set before the memories are created, so you get a very early view as to what’s coming. Which can make for interesting reading because memories are such different beasts from your run-of-the-mill logic chips. Last year we took a look here at the fly-by timing gymnastics required for DDR3 memories. So naturally it’s going to be of interest when there’s a session at Denali’s Memcon show about what’s beyond DDR3.

Rambus was the featured speaker on the topic, and they talked about a number of topics as being enablers for the next generation of DRAM. Which we’ll get to in a sec. But, full disclosure here, I end up trying to look half intelligent on enough wide-ranging topics that it’s entirely likely that after a presentation like that I’m going to have to go back and research a few things that a specialist might understand with no further explanation. The preponderance of ™ symbols plastered all over the slides was another reason for wanting to look a bit deeper.

And so I went to look up some of the topics and found that some of them were released in products as long ago as 2005. Which confounded me mightily. Why is four-year-old technology forming the basis of leading-edge memories? I expected to see interesting new things that haven’t been done before, perhaps early experimental data. I expected that we might see the result of the combined thinking of some of the better minds of the industry as they started the laborious process of forming the next standard. Yes, Rambus might be the presenter, but hey, only one person can present, and that person will be from some company that could be Rambus or someone else, so the material can still represent lots of joint collaborative thinking.

But, stepping back after looking through the presentation, it would appear that this might not have been the case. In fact, a cynic might even suggest that Rambus was using this as an opportunity to lobby for their proprietary (and presumably patented) technology in the next standard, even if that technology is years old. Not that I’m saying I’m a cynic… just saying…

In fairness, it could be that, even if years old, these things could, if added on to other ideas in play, make for faster, bigger – and, even more importantly, lower power – memories. So appearances of marketing assertiveness shouldn’t necessarily negate possible technical value. So let’s look at the gist of what was presented. And I’m going to try to avoid overpopulating the page with trademarks.

There were a few “small” items, like near-ground signaling, that carry on trends that have been in play for a while; we won’t focus on those here. The two biggest departures had to do with the memory access protocol and what they call “threading.”

This memory is all skewed up

Current memories have a complicated timing mechanism that requires a strobe signal to be delivered at a precise time for sampling the output of the memory. Historically this has required very careful layout with matched traces, which has become unmanageable, and which DDR3 somewhat mitigates. But Rambus proposes taking this one step further to eliminate layout dependence.

The approach they take is almost obvious, so presumably it took some clever circuitry to realize it. The problem they are solving is the skew between the various pins on the memory. You need to be able to interpret signals on a group of pins with different trace lengths – and hence different signal arrival times – as all having arrived at the “same time.” In fact, it’s harder than that: if you are clocking the circuit, the clock has to sample the signals all at the same time. Traditionally this is done with 8B/10B encoding of signals, which embeds a clock that can be recovered for each signal. This takes a lot of circuitry and power and wastes bandwidth due to the 20% coding overhead needed to guarantee the transition density necessary for keeping a PLL spinning on the receive side.
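That 20% figure follows directly from the coding: 8B/10B sends ten line bits for every eight payload bits. A quick back-of-the-envelope sketch (the 5 Gb/s line rate here is just an illustrative number, not from the presentation):

```python
# Hypothetical link rate for illustration only.
raw_rate_gbps = 5.0

# 8B/10B transmits 10 line bits per 8 payload bits, so only 80% of the
# raw rate carries actual data; the rest guarantees transition density.
payload_rate_8b10b = raw_rate_gbps * 8 / 10

# An unencoded link with per-pin deskew spends no bits on an embedded clock.
payload_rate_deskewed = raw_rate_gbps

overhead = 1 - payload_rate_8b10b / payload_rate_deskewed
print(f"8b/10b payload rate: {payload_rate_8b10b} Gb/s")  # 4.0 Gb/s
print(f"bandwidth lost to coding: {overhead:.0%}")        # 20%
```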

Instead, Rambus appears to have a deskew procedure that seems quite similar to the deskew procedures that any test or measurement device might have. The memory controller stores a skew value that is presumably measured at power-up. Each pin can have a different skew. So now, instead of having to have clocks embedded in all the signals, the memory controller adjusts its sampling time to account for the skew of each signal individually.

This deskew procedure can even be refreshed during operation in the event that temperature or other environmental variables have changed the timing. You can then keep the sampling aligned properly as the signals drift.
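Rambus didn’t publish the mechanics, but a generic per-pin deskew calibration – the kind any test instrument might do – works roughly like this sketch, where the controller sweeps its sample-delay setting per pin against a known training pattern and stores the center of the passing window. All names and windows here are illustrative assumptions, not Rambus’s actual procedure:

```python
# Generic per-pin deskew sketch (illustrative, not Rambus's actual method).
# The controller sweeps its sample-delay setting for each pin against a
# training pattern and records the center of the delay window that reads back
# correctly. One skew value is stored per pin, as described above.

def calibrate_pin(sample_ok, delay_steps):
    """sample_ok(delay) -> True if the training pattern sampled correctly
    at that delay setting. Returns the center of the passing window."""
    passing = [d for d in range(delay_steps) if sample_ok(d)]
    if not passing:
        raise RuntimeError("no valid sampling window found")
    return (passing[0] + passing[-1]) // 2

def calibrate_bus(pins, delay_steps=64):
    # An independent skew value for each pin.
    return {pin: calibrate_pin(ok, delay_steps) for pin, ok in pins.items()}

# Toy model: each pin's signal is valid over a different delay window
# because its trace length differs.
pins = {
    "DQ0": lambda d: 10 <= d <= 30,
    "DQ1": lambda d: 18 <= d <= 38,  # longer trace -> later window
}
skew_table = calibrate_bus(pins)
print(skew_table)  # {'DQ0': 20, 'DQ1': 28}
```

Rerunning the same routine periodically is what lets the sampling track temperature-driven drift during operation.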

By using this approach, big chunks of circuitry can be removed from the memory, saving area and power (although some circuitry moves from the memory to the controller). You can tighten the timing for faster access on top of higher bandwidth. And frankly, the protocol and timing are rather dramatically simplified.

Threading your way through

The other angle presented related to the way in which transactions are gathered from the memories. Traditionally, a memory module is treated as a single memory for use by the processor. With a given thread of execution, there is often a locality of storage, allowing the memory controller to “stack” requests for bigger transactions, minimizing the amount of handshake required to get things going.

But when you have multiple threads executing, each of them may be pulling from a different area, so you end up with lots of small transactions as the access requests from different threads are interleaved.

Instead of treating the module as one wide memory, you can treat it as two (or some other multiple) deeper memories and assign threads to one side or the other. Now, on a given half of the memory, you have half the number of threads interleaving their requests, making it more likely that the controller can build longer requests. Of course, because you’re going deeper rather than wider, the latency goes up, but overall bandwidth can increase.
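A toy model makes the coalescing argument concrete. Here four threads’ requests arrive interleaved at the controller; consecutive requests from the same thread can merge into one longer burst. The arrival pattern is contrived for illustration, but it shows why fewer interleaving threads per partition means fewer, longer transactions:

```python
# Toy model of request coalescing (illustrative only).
import itertools

def bursts(stream):
    # Consecutive requests from the same thread merge into one burst.
    return sum(1 for _ in itertools.groupby(stream))

# Requests from four threads A-D, interleaved as they reach the controller.
arrivals = ["A", "C", "A", "C", "B", "D", "B", "D"]

# One wide memory: every request alternates threads, so nothing merges.
wide = bursts(arrivals)                                   # 8 bursts

# Two deeper halves: A and B go left, C and D go right. With only two
# threads per half, same-thread requests now land back to back and merge.
left  = bursts([r for r in arrivals if r in ("A", "B")])  # 2 bursts
right = bursts([r for r in arrivals if r in ("C", "D")])  # 2 bursts
print(wide, left + right)  # 8 vs 4: fewer handshakes, longer transfers
```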

You can even take this one step further by organizing the actual memory in a way that two halves can be accessed separately and by then assigning threads to one side or the other – they call this “microthreading.” The concept is more or less the same, but, of course, the memory has to be built for this; it isn’t just a module/controller thing.

So these don’t sound like bad things. Of course, you and I aren’t privy to whatever conversations (or shouting matches) might have taken place in the hallowed standards-setting halls (or whispered or shouted in the halls outside those halls). We don’t as yet know whether this presentation represents a consensus that is forming or whether it represents an alternative to the consensus that is forming.

That is, of course, assuming there actually is a consensus forming. Were the insiders lurking in the shadows of the audience beaming in pride or bristling with indignation? Will the upcoming meetings be characterized by back-slaps for a presentation well done or chest-pokes for attempting an end-around? Short of actually attending the standards meetings (where the press are generally not welcome and pillows not provided), we’ll have to remain attuned to the results of the process and see whether the final result ends up looking anything like what was presented.

[Full disclosure: the author has, many years ago, attended, and even chaired, standards committees without running screaming out of the room and has thereby earned the right to mock the standards system while acknowledging wholeheartedly the value they can provide.]
