
A Peek at the Next Standard?

Rambus presents concepts for beyond DDR3 at Memcon

If you ever want to strike up an argument at a party – ok, a very nerdy party and, admittedly, a very dull party if this were actually to add life to it – just start talking about standards. (Yeah, I know, an even better idea would be to leave and go to a fun party, but work with me here…) You see, you can generally divide the world into three camps: those who sit on standards committees, those who used to sit on standards committees and ultimately got frustrated and ran screaming out of the room, and the other 95% of the engineering population, who couldn’t care less about standards.

So, setting aside that apathetic majority, the other two camps will go tooth and nail over whether standards are useful or giant exercises in mental mas- … er… thumb-twiddling. And it could probably be demonstrated (by someone other than me) that there are more standards that have never taken hold than there are that have.

But there is one area that has been unceasingly active with standards that have been followed: memories. Everyone needs memories, and there are too many different kinds of players with vested interests. Some big memory guy can’t just say, “here’s what my memory interface looks like and I’m sticking to it.” OK, they can, but they have to have such an advantage that other companies will play along, and even then they may eventually try to find ways to sneak their patented technology into a standard.

But other than that, you’ve got processor guys and memory controller guys and module guys and FPGA guys and interface guys and all kinds of other guys in whatever guise that need to be able to talk to memories. And everyone loses if someone tries to game the system or if the standard is set too late.

And that’s particularly cool: the standards have to be set before the memories are created, so you get a very early view as to what’s coming. Which can make for interesting reading because memories are such different beasts from your run-of-the-mill logic chips. Last year we took a look here at the fly-by timing gymnastics required for DDR3 memories. So naturally it’s going to be of interest when there’s a session at Denali’s Memcon show about what’s beyond DDR3.

Rambus was the featured speaker on the topic, and they presented a number of technologies as enablers for the next generation of DRAM. Which we’ll get to in a sec. But, full disclosure here, I end up trying to look half intelligent on enough wide-ranging topics that it’s entirely likely that after a presentation like that I’m going to have to go back and research a few things that a specialist might understand with no further explanation. The preponderance of ™ symbols plastered all over the slides was another reason for wanting to look a bit deeper.

And so I went to look up some of the topics and found that some of them were released in products as long ago as 2005. Which confounded me mightily. Why is four-year-old technology forming the basis of leading-edge memories? I expected to see interesting new things that haven’t been done before, perhaps early experimental data. I expected that we might see the result of the combined thinking of some of the better minds of the industry as they started the laborious process of forming the next standard. Yes, Rambus might be the presenter, but hey, only one person can present, and that person has to come from some company, whether Rambus or someone else, so the material can still represent lots of collaborative thinking.

But, stepping back after looking through the presentation, it would appear that this might not have been the case. In fact, a cynic might even suggest that Rambus was using this as an opportunity to lobby for their proprietary (and presumably patented) technology in the next standard, even if that technology is years old. Not that I’m saying I’m a cynic… just saying…

In fairness, it could be that, even if years old, these things could, if added on to other ideas in play, make for faster, bigger – and, even more importantly, lower power – memories. So appearances of marketing assertiveness shouldn’t necessarily negate possible technical value. So let’s look at the gist of what was presented. And I’m going to try to avoid overpopulating the page with trademarks.

There were a few “small” items, like near-ground signaling, that carry on trends that have been in play for a while; we won’t focus on those here. The two biggest items had to do with the memory access protocol and what they call “threading.”

This memory is all skewed up

Current memories have a complicated timing mechanism that requires a strobe signal to be delivered at a precise time for sampling the output of the memory. Historically this has required very careful layout with matched traces, which has become unmanageable, and which DDR3 somewhat mitigates. But Rambus proposes taking this one step further to eliminate layout dependence.

The approach they take is almost obvious, so presumably it took some clever circuitry to realize it. The problem they are solving is the skew between the various pins on the memory. You need to be able to interpret signals on a group of pins with different trace lengths – and hence different signal arrival times – as all having arrived at the “same time.” In fact, it’s harder than that: if you are clocking the circuit, the clock has to sample the signals all at the same time. Traditionally this is done with 8B/10B encoding of signals, which embeds a clock that can be recovered for each signal. This takes a lot of circuitry and power and wastes bandwidth due to the 20% coding overhead needed to guarantee the transition density necessary for keeping a PLL spinning on the receive side.
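To put a number on that coding overhead, here is a back-of-the-envelope sketch. The 5 Gb/s line rate is an arbitrary figure picked for illustration, not one from the presentation, and `payload_rate` is a hypothetical helper name.

```python
def payload_rate(line_rate, data_bits=8, coded_bits=10):
    """Useful data rate left over once the line code has taken its share."""
    return line_rate * data_bits / coded_bits

line_rate_gbps = 5.0  # assumed raw per-pin rate, for illustration only
useful_gbps = payload_rate(line_rate_gbps)

# Fraction of the raw line rate that carries coding bits rather than data.
overhead = 1 - useful_gbps / line_rate_gbps

print(f"{useful_gbps:.1f} Gb/s of payload, {overhead:.0%} of the line rate spent on coding")
```

In other words, every ten bits on the wire carry eight bits of data, so one fifth of the raw pin bandwidth goes to keeping the receiver's clock recovery happy.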

Instead, Rambus appears to have a deskew procedure that seems quite similar to the deskew procedures that any test or measurement device might have. The memory controller stores a skew value that is presumably measured at power-up. Each pin can have a different skew. So now, instead of having to have clocks embedded in all the signals, the memory controller adjusts its sampling time to account for the skew of each signal individually.

This deskew procedure can even be refreshed during operation in the event that temperature or other environmental variables have changed the timing. You can then keep the sampling aligned properly as the signals drift.
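The presentation didn't spell out the circuitry, so the following is only a toy model of how per-pin deskew calibration might work, not Rambus's actual method. It assumes the controller can sweep its sampling point while reading a known pattern and then park each pin's sampling point in the middle of the window where reads succeed. `UI`, `GUARD`, the skew values, and all function names are made up for illustration.

```python
UI = 1.0     # one bit time (unit interval), arbitrary units
GUARD = 0.2  # assumed unusable region at each edge of the data eye

def reads_correctly(pin_skew, sample_offset):
    """Toy model: a sample is good only inside the pin's (skewed) data eye."""
    return pin_skew + GUARD <= sample_offset <= pin_skew + UI - GUARD

def calibrate_pin(pin_skew, steps=200, span=2.0 * UI):
    """Sweep the sampling offset and return the center of the passing window."""
    passing = [i * span / steps for i in range(steps + 1)
               if reads_correctly(pin_skew, i * span / steps)]
    if not passing:
        raise RuntimeError("no passing window found for this pin")
    return (passing[0] + passing[-1]) / 2.0

def calibrate_bus(pin_skews):
    """Each pin gets its own stored sampling offset."""
    return [calibrate_pin(skew) for skew in pin_skews]

# Power-up calibration: four pins with different (hypothetical) trace skews.
skews = [0.05, 0.30, 0.42, 0.11]
offsets = calibrate_bus(skews)

# Later, temperature shifts every pin; rerunning the sweep re-centers sampling.
drifted = [skew + 0.15 for skew in skews]
new_offsets = calibrate_bus(drifted)

for pin, (old, new) in enumerate(zip(offsets, new_offsets)):
    print(f"pin {pin}: offset {old:.2f} -> {new:.2f} after drift")
```

The point of the sketch is that the board layout no longer needs to equalize the skews; the controller simply remembers one number per pin and updates it when conditions change.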

By using this approach, big chunks of circuitry can be removed from the memory, saving area and power (although some circuitry moves from the memory to the controller). You can tighten the timing for faster access on top of higher bandwidth. And frankly, the protocol and timing are rather dramatically simplified.

Threading your way through

The other angle presented related to the way in which transactions are gathered from the memories. Traditionally, a memory module is treated as a single memory for use by the processor. With a given thread of execution, there is often a locality of storage, allowing the memory controller to “stack” requests for bigger transactions, minimizing the amount of handshake required to get things going.

But when you have multiple threads executing, each of them may be pulling from a different area, so you end up with lots of small transactions as the access requests from different threads are interleaved.

Instead of treating the module as one wide memory, you can treat it as two (or some other multiple) deeper memories and assign threads to one side or the other. Now, on a given half of the memory, you have half the number of threads interweaving their requests, making it more likely that the controller can build longer requests. Of course, because you’re going deeper rather than wider, the latency goes up, but overall bandwidth can increase.
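A toy model makes the effect easier to see. Everything in it is an assumption for illustration: four threads issue requests round-robin, each walking sequentially through its own region; the controller looks at a fixed window of pending requests and merges consecutive addresses into one transaction; and each half of a split module gets a window of the same depth. The function names and address layout are hypothetical, and the model ignores the latency cost mentioned above.

```python
def interleave(thread_ids, requests_per_thread):
    """Round-robin stream of (thread, address); each thread walks its own region."""
    return [(t, t * 1000 + i)
            for i in range(requests_per_thread)
            for t in thread_ids]

def count_transactions(stream, window=4):
    """Toy controller: within each window of pending requests,
    merge runs of consecutive addresses into a single transaction."""
    transactions = 0
    for start in range(0, len(stream), window):
        addrs = sorted(addr for _, addr in stream[start:start + window])
        # A new transaction begins wherever an address does not follow its predecessor.
        transactions += sum(1 for i, a in enumerate(addrs)
                            if i == 0 or a != addrs[i - 1] + 1)
    return transactions

REQUESTS = 8  # requests issued by each thread

# One wide memory: all four threads share a single request stream.
single = count_transactions(interleave([0, 1, 2, 3], REQUESTS))

# Two deeper memories: threads 0 and 2 on one half, threads 1 and 3 on the other.
split = (count_transactions(interleave([0, 2], REQUESTS))
         + count_transactions(interleave([1, 3], REQUESTS)))

total = 4 * REQUESTS
print(f"single: {single} transactions, average length {total / single:.1f}")
print(f"split:  {split} transactions, average length {total / split:.1f}")
```

With fewer threads competing inside each window, more of the pending requests belong to the same thread and sit next to each other, so the same work gets done in fewer, longer transactions.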

You can even take this one step further by organizing the actual memory in a way that two halves can be accessed separately and by then assigning threads to one side or the other – they call this “microthreading.” The concept is more or less the same, but, of course, the memory has to be built for this; it isn’t just a module/controller thing.

So these don’t sound like bad things. Of course, you and I aren’t privy to whatever conversations (or shouting matches) might have taken place in the hallowed standards-setting halls (or whispered or shouted in the halls outside those halls). We don’t as yet know whether this presentation represents a consensus that is forming or whether it represents an alternative to the consensus that is forming.

That is, of course, assuming there actually is a consensus forming. Were the insiders lurking in the shadows of the audience beaming in pride or bristling with indignation? Will the upcoming meetings be characterized by back-slaps for a presentation well done or chest-pokes for attempting an end-around? Short of actually attending the standards meetings (where the press are generally not welcome and pillows not provided), we’ll have to remain attuned to the results of the process and see whether the final result ends up looking anything like what was presented.

[Full disclosure: the author has, many years ago, attended, and even chaired, standards committees without running screaming out of the room and has thereby earned the right to mock the standards system while acknowledging wholeheartedly the value that standards can provide.]
