
A Better Flytrap

DDR3 Controllers Hit the Market for SoCs and FPGAs

That DDR memories work at all seems like a miracle. I mean, it’s like someone woke up one morning and said, “Hmmm…. You know, high-speed serial interconnect has complicated timing when you try to align a bunch of lanes… there HAS to be a way to take those concepts and make it even trickier to design.”

Here you’re taking a bank of memories and sending them data and address and clock and command and DQS signals, and all in “eye-diagram” territory. Skews and signal integrity are critical; this is not for the faint of heart. And, in fact, DDR architects have finally said “Uncle” for some of it.

Let’s start by reviewing the basic topology of a DDR memory. You’ve got a bunch of memory chips, each with data, address, control, and DQS. The DQS signal is the least familiar for those who’ve been fortunate enough not to have to think about how memories are accessed. Unlike older memories where the clock is used to time reads and writes, DQS is what really clocks data in during a write, or indicates data valid during a read; the clock signal is used as a reference timer for generating DQS. Each memory has its own DQS, which allows the timing of each memory to be tuned for its particular signals. Data and DQS lines are connected point-to-point; address and control lines are bussed to all the memories. Because the timing is so tight, the relative skews for all of these signals are critical.
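To see why a per-memory strobe helps, here's a toy model (all numbers invented) comparing DQS-style source-synchronous capture against a single global clock. Because each lane's data and DQS travel together, the strobe arrives with its data no matter how that lane's flight time differs from its neighbors':

```python
# Per-lane flight time in ns (illustrative values, not from any real board)
flight_time = {"lane0": 1.10, "lane1": 1.45, "lane2": 0.95}

# Data is valid for +/- 0.25 ns around the sampling edge (a made-up window)
VALID_HALF_WINDOW = 0.25

def capture_ok_dqs(lane):
    # DQS and data both arrive delayed by the same flight time, so the
    # sampling error at the memory is ~0 regardless of the lane.
    dqs_arrival = flight_time[lane]
    data_arrival = flight_time[lane]
    return abs(dqs_arrival - data_arrival) <= VALID_HALF_WINDOW

def capture_ok_global_clock(lane, clock_arrival=1.0):
    # With one shared clock, each lane's data is offset from the clock by
    # its own flight time -- slow lanes fall outside the valid window.
    return abs(flight_time[lane] - clock_arrival) <= VALID_HALF_WINDOW

for lane in flight_time:
    print(lane, capture_ok_dqs(lane), capture_ok_global_clock(lane))
```

With these numbers, every lane captures cleanly against its own DQS, but lane1 (0.45 ns off the clock) would miss a global-clock sample.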

The way the busses are created is through what is called T-branching – each time a bus branches, both sides of the branch have to be matched. And these busses branch more than once, so you end up with a complicated design job that more or less ends up looking something like clock tree synthesis on an IC. Each signal has a single source, and it has to arrive at multiple destinations at the same time. And they have to be impedance-matched, and each angle or via on the PCB acts as a discontinuity and, well, you can imagine it might get messy.
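The matching constraint above can be sketched in a few lines (all figures invented; the per-mm delay is a rough stripline number, treat it as an assumption). The rule is that the summed electrical length from the driver to every memory must be equal, even when the individual branch segments differ:

```python
PROP_DELAY_PS_PER_MM = 6.7  # assumed stripline propagation delay

# Trace segment lengths in mm, summed along the routed path to each memory.
# A balanced T-branched tree gives every memory the same total.
path_length_mm = {
    "mem0": 20 + 15 + 10,   # trunk + branch + leaf
    "mem1": 20 + 15 + 10,
    "mem2": 20 + 12 + 13,   # different segments, same total
    "mem3": 20 + 12 + 13,
}

arrival_ps = {m: l * PROP_DELAY_PS_PER_MM for m, l in path_length_mm.items()}
skew_ps = max(arrival_ps.values()) - min(arrival_ps.values())
print(f"worst-case skew: {skew_ps:.1f} ps")  # 0.0 for a balanced tree
```

The hard part on a real board is that "length" is really electrical length: vias, bends, and impedance discontinuities all perturb it, which is what makes the balancing job resemble clock tree synthesis.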

Actually, in theory, it’s a tad more complicated than that, since the bussed signals also have to align, at each memory, with that memory’s data and DQS. If each of the data/DQS sets can be aligned using a technology like dynamic phase alignment, then all the data and DQS lines can be balanced and “equalized,” and the bussed signals can then be designed to arrive simultaneously at each chip.

If each of the data/DQS sets had a different arrival time, the job would get even nastier. Then again, the data isn’t a single signal – it’s a set of signals or lanes, and they have to be aligned with each other so that a single data grab on a given memory pulls the right data from each of the lanes, so, presumably, if you can align them with each other, then you can align them with the data/DQS signals from other memories as well. Having said that, aligning eight data signals and a DQS may be a big enough pain in the butt without having to align a bunch of sets of those.

Bottom line, it’s a tough timing job to get this all to work. Oh — and then it needs to continue working as the weather and temperature and phases of the moon change as well.

So… DDR2 doubled clock rates from DDR; hey, that was so much fun, let’s do it again! And let’s call it… hmmmmm… how… about… Oo! Oo! I’ve got it! How about DDR3? And now… we’ll pump data as fast as 1.6 Gb/s. Nice.

And this is where we abandon ship on the old bussing techniques. The kinds of timing tolerances are so crazy now that trying to use T-branching to balance everything out has been ruled out as being unworkable. And a new approach has been applied, taking a page from Fully-Buffered DIMMs. It’s called the “fly-by” technique. And it sounds simpler, and I’ll take designers’ word for it that it is simpler, but man, it still sounds like a tough thing to get right.

Instead of bussing the control signals to each memory, there’s one set of signals that simply starts at one side of the DIMM and crosses all the memories. Obviously, signals will arrive at the first memory before they get to the last memory. The idea here is to position signals on the control bus so that they arrive at their intended memories at the correct time. So the signal intended for, say, memory chip 3, will visit all the memories before and after hitting chip 3, but will arrive at chip 3 with timing such that it is recognized by chip 3 and not by the other chips.
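The arithmetic behind this is simple to sketch, even if the physics isn't (numbers below are invented; the chip-to-chip flight time is an assumption, not a spec value). Chip n sees everything on the fly-by bus n stub-delays later than chip 0, so the controller has to launch each chip's write strobe later by the same amount:

```python
CHIPS = 8
STUB_DELAY_PS = 70  # assumed chip-to-chip flight time along the DIMM

# A command launched at t = 0 reaches chip n at:
command_arrival = [n * STUB_DELAY_PS for n in range(CHIPS)]

# To make write data/DQS meet the command at chip n, the controller
# launches that chip's strobe later by the same amount -- these are the
# per-chip offsets that write leveling discovers:
write_delay = list(command_arrival)

for n, (cmd, dqs) in enumerate(zip(command_arrival, write_delay)):
    print(f"chip {n}: command at {cmd} ps, strobe launched {dqs} ps late")
```

In this idealized version the offsets fall out of a multiplication; in real life the controller has to measure them per chip, which is exactly what the leveling procedures below are for.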

To be clear, we’re not talking about a pipeline here. We’re only talking wires, and signals are flying by, and we’re capturing them as they pass much as a frog catches a fly. The memory controller has to issue both the fly and the frog’s tongue and ensure that they meet. And it’s worse than that, it’s like the fly is being launched from California, the frog is in Japan, and the frog has a really long tongue that’s going to snag the fly 30,000 feet over oblivious beachgoers on Waikiki. No, it’s worse than that… you’ve got a bunch of flies and a bunch of frogs, and you’re going to send a stream of flies out and position them all so that the frogs all zap the right flies, taking into account variations in wind and air density and aerodynamics as the flies freak out. It just sounds unlikely, but it’s the new game. Scares the crap outa me. I guess it’s just that the old system was even scarier. Imagine that…

Part of the new game involves “read leveling” and “write leveling.” There’s a training sequence that executes before the controller goes into full operation to figure out the alignment of signals with respect to the various memories and their data/DQS lines. This basically teaches the controller how to set the timing of the signals it places on the control bus so that they reach their targets at the right time. Once set, the controller then has to compensate for variations in temperature and voltage to ensure that the memory continues working as conditions change.
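The training idea can be sketched as a delay-tap sweep (everything here is a stand-in: the tap count, the passing window, and the pass/fail function are hypothetical, and real training uses feedback mechanisms defined by the DDR3 spec, such as the memory echoing its sampled clock back on DQ during write leveling). The controller tries each tap setting, records which ones capture cleanly, and parks in the middle of the passing window:

```python
TAPS = 32
EYE_START, EYE_END = 9, 21  # hypothetical passing window for one lane

def capture_passes(tap):
    # Stand-in for "did the memory capture correctly at this tap setting?"
    return EYE_START <= tap <= EYE_END

passing = [t for t in range(TAPS) if capture_passes(t)]
best_tap = (passing[0] + passing[-1]) // 2  # center of the eye
print("chosen tap:", best_tap)
```

Centering in the eye is what buys the margin needed to ride out the voltage and temperature drift mentioned above; periodic recalibration then keeps the tap from wandering out of the window.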

Both Denali and Virage have announced DDR3 controller IP for SoCs. The solutions are modular, and the higher-level controller issues tend to be separated from the thorny timing issues associated with the physical (PHY) layer and the I/Os. Altera and Xilinx have also put their latest high-end devices to use as DDR3 controllers.

As for me, I think I’ll test out this new technology by heading out to Waikiki to gaze up into the night sky and see if I can spot any flies getting zapped. I could use the vacation.

Links for more information on any companies or products mentioned:
Altera DDR3
Denali DDR3
Virage DDR3
Xilinx DDR3

