AMD Gets All Tokyo with Fiji

The semiconductor business is a lot like selling real estate. It’s not the dirt you’re paying for; it’s the location. An acre in the middle of Manhattan will cost you a lot more than an acre in the desert (provided it’s not in the middle of a Saudi oil field). Likewise, a square millimeter of 28-nanometer silicon can cost a lot or a little, depending on who made it and what they did with it.

To stretch the analogy a bit further, the cost of the real estate also depends on what “improvements” you’ve made to the property. An empty field isn’t worth as much as a developed lot with a four-story apartment building on it (again, assuming your field isn’t atop a gold mine).

Finally, real estate and semiconductors have both discovered the advantages of building upwards. There’s only so much real estate in the world – they aren’t making any more – so you have to build vertically to maximize your property values. Thus, we get skyscrapers in high-value areas like Manhattan, Tokyo, Singapore, or London. There’s more square footage vertically than there is horizontally.

A few chip companies have dabbled in vertical construction, stacking a few silicon die here and there. But rarely has it been done as aggressively as with the GPU inside AMD’s new Radeon R9 Fury X graphics card. Code-named Fiji (an island nation with very few skyscrapers), this new chip is actually 22 different chips all packed into one package. It’s the silicon equivalent of a GPU skyscraper. Or, at least, a decent apartment building.

Trouble is, building vertically is tough in either industry. It’s a whole different kind of engineering. The (ahem) architecture changes in both cases. Structural engineers have to figure out how to make the ground floor sturdy enough to support all the upper floors. And chip designers have to figure out how to connect chips that aren’t side by side, but are, instead, stacked one atop the other.

The reasons for stacking are entirely different, however. In semiconductor land, we aren’t trying to maximize the finite X and Y dimensions. Unlike real estate, they really are making more. There’s a virtually unlimited supply of silicon property to build on. Rather, the impetus is to improve performance. It’s quicker to move high-speed signals up and down than it is to move them across the die from side to side. Well, if you do things right, it is.

The 22 die encapsulated in AMD’s Radeon R9 measure more than 1000 mm² in total. That would be a big freakin’ device if it were one single-layer chip. Even Intel’s massive multicore x86 processors rarely stray beyond the 400–500 mm² range. You could see this thing from space, were it not collapsed and folded in on itself. Fully 16 of the 22 die are memory chips, and that’s where AMD and its partners added most of the innovation.

Graphics chips are memory hogs, as any gamer will tell you. Take a look at any recent, decent PC graphics card and you’ll see (or would see, under all the heat sinks) a lot of DRAMs surrounding the GPU. Fast, wide memory buses are a primary concern for GPU designers and a point of differentiation for their purchasers.

Rather than work with off-chip memory like normal GPUs do, AMD designed the R9 to incorporate its own DRAM on-chip. Or on-package, at least. The cluster of DRAMs surrounding the GPU has now vanished, subsumed into the GPU package itself. That makes for a much smaller graphics card for your PC – in theory. More on that later.

The 16 little DRAMs on the R9 are arranged in four stacks of four devices each. Each stack sits atop its own logic base die, which makes it five stories high, and the four stacks sit beside the GPU on a silicon interposer that underlies them all like the foundation under a building. All of the DRAMs are identical, and each one incorporates through-silicon vias to its upper and lower neighbors. Assuming the chips are lined up exactly right (no small feat), all the vias connect up to form a vertical wiring bus.

There are two independent 128-bit buses per die, each carrying both reads and writes. These are not shared amongst the die in the stack; each DRAM gets its own pair of buses. With four die in the stack, that makes for eight 128-bit buses, or 1024 bits of data travelling vertically. And with four such stacks on the R9, that’s a 4096-bit data path to/from all the memory. Impressive. More importantly, it’s not something you could do with conventional off-chip memory buses. There just aren’t enough pins, and toggling that many signals at high speed would probably make the board bounce.
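The width arithmetic in the paragraph above can be checked with a few lines of Python. This is a minimal sketch: the constant and function names are hypothetical, and the only inputs are the figures quoted in the text (two 128-bit buses per DRAM die, four die per stack, four stacks).

```python
# Hypothetical constants, taken from the figures quoted in the text.
BUS_WIDTH_BITS = 128   # width of each per-die bus
BUSES_PER_DIE = 2      # two 128-bit buses per DRAM die
DIES_PER_STACK = 4     # four DRAMs per stack
STACKS = 4             # four stacks in the package


def stack_width_bits(dies: int = DIES_PER_STACK,
                     buses_per_die: int = BUSES_PER_DIE,
                     bus_width: int = BUS_WIDTH_BITS) -> int:
    """Width of one stack's vertical bus; buses are private to each die, so widths add."""
    return dies * buses_per_die * bus_width


def package_width_bits(stacks: int = STACKS) -> int:
    """Total data path between the GPU and all of its memory stacks."""
    return stacks * stack_width_bits()


if __name__ == "__main__":
    print(stack_width_bits())    # 1024
    print(package_width_bits())  # 4096
```

The point the model captures is multiplication rather than sharing: because each die owns its buses, the widths simply add up through the stack and across the package.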

Interestingly, the massive bus between the GPU and the DRAMs doesn’t run all that fast. AMD specs it at 500 MHz, which is pretty sluggish compared to the 1- and 2-GHz clocks used with GDDR5. But AMD’s bus is so ridiculously wide that its overall bandwidth is far greater, which is the real point.
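To see why a slow-but-wide bus wins, here is a rough peak-bandwidth calculation in Python. It is a sketch under two labeled assumptions that the paragraph above does not state: that the 500 MHz memory interface moves data on both clock edges (1 Gbit/s per pin), and, for comparison, a hypothetical GDDR5 card with a 512-bit bus running at 5 Gbit/s per pin. Peak figures ignore refresh, protocol overhead, and other real-world losses.

```python
def peak_bandwidth_gbytes(bus_width_bits: int, gbits_per_pin: float) -> float:
    """Peak bandwidth in GB/s: bus width times per-pin rate, divided by 8 bits per byte."""
    return bus_width_bits * gbits_per_pin / 8


# Assumption: the 500 MHz interface is double data rate, so each pin
# carries two bits per clock cycle.
HBM_CLOCK_GHZ = 0.5
HBM_GBITS_PER_PIN = HBM_CLOCK_GHZ * 2  # 1.0 Gbit/s per pin
hbm_gbytes = peak_bandwidth_gbytes(4096, HBM_GBITS_PER_PIN)

# Assumption: hypothetical GDDR5 board, 512-bit bus at 5 Gbit/s per pin.
gddr5_gbytes = peak_bandwidth_gbytes(512, 5.0)

if __name__ == "__main__":
    print(f"Wide, slow stacked memory: {hbm_gbytes:.0f} GB/s")            # 512 GB/s
    print(f"Narrow, fast GDDR5 (hypothetical): {gddr5_gbytes:.0f} GB/s")  # 320 GB/s
    print(f"Ratio: {hbm_gbytes / gddr5_gbytes:.2f}x")                     # 1.60x
```

Under these assumptions each stacked-memory pin runs at one-fifth the per-pin rate of the GDDR5 example, yet having eight times as many data pins more than makes up the difference.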

The downside to packing so much silicon in one package is, yes, the heat. Although boards based on the R9 Fury can be fairly small because they don’t have to make room for a bunch of DRAMs, they do have to make room for liquid-cooling apparatus. So you’re basically just trading the board space of the DRAMs for the bulk of the cooling hardware. On the plus side, you can locate the cooling hardware off-board if you want to, perhaps mounted to the PC chassis or in an adjoining bay. But, either way, you’re going to have to plumb the R9 Fury and engineer some decent airflow around it. There ain’t no free lunch, especially if you’re gunning for top performance.

AMD doesn’t make DRAMs, so the memories in question come from SK Hynix, which cooperated with AMD in defining the interface and which assembles the devices at its plant in Korea. The interface itself, called High Bandwidth Memory (HBM), is an open JEDEC standard, so anyone could make DRAMs and/or logic devices that use the same technique. Fiji is just the first.

The AMD/Hynix interface is similar in concept to the competing Hybrid Memory Cube (HMC) specification, but wholly incompatible with it. HMC has the backing of giants like Xilinx, Altera, ARM, and Micron, whereas AMD and Hynix seem to be on their own, at least for now. HMC has been around longer (at least in specification form), but actual devices that implement it are scarce on the ground. So in terms of deployment, they’re about the same.

It’s tough to build upwards, but that’s the way of the future. Memories, analog components, magnetics, and assorted other interfaces just work better on “nonstandard” semiconductor processes that don’t play well with all-digital circuits. You can compromise your devices, or you can manufacture them separately and combine them at the assembly stage. Shortening the interconnection doesn’t hurt, either. Once you go up, there’s no going back down.
