
AMD Gets All Tokyo with Fiji

Twenty-two die in a single package?

The semiconductor business is a lot like selling real estate. It’s not the dirt you’re paying for; it’s the location. An acre in the middle of Manhattan will cost you a lot more than an acre in the desert (provided it’s not in the middle of a Saudi oil field). Likewise, a square millimeter of 28-nanometer silicon can cost a lot or a little, depending on who made it and what they did with it.

To stretch the analogy a bit further, the cost of the real estate also depends on what “improvements” you’ve made to the property. An empty field isn’t worth as much as a developed lot with a four-story apartment building on it (again, assuming your field isn’t atop a gold mine).

Finally, real estate and semiconductors have both discovered the advantages of building upwards. There’s only so much real estate in the world – they aren’t making any more – so you have to build vertically to maximize your property values. Thus, we get skyscrapers in high-value areas like Manhattan, Tokyo, Singapore, or London. There’s more square footage vertically than there is horizontally.

A few chip companies have dabbled in vertical construction, stacking a few silicon die here and there. But rarely has it been done so aggressively as with AMD’s new Radeon R9 Fury X graphics chip. Code-named Fiji (an island nation with very few skyscrapers), this new chip is actually 22 different chips all packed into one package. It’s the silicon equivalent of a GPU skyscraper. Or, at least, a decent apartment building.

Trouble is, building vertically is tough in either industry. It’s a whole different kind of engineering. The (ahem) architecture changes in both cases. Structural engineers have to figure out how to make the ground floor sturdy enough to support all the upper floors. And chip designers have to figure out how to connect chips that aren’t side by side, but are, instead, stacked one atop the other.

The reasons for stacking are entirely different, however. In semiconductor land, we aren’t trying to maximize the finite X and Y dimensions. Unlike real estate, they really are making more. There’s a virtually unlimited supply of silicon property to build on. Rather, the impetus is to improve performance. It’s quicker to move high-speed signals up and down than it is to move them across the die from side to side. Well, if you do things right, it is.

The 22 die encapsulated in AMD’s Radeon R9 measure more than 1000 mm² in total. That would be a big freakin’ device if it were one single-layer chip. Even Intel’s massive multicore x86 processors rarely stray beyond the 400–500 mm² range. You could see this thing from space, were it not collapsed and folded in on itself. Fully 16 of the 22 die are memory chips, and that’s where AMD and its partners added most of the innovation.

Graphics chips are memory hogs, as any gamer will tell you. Take a look at any recent, decent PC graphics card and you’ll see (or would see, under all the heat sinks) a lot of DRAMs surrounding the GPU. Fast, wide memory buses are a primary concern for GPU designers and a point of differentiation for their purchasers.

Rather than work with off-chip memory like normal GPUs do, AMD designed the R9 to incorporate its own DRAM on-chip. Or on-package, at least. The cluster of DRAMs surrounding the GPU has now vanished, subsumed into the GPU package itself. That makes for a much smaller graphics card for your PC – in theory. More on that later.

The 16 little DRAMs on the R9 are arranged in four stacks of four devices each. Each stack sits atop its own base logic die, making it five stories high, and the GPU and all four stacks rest side by side on an interposer that underlies them like the foundation under a building. (Count them up – 16 DRAMs, four base die, the GPU, and the interposer – and you get all 22.) All of the DRAMs are identical, and each one incorporates through-silicon vias connecting it to its upper and lower neighbors. Assuming the chips are lined up exactly right (no small feat), all the vias connect up to make a vertical wiring bus.

There are two 128-bit buses per die – one for reading and one for writing. These are not shared amongst the die in the stack; each DRAM gets its own pair of buses. With four die in the stack, that makes for eight 128-bit buses, or 1024 bits of data travelling vertically. And with four such stacks piled on the R9, that’s a 4096-bit data path to/from all the memory. Impressive. More importantly, it’s not something you could do with conventional off-chip memory buses. There just aren’t enough pins, and toggling that many signals at high speed would probably make the board bounce.
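If you want to check that arithmetic yourself, it fits in a few lines of Python (the variable names are mine, not AMD’s):

    buses_per_die = 2       # one 128-bit read bus, one 128-bit write bus
    bits_per_bus = 128
    dies_per_stack = 4      # four DRAMs per stack
    stacks = 4              # four stacks per package

    bits_per_stack = buses_per_die * bits_per_bus * dies_per_stack
    print(bits_per_stack)            # 1024
    print(bits_per_stack * stacks)   # 4096 bits between GPU and DRAM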

Interestingly, the massive bus between the GPU and the DRAMs doesn’t run all that fast. AMD specs it at 500 MHz, which is pretty sluggish compared to the 1- and 2-GHz clocks used with GDDR5. But AMD’s bus is so ridiculously wide that its overall bandwidth is far greater, which is the real point. 
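To put rough numbers on that tradeoff: the interface is double-data-rate, so a 500-MHz clock moves about 1 Gbit/s per pin. Here’s a back-of-the-envelope comparison in Python; the GDDR5 figures below describe a typical wide graphics card of the period, not any particular AMD product:

    def bandwidth_gb_per_s(bus_bits, gbits_per_pin):
        # aggregate bandwidth in GB/s: bus width times per-pin rate, bits to bytes
        return bus_bits * gbits_per_pin / 8

    print(bandwidth_gb_per_s(4096, 1.0))  # 512.0 GB/s for the R9's stacked DRAM
    print(bandwidth_gb_per_s(512, 5.0))   # 320.0 GB/s for a 512-bit GDDR5 board

Wide and slow beats narrow and fast, at least on paper.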

The downside to packing so much heat in one package is, yes, the heat. Although boards based on the R9 Fury can be fairly small because they don’t have to make room for a bunch of DRAMs, they do have to make room for liquid-cooling apparatus. So you’re basically just trading off the board space of one for the bulkiness of the other. On the plus side, you can locate the cooling hardware off-board if you want to, perhaps mounted to the PC chassis or in an adjoining bay. But, either way, you’re going to have to plumb the R9 Fury and engineer some decent airflow around it. There ain’t no free lunch, especially if you’re gunning for top performance.

AMD doesn’t make DRAMs, so the memories in question come from Hynix, which cooperated with AMD in defining the interface (dubbed High Bandwidth Memory, or HBM) and which assembles the devices at its plant in Korea. The interface itself is nominally open – it’s been adopted as a JEDEC standard – so anyone could make DRAMs and/or logic devices that use the same technique. Fiji is just the first.

The AMD/Hynix interface is similar in concept to the competing Hybrid Memory Cube (HMC) specification, but wholly incompatible with it. HMC has the backing of giants like Xilinx, Altera, ARM, and Micron, whereas AMD and Hynix seem to be on their own, at least for now. HMC has been around longer (at least in specification form), but actual devices that implement it are thin on the ground. So in terms of deployment, they’re about the same.

It’s tough to build upwards, but that’s the way of the future. Memories, analog components, magnetics, and assorted other interfaces just work better on “nonstandard” semiconductor processes that don’t play well with all-digital circuits. You can compromise your devices, or you can manufacture them separately and combine them at the assembly stage. Shortening the interconnection doesn’t hurt, either. Once you go up, there’s no going back down. 
