AMD Gets All Tokyo with Fiji

Twenty-two die in a single package?

The semiconductor business is a lot like selling real estate. It’s not the dirt you’re paying for; it’s the location. An acre in the middle of Manhattan will cost you a lot more than an acre in the desert (provided it’s not in the middle of a Saudi oil field). Likewise, a square millimeter of 28-nanometer silicon can cost a lot or a little, depending on who made it and what they did with it.

To stretch the analogy a bit further, the cost of the real estate also depends on what “improvements” you’ve made to the property. An empty field isn’t worth as much as a developed lot with a four-story apartment building on it (again, assuming your field isn’t atop a gold mine).

Finally, real estate and semiconductors have both discovered the advantages of building upwards. There’s only so much real estate in the world – they aren’t making any more – so you have to build vertically to maximize your property values. Thus, we get skyscrapers in high-value areas like Manhattan, Tokyo, Singapore, or London. There’s more square footage vertically than there is horizontally.

A few chip companies have dabbled in vertical construction, stacking a few silicon die here and there. But rarely has it been done so aggressively as with AMD’s new Radeon R9 Fury X graphics chip. Code-named Fiji (an island nation with very few skyscrapers), this new chip is actually 22 different chips all packed into one package. It’s the silicon equivalent of a GPU skyscraper. Or, at least, a decent apartment building.

Trouble is, building vertically is tough in either industry. It’s a whole different kind of engineering. The (ahem) architecture changes in both cases. Structural engineers have to figure out how to make the ground floor sturdy enough to support all the upper floors. And chip designers have to figure out how to connect chips that aren’t side by side, but are, instead, stacked one atop the other.

The reasons for stacking are entirely different, however. In semiconductor land, we aren’t trying to maximize the finite X and Y dimensions. Unlike real estate, they really are making more. There’s a virtually unlimited supply of silicon property to build on. Rather, the impetus is to improve performance. It’s quicker to move high-speed signals up and down than it is to move them across the die from side to side. Well, if you do things right, it is.

The 22 die encapsulated in AMD’s Radeon R9 measure more than 1000 mm² in total. That would be a big freakin’ device if it were one single-layer chip. Even Intel’s massive multicore x86 processors rarely stray beyond the 400–500 mm² range. You could see this thing from space, were it not collapsed and folded in on itself. Fully 16 of the 22 die are memory chips, and that’s where AMD and its partners added most of the innovation.

Graphics chips are memory hogs, as any gamer will tell you. Take a look at any recent, decent PC graphics card and you’ll see (or would see, under all the heat sinks) a lot of DRAMs surrounding the GPU. Fast, wide memory buses are a primary concern for GPU designers and a point of differentiation for their purchasers.

Rather than work with off-chip memory like normal GPUs do, AMD designed the R9 to incorporate its own DRAM on-chip. Or on-package, at least. The cluster of DRAMs surrounding the GPU has now vanished, subsumed into the GPU package itself. That makes for a much smaller graphics card for your PC – in theory. More on that later.

The 16 little DRAMs on the R9 are arranged in four stacks of four devices each. Each stack sits on a logic die of its own, which makes every memory tower five stories high (base die plus four DRAMs), standing beside the GPU on the interposer that underlies them all like the foundation under a building. All of the DRAMs are identical, and each one incorporates through-silicon vias to its upper and lower neighbors. Assuming the chips are lined up exactly right (no small feat), all the vias connect up to make a vertical wiring bus.

There are two 128-bit buses per die – one for reading and one for writing. These are not shared amongst the die in the stack; each DRAM gets its own pair of buses. With four die in the stack, that makes for eight 128-bit buses, or 1024 bits of data travelling vertically. And with four such stacks piled on the R9, that’s a 4096-bit data path to/from all the memory. Impressive. More importantly, it’s not something you could do with conventional off-chip memory buses. There just aren’t enough pins, and toggling that many signals at high speed would probably make the board bounce.
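For anyone who wants to check the multiplication, here is a minimal Python sketch of the bus-width arithmetic. The figures come straight from the paragraph above; the constant and function names are made up for illustration and are not AMD or Hynix terminology.

```python
# Bus-width arithmetic for the stacked DRAMs.
# All names here are illustrative, not vendor terminology.
BUS_WIDTH_BITS = 128   # width of each bus, per the article
BUSES_PER_DIE = 2      # two buses per DRAM die, per the article
DIES_PER_STACK = 4     # four DRAMs in each stack
STACKS = 4             # four stacks in the package


def stack_width_bits() -> int:
    """Bits travelling vertically through one stack of DRAMs."""
    return BUS_WIDTH_BITS * BUSES_PER_DIE * DIES_PER_STACK


def package_width_bits() -> int:
    """Total data-path width across all four stacks."""
    return stack_width_bits() * STACKS


if __name__ == "__main__":
    print(stack_width_bits())    # 1024
    print(package_width_bits())  # 4096
```

Change any one constant (a taller stack, say) and the total width scales linearly with it.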

Interestingly, the massive bus between the GPU and the DRAMs doesn’t run all that fast. AMD specs it at 500 MHz, which is pretty sluggish compared to the 1- and 2-GHz clocks used with GDDR5. But AMD’s bus is so ridiculously wide that its overall bandwidth is far greater, which is the real point. 
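To put rough numbers on that, here is a back-of-the-envelope Python sketch. The 4096-bit width and 500-MHz clock come from the article. The double-data-rate signaling (two transfers per clock) is an assumption, and the comparison card is purely hypothetical: a 384-bit bus with an effective 7 billion transfers per second per pin, figures chosen only for illustration. The function name is likewise invented.

```python
def peak_bandwidth_gb_per_s(bus_bits: int, clock_hz: float,
                            transfers_per_clock: int) -> float:
    """Peak bandwidth in gigabytes per second (1 GB = 1e9 bytes)."""
    bits_per_second = bus_bits * clock_hz * transfers_per_clock
    return bits_per_second / 8 / 1e9


# 4096-bit bus at 500 MHz, per the article. Two transfers per clock
# (double data rate) is an assumption, not a figure from the article.
stacked = peak_bandwidth_gb_per_s(4096, 500e6, 2)

# Hypothetical conventional card for comparison: 384-bit bus, effective
# 7e9 transfers per second per pin. Both numbers are illustrative only.
conventional = peak_bandwidth_gb_per_s(384, 7e9, 1)

if __name__ == "__main__":
    print(f"stacked DRAM: {stacked:.0f} GB/s")
    print(f"hypothetical conventional card: {conventional:.0f} GB/s")
```

Under those assumptions, the slow-but-wide bus comes out comfortably ahead of the fast-but-narrow one, which is the whole idea.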

The downside to packing so much silicon in one package is, yes, the heat. Although boards based on the R9 Fury can be fairly small because they don’t have to make room for a bunch of DRAMs, they do have to make room for liquid-cooling apparatus. So you’re basically just trading off the board space of one for the bulkiness of the other. On the plus side, you can locate the cooling hardware off-board if you want to, perhaps mounted to the PC chassis or in an adjoining bay. But, either way, you’re going to have to plumb the R9 Fury and engineer some decent airflow around it. There ain’t no free lunch, especially if you’re gunning for top performance.

AMD doesn’t make DRAMs, so the memories in question come from Hynix, which cooperated with AMD in defining the interface and which assembles the devices at its plant in Korea. The interface itself is nominally open, so anyone could make DRAMs and/or logic devices that use the same technique. Fiji is just the first.

The AMD/Hynix interface is similar in concept to the competing Hybrid Memory Cube (HMC) specification, but wholly incompatible with it. HMC has the backing of giants like Xilinx, Altera, ARM, and Micron, whereas AMD and Hynix seem to be on their own, at least for now. HMC has been around longer (at least in specification form), but actual devices that implement it are scarce on the ground. So in terms of deployment, they’re about the same.

It’s tough to build upwards, but that’s the way of the future. Memories, analog components, magnetics, and assorted other interfaces just work better on “nonstandard” semiconductor processes that don’t play well with all-digital circuits. You can compromise your devices, or you can manufacture them separately and combine them at the assembly stage. Shortening the interconnection doesn’t hurt, either. Once you go up, there’s no going back down. 
