
AMD Gets All Tokyo with Fiji

Twenty-two die in a single package?

The semiconductor business is a lot like selling real estate. It’s not the dirt you’re paying for; it’s the location. An acre in the middle of Manhattan will cost you a lot more than an acre in the desert (provided it’s not in the middle of a Saudi oil field). Likewise, a square millimeter of 28-nanometer silicon can cost a lot or a little, depending on who made it and what they did with it.

To stretch the analogy a bit further, the cost of the real estate also depends on what “improvements” you’ve made to the property. An empty field isn’t worth as much as a developed lot with a four-story apartment building on it (again, assuming your field isn’t atop a gold mine).

Finally, real estate and semiconductors have both discovered the advantages of building upwards. There’s only so much real estate in the world – they aren’t making any more – so you have to build vertically to maximize your property values. Thus, we get skyscrapers in high-value areas like Manhattan, Tokyo, Singapore, or London. There’s more square footage vertically than there is horizontally.

A few chip companies have dabbled in vertical construction, stacking a few silicon die here and there. But rarely has it been done so aggressively as with AMD’s new Radeon R9 Fury X graphics chip. Code-named Fiji (an island nation with very few skyscrapers), this new chip is actually 22 different chips all packed into one package. It’s the silicon equivalent of a GPU skyscraper. Or, at least, a decent apartment building.

Trouble is, building vertically is tough in either industry. It’s a whole different kind of engineering. The (ahem) architecture changes in both cases. Structural engineers have to figure out how to make the ground floor sturdy enough to support all the upper floors. And chip designers have to figure out how to connect chips that aren’t side by side, but are, instead, stacked one atop the other.

The reasons for stacking are entirely different, however. In semiconductor land, we aren’t trying to maximize the finite X and Y dimensions. Unlike real estate, they really are making more. There’s a virtually unlimited supply of silicon property to build on. Rather, the impetus is to improve performance. It’s quicker to move high-speed signals up and down than it is to move them across the die from side to side. Well, if you do things right, it is.

The 22 die encapsulated in AMD’s Radeon R9 measure more than 1,000 mm² in total. That would be a big freakin’ device if it were one single-layer chip. Even Intel’s massive multicore x86 processors rarely stray beyond the 400–500 mm² range. You could see this thing from space, were it not collapsed and folded in on itself. Fully 16 of the 22 die are memory chips, and that’s where AMD and its partners added most of the innovation.

Graphics chips are memory hogs, as any gamer will tell you. Take a look at any recent, decent PC graphics card and you’ll see (or would see, under all the heat sinks) a lot of DRAMs surrounding the GPU. Fast, wide memory buses are a primary concern for GPU designers and a point of differentiation for their purchasers.

Rather than work with off-chip memory like normal GPUs do, AMD designed the R9 to incorporate its own DRAM on-chip. Or on-package, at least. The cluster of DRAMs surrounding the GPU has now vanished, subsumed into the GPU package itself. That makes for a much smaller graphics card for your PC – in theory. More on that later.

The 16 little DRAMs on the R9 are arranged in four stacks of four devices each. That makes the whole R9 cluster five stories high (GPU plus four DRAMs), not counting the interposer that underlies them all like the foundation under a building. All of the DRAMs are identical, and each one incorporates through-silicon vias to its upper and lower neighbors. Assuming the chips are lined up exactly right (no small feat), all the vias connect up to make a vertical wiring bus.

There are two 128-bit buses per die – one for reading and one for writing. These are not shared amongst the die in the stack; each DRAM gets its own pair of buses. With four die in the stack, that makes for eight 128-bit buses, or 1024 bits of data traveling vertically. And with four such stacks piled on the R9, that’s a 4096-bit data path to/from all the memory. Impressive. More importantly, it’s not something you could do with conventional off-chip memory buses. There just aren’t enough pins, and toggling that many signals at high speed would probably make the board bounce.
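For the arithmetic-minded, here’s a quick sketch that retraces those numbers. Every figure in it comes from the description above (two 128-bit buses per die, four die per stack, four stacks per package), not from an AMD datasheet.

```python
# Retrace the bus-width arithmetic described above.
# All counts are as stated in the article, not taken from a datasheet.
BUSES_PER_DIE = 2          # one read bus and one write bus per DRAM die
BITS_PER_BUS = 128
DIE_PER_STACK = 4
STACKS_PER_PACKAGE = 4

bits_per_stack = BUSES_PER_DIE * BITS_PER_BUS * DIE_PER_STACK
bits_per_package = bits_per_stack * STACKS_PER_PACKAGE

print(f"bits per stack:   {bits_per_stack}")     # 1024
print(f"bits per package: {bits_per_package}")   # 4096
```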

Interestingly, the massive bus between the GPU and the DRAMs doesn’t run all that fast. AMD specs it at 500 MHz, which is pretty sluggish compared to the 1- and 2-GHz clocks used with GDDR5. But AMD’s bus is so ridiculously wide that its overall bandwidth is far greater, which is the real point. 
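To see why width beats clock rate here, a back-of-the-envelope comparison helps. The double-data-rate assumption for the 500-MHz bus and the 384-bit, 7-Gb/s-per-pin GDDR5 configuration below are illustrative assumptions on my part, not figures from the article.

```python
# Rough peak-bandwidth comparison: very wide and slow vs. narrow and fast.
# Assumptions (not from the article): the 4096-bit bus runs double data rate
# at 500 MHz, and the GDDR5 example is a 384-bit bus at 7 Gb/s per pin.

def peak_bandwidth_gbytes(bus_bits: int, gbits_per_pin: float) -> float:
    """Peak bandwidth in GB/s: bus width times per-pin rate, divided by 8."""
    return bus_bits * gbits_per_pin / 8

wide_and_slow = peak_bandwidth_gbytes(4096, 0.5 * 2)   # 500 MHz DDR -> 1 Gb/s per pin
narrow_and_fast = peak_bandwidth_gbytes(384, 7.0)      # hypothetical GDDR5 card

print(f"4096-bit bus at 500 MHz DDR:  {wide_and_slow:.0f} GB/s")    # 512
print(f"384-bit GDDR5 at 7 Gb/s/pin:  {narrow_and_fast:.0f} GB/s")  # 336
```

Even at a fraction of the clock rate, the wide bus comes out comfortably ahead.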

The downside to packing so much heat in one package is, yes, the heat. Although boards based on the R9 Fury can be fairly small because they don’t have to make room for a bunch of DRAMs, they do have to make room for liquid-cooling apparatus. So you’re basically just trading off the board space of one for the bulkiness of the other. On the plus side, you can locate the cooling hardware off-board if you want to, perhaps mounted to the PC chassis or in an adjoining bay. But, either way, you’re going to have to plumb the R9 Fury and engineer some decent airflow around it. There ain’t no free lunch, especially if you’re gunning for top performance.

AMD doesn’t make DRAMs, so the memories in question come from Hynix, which cooperated with AMD in defining the interface and which assembles the devices at its plant in Korea. The interface itself is nominally open, so anyone could make DRAMs and/or logic devices that use the same technique. Fiji is just the first.

The AMD/Hynix interface is similar in concept to the competing Hybrid Memory Cube (HMC) specification, but wholly incompatible with it. HMC has the backing of giants like Xilinx, Altera, ARM, and Micron, whereas AMD and Hynix seem to be on their own, at least for now. HMC has been around longer (at least in specification form), but actual devices that implement it are thin on the ground. So in terms of deployment, they’re about the same.

It’s tough to build upwards, but that’s the way of the future. Memories, analog components, magnetics, and assorted other interfaces just work better on “nonstandard” semiconductor processes that don’t play well with all-digital circuits. You can compromise your devices, or you can manufacture them separately and combine them at the assembly stage. Shortening the interconnection doesn’t hurt, either. Once you go up, there’s no going back down. 
