3D IC Flow Challenges

Cadence recently announced a design flow for CoWoS 3D ICs. We’ve looked at some of the issues surrounding 3D IC technology before, but what we’ve looked at less are the specific ways designers will implement 3D ICs within their EDA flows.

So I talked with Cadence’s Brandon Wang to see what specific challenges they encountered and addressed in the flow that they announced. He listed three primary elements: heat, testing, and stress.

Things are heating up

It’s no surprise that thermal concerns top the list. Ever since the first discussions involving the stacking of ICs, one of the top questions has been, how will we dissipate the heat? Things are particularly tricky for memories, which generally have less generous temperature specs. The sense amps can be thrown off if the temperature gets too hot. Integrated memory: good. Integrated memory that doesn’t work: not so useful.

The memory situation is made worse by something else that’s supposed to be good: lots of I/Os. The whole wide-I/O thing is a real boon for bandwidth, but it also means you have lots of signals running at the same frequency and often sharing edges. While it’s not as bad as trying to drive actual package I/Os, it can still raise power and cause local heating.

So a designer will need to consider what’s underneath a memory chip – are there particular hot spots there? Might there be better ways to position the memory? Or would a different underlying chip floorplan improve the situation?

And of course, memories aren’t the only chips that warrant consideration; they’re just really obvious one. You might imagine that an analog chip might also be impacted by heat from an underlying die.

Clearly a thermal analysis tool is in order for understanding how the heat’s going to flow. But traditional thermal tools have benefited from the fact that it can take seconds for heat in a die to migrate out through the package and leadframe. Such relatively slow dynamics have allowed what is effectively static thermal analysis.

But the distance between one die and another in a 3D IC is much smaller, and there might even be heat-dissipating materials in the gap. So the heat can move from one die to another in a mere 0.4 seconds, much less time than before. What that effectively means is that dynamic thermal analysis tools are needed. Such tools are very time-consuming to run today; Cadence says they’re working on new versions that will run much more quickly.

Tests are telling tales

Testing gets more complicated, not for any specific technical reasons, but because the business model is more complicated. With standard chips, one company builds the chip and puts it in a package. (OK, a separate company somewhere cheap may do the actual assembly, but only as a subcontractor to the chipmaker.) So when things go wrong, the locus of responsibility is pretty easy to identify.

But it will be a rare 3D IC where everything in the package is made by the same company. In most cases, a company will design an SoC and pair it up with chips made by other companies – DRAMs, non-volatile memory, and perhaps an analog chip or two. Even if each component is running 90-something-% final-test yields on their own, the compounded yield of all of those components along with testing fallout can push overall yield down into the 50% range. So if yields are crummy, you’ll want to figure out whodunit.

We’ve talked about the “test elevator” concept before; it extends the scan paths in the vertical direction, allowing segmentation of the tests and – critically – identification of whose component failed. Test software has been augmented to handle this new responsibility.

We’ll be talking in more detail about the economics of yield and testing in a future piece (although it will focus more on 2.5-D chips for reasons we’ll elucidate then).

Devices are stressing out

The third issue isn’t an electrical one, but it has electrical implications. 3D integration introduces new mechanical stresses into the system. Those stresses come both intra-die, due to the through-hole vias (TSVs) that get drilled through the chip, and inter-die, due to the bonding of one chip over another and any mismatches in expansion and contraction as things heat up and cool down.

This can affect how the transistors behave. Which should be no surprise, since stress has explicitly been used to improve the mobility of transistors. The problem here is that we’re introducing new stresses, and they haven’t been figured into the characterization data that you rely on in your cell libraries. So the simulations you do based on those libraries will be wrong if the local stresses differ from what the libraries assume.

Now… it is, in principle, possible to adjust for those stresses. That would require a tool that would figure out what those modified stresses are and how they are physically distributed throughout the various dice. No such tool exists, according to Cadence, and my sense is that none is forthcoming.

So rather than relying on an analytical approach, the preferred approach will be rules-based. Most obvious would be the keep-out zones around the TSVs. This ensures that any transistors will be far enough away from the TSV to keep the transistors from feeling the TSV stresses. Likewise, any issues dealing with the placement of one die over another will be handled by rules.

While perhaps intellectually less satisfying, such an approach reflects a pragmatic means of getting something done today rather than waiting for some more elegant future solution that would provide nominal benefit beyond what the rules will enable.

Whether dealing with thermal, testing, or stress, the memory guys are having an outsized say in the standards that are evolving to make all of this happen smoothly. That’s because the economics of memories rely on their being high-volume commodity chips (much as everyone laments being relegated to commodity status). If some guy is designing an SoC that needs a memory on top, the memory guys don’t want to have to create some custom configuration for the convenience of the underlying SoC.

In general, it’s going to be the responsibility of the SoC designer to accommodate all of the off-the-shelf chips that will complete the 3D package. Which seems reasonable. It’s like pulling standard cells out of a library – only these ones happen not to reside on the same die. And they’re not in the library; they’re in some distributor’s catalog.

This whole heterogeneous 3D IC thing (i.e., not a stack of like dice) is just getting started, so it’s reasonable to expect that the flow will be tweaked as real engineers figure out what works and what doesn’t work and relay those thoughts to their friendly neighborhood EDA dude. I’m sure we haven’t heard the last word on this.

More info:

Cadence’s 3D IC flow