
Avoiding Failure Analysis Paralysis

Cadence Describes the DFM-Diagnostics Link

Back when I was a product engineer working on bipolar PALs (oops – I mean, PAL® devices), one of my main activities was figuring out what was wrong. That was most of the job of a product engineer: fix what’s broken. You don’t spend any time on the stuff that’s working; you work on what isn’t. Assuming it’s the chip that’s wrong, the process would typically start with a trip into the testing area to put a part on the tester and datalog it to see some evidence of things going awry. Armed with that, the next job was to spread a big schematic out on a table and start looking at the circuits, figuring out what could be causing the problem. You’d come up with a couple of scenarios, and then you’d have to look at the actual chip.

Of course, in order to look at the chip, we had to spread a big layout sheet on a table to trace out where the circuits were physically located. Then we’d know where to look. The chip would have to be decapped – I could do that myself if it was a CERDIP (ceramic packaging, where you could pop off the top); otherwise you needed to go to one of those scary guys that knew just a bit too much about chemistry (and whom you wanted to keep happy with occasional gifts of jerky or sunflower seeds) to have a hole etched in the plastic. Hopefully that was enough, and then you could go into the lab and use microscopes and microprobes and oscilloscopes and such to poke through dielectric layers, perhaps cut a metal line to get to something below, and with any luck you’d identify a problem that could be fixed. In the worst case you had to go back to Scary Guy for more delayering, or perhaps a SEM session. Or – yikes – chemical analysis. It was all seat-of-the-pants, using forensic techniques worthy of CSI – Jurassic Edition, and you let your data and observations tell you what the next step should be.

Unfortunately, a few things have changed to complicate this serene pastoral picture of the past. Start with, oh, about a thousand more pins on the chip. Shrink the features way down, and multiply the number of transistors by, oh, say, a lot. Throw on a few extra layers of metal for good measure, and, well, you gotcherself a problem.

Diagnosing failing dice and then turning the information into useful measures for improving yields on current and future circuits is no trivial matter anymore. Not only have the technical issues become thornier, but business issues have intruded into the picture as well. The urgency has also grown with the focus on Design for Manufacturing (DFM), an admittedly somewhat ill-defined collection of technologies for improving the manufacturability of sub-wavelength chips (and whose real benefit is still subject to debate).

Following up on a presentation at the ISQED conference, I was able to sit down with some of the folks from Cadence to get their view of what life looks like now. The process boils down to something that sounds rather straightforward and familiar: develop hypotheses about possible failure modes, gather lots of manufacturing data to support or weaken those hypotheses, and then narrow down the range of options for physical failure analysis (done by the modern-day scary guys – in the gender-neutral sense – who actually tear the stuff apart).

The challenges are partly those of scale. It’s no longer an easy matter to unroll a paper schematic onto the table in the Board Room. We’ve now gone paperless, and, even so, there are just too many things going on in a circuit to try to trace them by hand. That’s where tools can come in and identify, through simulation, all the logical scenarios that could contribute to the observed failure. The kinds of issues to be reviewed include not only the traditional stuck-at faults, but also timing problems. An observed behavior could originate at any of a number of logic nodes, and having the candidates identified automatically gives you a solid set of suspects in far less time. Those candidates are Pareto-ranked by level of confidence.
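
To make that Pareto ranking concrete, here’s a minimal sketch (in Python, with candidate names, pattern IDs, and the fault-dictionary layout invented purely for illustration; it is not Cadence’s algorithm) of how candidate fault sites might be scored by how well each one explains the failures observed on the tester:

# Hypothetical sketch: rank candidate fault sites by how well each explains
# the failing patterns observed on the tester. The data layout (a set of
# patterns each fault would fail) is assumed for illustration; a real
# diagnosis tool would derive it from fault simulation of the netlist.

observed_failures = {"pat_017", "pat_042", "pat_101"}

# candidate -> patterns predicted to fail if this fault (stuck-at, bridge,
# timing, ...) were actually present
candidates = {
    "U23/NAND2/A stuck-at-0": {"pat_017", "pat_042", "pat_101"},
    "net_1187 bridge to VSS": {"pat_017", "pat_042", "pat_555"},
    "U7/DFF/CK slow-to-rise": {"pat_101"},
}

def confidence(predicted, observed):
    """Score a candidate by overlap between predicted and observed failures."""
    explained = len(predicted & observed)      # observed fails it accounts for
    mispredicted = len(predicted - observed)   # fails predicted but not seen
    return explained / (len(observed) + mispredicted)

ranking = sorted(candidates.items(),
                 key=lambda kv: confidence(kv[1], observed_failures),
                 reverse=True)

for name, predicted in ranking:
    print(f"{confidence(predicted, observed_failures):.2f}  {name}")

In a real flow the predicted failure sets come from simulating the actual design, but the basic idea of scoring each suspect against the observed evidence is the same.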

The next step was, for a while, something of a problem. This kind of yield analysis is most useful in the early days of a process. But let’s face it: with masks costing what they do, you need the prospect of a huge yield increase to warrant new masks on a current product. As a result, while testing and inspection procedures for current products may benefit, many times the design improvements you learn about will apply only to future products. This kind of learning makes far more sense early in the lifetime of a given process.

But making sense out of the failures requires manufacturing data. Lots of it. And early in the life of a process, manufacturing data doesn’t look so good. And historically, fabs have been reluctant to let the data out. This wasn’t a problem before foundries were routine; you owned your own fab (as “real men” did back in the day), and you went to talk to your colleagues there. With the fab now owned by a different company, and that company not wanting to look bad compared to other foundries, there was much resistance to being open with data.

This issue is now more or less behind us; there really is no way to do solid engineering without access to manufacturing data, so that business hurdle has been cleared. The result: availability of data. Lots and lots of data. Tons of data. File it under “B” for “Be careful what you ask for.” The next challenge then becomes making sense of all of that data as it relates to the particular failure scenarios under consideration. You can now, for example, look at a wafer, or a series of wafers, to figure out where possible yield hot spots are. You can zero in on some dice of interest and look at the test and manufacturing data from those parts. The idea is to correlate the possible failure modes with actual observations to further refine the list of promising hypotheses. The once-daunting roster of things that might be wrong can be narrowed down, and the physical failure analysis folks will get a more focused list of things to look at for final evidence of what the issue is.
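
As a rough illustration of that narrowing-down step, here’s a minimal sketch (the die records and the simple neighborhood-yield heuristic are assumptions made for illustration, not a description of any particular tool) that builds a wafer map from per-die test results and flags low-yield hot spots worth correlating against the failure hypotheses:

# Hypothetical sketch: build a wafer yield map from per-die test results and
# flag dice sitting in low-yield neighborhoods as hot-spot candidates for
# deeper correlation with manufacturing data. Records and thresholds are
# invented for illustration.
from collections import defaultdict

# (wafer, x, y, passed) records as they might come out of a test datalog
die_results = [
    ("W01", 10, 12, True), ("W01", 10, 13, False), ("W01", 11, 12, False),
    ("W01", 11, 13, False), ("W01", 25, 30, True), ("W01", 25, 31, True),
]

wafer_map = defaultdict(dict)
for wafer, x, y, passed in die_results:
    wafer_map[wafer][(x, y)] = passed

def neighborhood_yield(dies, x, y, radius=1):
    """Yield of the tested dice within `radius` of position (x, y)."""
    tested = [p for (dx, dy), p in dies.items()
              if abs(dx - x) <= radius and abs(dy - y) <= radius]
    return sum(tested) / len(tested) if tested else None

# Flag dice in neighborhoods yielding below 50% for a closer look.
for wafer, dies in wafer_map.items():
    for (x, y) in dies:
        ny = neighborhood_yield(dies, x, y)
        if ny is not None and ny < 0.5:
            print(f"{wafer} die ({x},{y}): neighborhood yield {ny:.0%}")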

Many of the tools for this flow have been around for a while. For example, Cadence has had their Encounter Diagnostics tool since 2004. One of the missing links has been a means of viewing all of the manufacturing data in a coordinated manner; right now you more or less have to look at the data in an ad hoc fashion. Cadence has been working on a tool, already used in some select situations, that helps bridge the analysis of manufacturing data back to the original design; they’re still in the productization stage, but they intend this to be a key piece of automation in a feedback loop that can refine the design and manufacturing rules.

So while the concepts driving yield enhancement haven’t changed, the motivations have gotten stronger, and tools have become critical for managing the complexity and the amount of data required for thorough analysis. On the one hand, it kinda makes you pine for a simpler time, when you just kind of rolled up your sleeves and sleuthed around. On the other hand, you can now focus more energy on those parts of the process where the human brain is the key tool, and let the EDA tools take care of some of the more mundane work.
