Taming Variability

Back in the dark ages, when I first moved into the semiconductor realm, I used to compare the process geometries with the thickness of a human hair – which caused gasps of disbelief in a lay audience. Holding up a four-inch wafer of 16K SRAMs alongside a transistor can with its three wires, then explaining the many hundreds of thousands of can equivalents that were contained in the wafer, usually also caused gasps.

At a recent presentation I heard Kelin Kuhn, an Intel Fellow and a dynamic speaker, explain that a 32nm memory cell was dwarfed by a human red blood cell. Now I know how thick a human hair is, and I think of blood cells as pretty small, so the graphic she showed was incredible. She then went on to show a 1980 SRAM cell (the competitor of the one I used to wave around), and the 32nm memory cell got seriously lost somewhere in the contact. In fact we are looking at a 10,000-to-1 scaling by area: in the space of the 1980 SRAM memory cell you could pack 10,000 32nm memory cells. Another slide at the same event showed a state-of-the-art FPGA with 2,500,000,000 transistors. (So three of those combined will provide one transistor for every person on the planet with more than 700 million spare, if the US Census Bureau’s estimate of 6,768,167,712 at July 1st is correct.)

I know that all this was inevitable if we continued to track Moore’s law, but to see evidence of it on the screen was a bit of a shock. About the time we were building those 16K SRAMs, people were predicting the end of Moore’s law. In fact, as an exercise, one of the process guys calculated that at a 1-micron process node, a cubic micron of silicon would have around one atom of dopant and clearly would never work as a semiconductor. Then there was the wavelength of light as an obstacle, then there was designing these beasts, then there was …

In fact, when 1 micron came, there were many thousands of atoms of dopant in a channel. Today, at 32nm, a channel has fewer than a hundred atoms. So a relatively minor fluctuation in the number of dopant atoms (Random Dopant Fluctuation – RDF) will be directly reflected in variations in transistor threshold voltage.
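To put rough numbers on why a smaller count matters, here is a minimal sketch that assumes the number of dopant atoms in a channel follows Poisson statistics, so the standard deviation is the square root of the mean. The two atom counts are illustrative stand-ins for "many thousands" and "fewer than a hundred", not measured values, and the function name is my own.

```python
import math

def relative_spread(mean_atoms: float) -> float:
    """Sigma/mean for a Poisson-distributed count: sqrt(N) / N = 1 / sqrt(N)."""
    return 1.0 / math.sqrt(mean_atoms)

# Illustrative counts only (assumptions, not process data).
examples = {
    "1 micron channel (assumed 10,000 atoms)": 10_000,
    "32nm channel (assumed 100 atoms)": 100,
}

for label, atoms in examples.items():
    print(f"{label}: about {relative_spread(atoms):.0%} spread in dopant count")
```

Under that assumption, the relative spread grows from about 1% to about 10% as the count falls from ten thousand to a hundred, which is why the same random process that was negligible at 1 micron shows up in threshold voltage at 32nm.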

And that is what the presentation was all about: process variation. It was the keynote of a two-day conference on CMOS variability, run by the UK’s National Microelectronics Institute (NMI).

As Kelin Kuhn has pointed out, concern about the effects of process variation is not new – she has cited a paper by Shockley written in 1961. However, with today’s process nodes and with the huge number of die on a wafer, the problem is becoming more acute. At the same time, the challenges of trying to work round or remove the problems continue to attract some very clever people and produce some very ingenious solutions.

To describe the issues, the process people have come up with some interesting jargon: dog bones, icicles, dummification, none of which are really relevant to this discussion. But we do need to understand two key terms: random and systematic variability. Random variability is innate to the silicon manufacturing process – whatever we do, there will always be flaws, but these can be statistically planned for. Systematic variability comes from specific patterns created for manufacture. It cannot be easily represented by a general rule and can often be managed only by detailed modelling of the manufacturing process.

The statistical analysis that is carried out in process development labs around the world has produced some wonderful equations. I won’t reproduce any here, but if you have a secret hankering for complex formulas of the kind cartoonists use to illustrate “science,” look at some issues of IEEE Transactions on Electron Devices. A particularly fine example is Stolk’s formulation for random dopant fluctuation.

Let’s start with some of the physical issues. One thing that I find very hard to cope with is how light-based techniques are still being used when the features being created are much smaller than the wavelengths of light: a 45nm feature is more than four times smaller than the 193nm light from the deep ultraviolet laser that is used as the light source for patterning. The reason this works can be attributed to a bundle of tools collectively called computational lithography. These include Resolution Enhancement Technology (RET), which uses multilayer masks to create interference patterns to enhance resolution, and Optical Proximity Correction (OPC), which alters the shapes in the mask to counter the distortions that will occur from the projection and the misshapen patterns that the layering and etching create. In effect, you work backwards from the shape you want to achieve, through all the different factors that would distort it, to come up with a pattern that you can draw on a mask set. Computational lithography takes a lot of computation and is a heavy user of parallel processing. It is closely linked to the specific manufacturing process and is normally invisible to the design team, although it is a considerable contributor to the increasing cost of mask making.
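The "work backwards" idea can be shown with a toy inverse problem. This is not how a production OPC tool works: the three-tap blur is a made-up stand-in for the optics, the one-dimensional "line" is a made-up target, and the correction loop is a simple add-the-error-back iteration that I have chosen for illustration. All the function names are hypothetical.

```python
def blur(mask):
    """Toy stand-in for the optics: a 3-tap smoothing kernel (0.25, 0.5, 0.25)."""
    n = len(mask)
    out = []
    for i in range(n):
        left = mask[i - 1] if i > 0 else 0.0
        right = mask[i + 1] if i < n - 1 else 0.0
        out.append(0.25 * left + 0.5 * mask[i] + 0.25 * right)
    return out

def squared_error(target, image):
    """Sum of squared differences between the wanted and the printed pattern."""
    return sum((t - p) ** 2 for t, p in zip(target, image))

def correct_mask(target, iterations=20):
    """Start from the target and repeatedly add the remaining error into the mask."""
    mask = list(target)
    for _ in range(iterations):
        image = blur(mask)
        # Mask values may drift outside 0..1 here; a real mask could not do that.
        mask = [m + (t - p) for m, t, p in zip(mask, target, image)]
    return mask

target = [0.0] * 4 + [1.0] * 4 + [0.0] * 4   # one desired line, 12 samples wide
naive_error = squared_error(target, blur(target))
corrected = correct_mask(target)
corrected_error = squared_error(target, blur(corrected))

print(f"error when the mask is simply the target: {naive_error:.4f}")
print(f"error after pre-distorting the mask:      {corrected_error:.4f}")
```

The pre-distorted mask no longer looks like the line you wanted, but what comes out the far side of the blur is closer to it. Real tools do the same thing in two dimensions with detailed optical and resist models, which is where the heavy computation comes from.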

Again back in the dark ages, design rule checking was a guy with a ruler working his way around a hand-drawn circuit. (We still had draftsmen with parallel motions on a drawing board. They took a back-of-the-envelope scribble and translated it into a series of rectangles, which were then photographed. At least we were no longer using sticky tape to create the rectangles, or cutting Mylar with an X-Acto knife. EDA? – we had never heard of it.)

Today, in addition to the standard design rules, a whole new set of design restrictions has been developed. The classic way to lay out features was to criss-cross them, both north-south and east-west. Intel’s 32nm process uses a unidirectional layout – all features run east-west on the die. There is a uniform gate dimension and a gridded layout – each line is one of a set number of units. This does put constraints on designers, but Intel claims that it produces enormous improvements in manufacturability and an equal drop in variability. Certainly the photomicrographs of 32nm products were very clean compared to those produced at the 45nm process node.

Another way around variability issues is to design circuits where variations in voltage or current do not affect the output of the function that the circuit implements. An example Intel quotes is chopping: inputs to a differential amplifier are swapped under control of a clock signal and then outputs are swapped back with the same signal before being low-pass filtered.
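A minimal numerical sketch shows why chopping works. It assumes an idealised amplifier with a fixed gain and a fixed input-referred offset (both values invented for illustration), treats swapping the inputs or outputs as a sign flip of the differential signal, and uses a plain average as the low-pass filter. None of this is taken from Intel's circuit; the function names are my own.

```python
def amplifier(v_diff, gain=100.0, offset=0.005):
    """Differential amplifier whose input-referred offset adds to whatever it is fed."""
    return gain * (v_diff + offset)

def plain_output(v_in):
    """No chopping: the offset is amplified along with the signal."""
    return amplifier(v_in)

def chopped_output(v_in, clock_cycles=8):
    """Swap inputs, amplify, swap outputs back, then average (a crude low-pass filter)."""
    samples = []
    for _ in range(clock_cycles):
        for clock in (+1, -1):                 # two phases per clock cycle
            swapped_in = clock * v_in          # swapping the inputs flips the signal's sign
            amplified = amplifier(swapped_in)  # the offset is NOT flipped; it lives inside the amp
            samples.append(clock * amplified)  # swapping the outputs flips the signal back
    return sum(samples) / len(samples)

v_in = 0.001  # 1 mV signal, smaller than the assumed 5 mV offset
print(f"ideal:   {100.0 * v_in:.3f} V")
print(f"plain:   {plain_output(v_in):.3f} V")
print(f"chopped: {chopped_output(v_in):.3f} V")
```

The signal passes through two sign flips and comes out unchanged, while the offset sees only the second one, so it alternates in sign and averages away in the filter. Whatever offset the process variation hands a particular amplifier, the filtered output does not depend on it.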

Each of these examples demonstrates a different element of the approaches. Computational lithography is process specific, chopping is a pure design technique, and design rules cross the barrier between process and design. This means that the people involved in addressing the issues of variability are drawn from the entire semiconductor industry spectrum. Speakers at the NMI conference included IP companies, integrated device manufacturers, fabless chip companies, EDA companies, and research institutes.

One interesting development has been work at IMEC in Belgium. The research centre has been working on an approach called Variability Aware Modeling (VAM). The first tool, MemoryVAM, predicts the yield loss of SRAMs caused by process variations in deep-submicron IC technologies. Using a model that reflects features such as cycle time, access time, and power consumption (both static and dynamic), designers of SoCs and other devices can get some understanding of the yield. This has been transferred to Samsung and has already been used in design.

There has been a recent round of doom and despair over Moore’s law. Kuhn is having none of this. She ended her presentation by saying that, in grappling with issues of variability, with the process, design and tools people working together, “we can achieve things that are even better than our wildest imaginations.” I think this is a more general philosophy for the whole microelectronics industry.