Braving the Black-and-White

You can’t figure out whether it’s a bad dream or just a dream. You’re gliding down an escalator towards a large, subterranean space. All you see is black and white, as if color has been banished from the building. Despite your misgivings, the escalator delivers you into the monochromatic morass. Shoulder-to-shoulder suits explain the chromatic deficit. Interactions are formal, even stilted. Snippets of banter range from the banal to the arcane. You question whether you could cast a cogent contribution into any of the conversations, whether you really belong here. You pinch yourself; you try to shake yourself awake. And you realize… it’s not a bad dream. It’s not a dream at all. It’s ISSCC, one of the least commercial of conferences for chip makers. You’re in a foyer full of engineers on a coffee break between sessions. Not a logo or company banner in sight. Marketing dudes not welcome. It’s about substance, not style. Go technical or go home.

What’s interesting is that, at first impression, you would think that attendance numbers were thumbing their noses at the economy. Whispers are that the numbers were, in fact, down, but you wouldn’t know that if you stumbled in during a break. Released from their conference rooms, throngs of the techie-est of types choked the San Francisco sidewalks within a radius of several blocks of the conference venue.

ISSCC sessions tend to range from far-reaching general technology to very specific discussions of narrow issues. My tendency is to focus on the broader technological presentations, and those are typically embodied by technology-leading products: processors and memory. There’s actually one technology-leading product segment that’s nowhere to be found in the official line-up: FPGAs. FPGA makers send representatives and presumably benefit from the work that is reported, but no sessions are dedicated to FPGAs, and there were no papers related to FPGAs (that I could discern, anyway). At first blush, this would seem to be an obvious result of the duopoly in that business; Altera and Xilinx would only be sharing secrets with each other if they presented. Yes, there are other FPGA vendors, and no, my intent isn’t to trivialize them, but their contributions would be expected to be more sporadic; at least this year, none of them had anything to say.

You’d think the same might be the case for Intel and AMD, and yet both have a presence at ISSCC. Intel, in particular, is typically expected to make significant contributions to the curriculum. For the last couple of years, they’ve had two presentations in the main processor session: one that presents a chip at a higher level, followed by a more detailed one. So, somehow, in that even less-balanced duopoly, they still find a way to present.

While the sessions demonstrated numerous goals and benefits of the work done, no single theme stuck out more than power reduction. Everybody seems to have their eye on the energy bill. Whether it meant dealing with active power, leakage current, or inactive sections of circuitry, most circuit innovations were presented in light of their power implications as a primary or immediate secondary consideration: either they reduced power or they didn’t increase power appreciably.

Intel’s main processor sessions featured a number of ways in which they attacked power in their new 45-nm 2.3-billion-transistor Xeon behemoth. The use of high-κ gate dielectric reduced n-channel leakage by a factor of 20 and p-channel leakage by a factor of 1000. Clock gating was widely used, and power disabling reduced core power by a factor of 40 and cache power by 83%. Note that for the cache, this came as a result of lowering V_DD to 0.35 V from 0.9 V, which is better than just putting it in “sleep” mode, where the voltage comes down to about 0.75 V and power is reduced by only 35%. When cores and caches are shut down, Intel’s so-called “turbo-mode” re-allocates some of the saved power to the cores still in use to provide a better power/performance point. A power-up detector can also tell which ports are unused when the chip starts up, saving 2 W per port by shutting down the PLLs on idle ports.

The ability to shut down some parts of the circuit requires power gating, which is non-trivial for a chip of this nature, where power noise due to insufficient power bussing or a resistive gating transistor could kill performance. The bussing was addressed by putting it on the topmost layer and making that layer ten times thicker than the other metal layers. The monster gating transistors, used at each core, and through which all the core’s power flows, have a total gate width of about 1.5 meters – each – to minimize resistance and voltage drop.

Intel reduced power on their new 32-nm SRAM technology by building in a retention state during which the bitlines float following a pre-charge and the array and wordline supplies are put into a “sleep” state. Having a sleeping SRAM array sounds deadly for a volatile storage technology, but, as with the cache in the Xeon processor, this isn’t a power-off: it’s a power-down. When going into sleep mode, the supply transitions from V_DD to a regulated “hold” voltage that’s lower than V_DD. This transition happens automatically, and the internal voltage on the array slowly drifts down to the hold voltage. The floating bitline reduces subarray leakage by 18%; the array sleep mode reduces it by 23%; and the wordline sleep mode reduces it by a further 15%. The combined total is a 58% leakage reduction.

Meanwhile, Toshiba took a different power supply approach, and for a different reason. The goal was to reduce the cell size by using a smaller channel area than scaling would allow. They did this by using a dual supply: a higher voltage for the array and wordline and a lower voltage for the logic and bitline. This may sound simple, but it’s a delicate balancing act. They had to be careful with the wordline driver: if it’s too weak, they’ll have write failures because they can’t flip the cell. On the other hand, if it’s too strong, they may inadvertently flip extra cells when writing, causing disturb failures. So they actually have a programmable wordline driver that they can trim to set the drive in the right range. As a result, they ended up with a cell size 10% smaller than scaling trends would have predicted.
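To make the balancing act concrete, here is a minimal sketch of how a trim setting might be chosen. It is purely illustrative: the function name `pick_trim`, the trim codes, and the two pass/fail checks are hypothetical stand-ins, not anything Toshiba described. The idea is simply to find the codes that avoid both failure modes and settle in the middle of that window.

```python
def pick_trim(codes, write_ok, disturb_ok):
    """Return the middle trim code among those that pass both checks.

    codes      -- iterable of candidate drive-strength settings (hypothetical)
    write_ok   -- callable: True if the drive is strong enough to flip a cell
    disturb_ok -- callable: True if the drive is weak enough not to upset others
    Returns None when no code satisfies both conditions.
    """
    passing = [c for c in codes if write_ok(c) and disturb_ok(c)]
    if not passing:
        return None
    # Aim for the middle of the passing window to leave margin on both sides.
    return passing[len(passing) // 2]


if __name__ == "__main__":
    # Invented thresholds: codes below 3 fail writes, codes above 6 disturb.
    print(pick_trim(range(8), lambda c: c >= 3, lambda c: c <= 6))
```

In practice the two checks would come from test measurements on the part; the thresholds above are made up for the demonstration.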

In the non-volatile space, the University of Tokyo and Toshiba took a different approach to providing power for NAND flash logic. The tough thing about NVMs is that they typically require much higher voltages for programming than are available in the system. Long gone are the years when you could ask system designers to provide a separate 12- or 20-V supply. Now it has to be done on-chip, traditionally with charge pumps. But the supply voltage being pumped keeps dropping, so that with a V_DD of 1.8 V, the charge pump can use more energy than the memory core. They addressed this by eliminating the charge pump and replacing it with a boost converter and an adaptive controller. Most such converters have a fixed duty cycle that either pumps up quickly, with too large a step to adjust with fine gradations, or pumps up using fine gradations, taking too long to reach voltage. Instead, they used a variable approach with three different duty-cycle/frequency ranges so that you start quickly, with rough jumps, and then transition to finer and finer gradations; it’s like changing sandpaper to finer grain as you approach the desired smoothness. The result of the shift from charge pumps to the boost converter was 68% lower total power.
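The sandpaper idea can be sketched in a few lines. This is not the controller from the paper; the function name `ramp_to_target`, the three step sizes, and the voltages are all invented for illustration. Each step size stands in for one duty-cycle/frequency range: big jumps first, then smaller ones as the output closes in on the target.

```python
def ramp_to_target(target_v, start_v=1.8, steps=(2.0, 0.5, 0.1)):
    """Ramp toward target_v using progressively finer step sizes.

    All numbers are hypothetical. Each entry in `steps` models one
    duty-cycle/frequency range. Returns the voltage after every step,
    starting with start_v.
    """
    v = start_v
    trace = [v]
    for step in steps:
        # Keep taking steps of this size while one more still fits
        # under the target; then fall through to the next, finer size.
        while v + step <= target_v + 1e-9:
            v = round(v + step, 6)
            trace.append(v)
    return trace


if __name__ == "__main__":
    coarse_to_fine = ramp_to_target(20.0)
    fine_only = ramp_to_target(20.0, steps=(0.1,))
    print(len(coarse_to_fine) - 1, "steps coarse-to-fine")
    print(len(fine_only) - 1, "steps with the fine grain only")
```

Both runs land on the same voltage, but the coarse-to-fine ramp gets there in far fewer steps than the fine-only one, which is the point of switching grains along the way.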

This work was done on a 3-D SSD, which is made up of a DRAM layer, several NAND flash layers, and then a NAND flash controller layer on top. Normally, each NAND flash layer would have its own charge pump. But by switching to the boost converter, the charge pumps were eliminated from each flash layer (reducing die size and power), and the converter was put on the controller layer, with a spiral inductor being added there as well.

Samsung presented some DRAM work that involved a four-die stack. In order to isolate the internal busses from I/O loading, separate buffers and control logic were used on one of the dice, called the master die; the other dice were referred to as slave dice, and they didn’t talk directly to the outside world. This arrangement reduced power because the slaves could do without that circuitry, and it increased performance because the slaves didn’t have to drive external I/Os. Overall, they got a 50% reduction in standby power and a 70% reduction in active power.

This 3-D DRAM chip was built using through-silicon via (TSV) technology in order to connect the different DRAM layers. Such connections, of course, constitute yet another level of interconnect, and that’s of particular concern for power delivery. So an extra set of edge pads was added to keep the power noise under control. They also used redundant vias to improve yield. Traditional approaches to via redundancy allocate a given number of “normal” vias, plus some redundant ones that can be swapped in as needed. However, the routing needed to make this effective can become a challenge. Instead, Samsung didn’t make a distinction between normal and redundant vias; if one via failed, its neighbor could be swapped in, and all the vias next to that one shifted one over. This provided much more flexibility for much less routing. Manufacturing yields were in the 98% range rather than the 15% more typical of quad-die package (QDP) yields.
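The shift-over scheme is easy to picture as a mapping from signals to vias. The sketch below is only a model of the idea as described here, not Samsung’s circuit; the function name `assign_vias` and the via counts are hypothetical. Signals keep their order, and each failed via simply pushes every later signal one position along.

```python
def assign_vias(num_signals, num_vias, failed):
    """Map each signal to a via, skipping failed vias by shifting over.

    num_signals -- how many signals need a connection
    num_vias    -- how many physical vias exist (no normal/spare distinction)
    failed      -- set of via indices known to be bad
    Returns a list where entry i is the via used by signal i.
    Raises ValueError if too few working vias remain.
    """
    good = [v for v in range(num_vias) if v not in failed]
    if len(good) < num_signals:
        raise ValueError("not enough working vias")
    # Taking the first num_signals good vias in order means a signal
    # never moves further than the number of failures before it.
    return good[:num_signals]


if __name__ == "__main__":
    # Four signals, six vias, via 1 has failed: signals 1..3 shift over by one.
    print(assign_vias(4, 6, {1}))
```

Because a signal only ever moves to a nearby via, each one needs a short hop to its neighbors rather than a long route to a distant spare.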

These highlights, of course, only scratch the surface of what was available during the week. You could even marvel over such unusual fare as a paper on body-coupled communication – yes, you too, and all your innards, can become part of the latest communication web. Talk about social networking… More of what happened can be found – or purchased – on the ISSCC web page.