posted by Bryon Moyer
Cadence recently announced new extraction tools, claiming both greater speed (5x) and best-in-class accuracy for full-chip extraction. And what is it that lets them speed up without sacrificing results?
The answer is the same thing that has benefited so many EDA tools over the last few years: parallelism. Both within a box (multi-threading) and using multiple boxes (distributed computing). The tools can scale up to hundreds of CPUs, although they’re remaining mum on the details of how they did this…
They have two new tools: a new random-walk field solver (Quantus FS) and the full-chip extraction tool (Quantus QRC). They say that the field solver is actually running around 20 times faster than their old one.
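Cadence doesn't say how Quantus FS works internally, but the basic idea behind random-walk field solvers is easy to show. The potential at a point inside a charge-free region equals the average potential at the boundary point where a random walk starting there first exits. The sketch below is a minimal grid-based illustration under that assumption. Production random-walk capacitance solvers typically use more elaborate "floating" walks over continuous 3D geometry. The function name, grid size, and boundary condition are all hypothetical.

```python
import random

def walk_potential(n, start, boundary_value, walks=10000, seed=0):
    """Estimate the potential at interior point `start` of an n x n grid.

    For the discrete Laplace equation, the potential at an interior point
    equals the expected boundary value where a symmetric random walk from
    that point first leaves the interior. Here we average over many walks.
    """
    rng = random.Random(seed)
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    total = 0.0
    for _ in range(walks):
        x, y = start
        # Step randomly until the walk lands on an edge of the grid.
        while 0 < x < n - 1 and 0 < y < n - 1:
            dx, dy = rng.choice(steps)
            x += dx
            y += dy
        total += boundary_value(x, y)
    return total / walks

N = 11

def top_plate(x, y):
    # Hypothetical boundary condition: top edge held at 1 V, the rest at 0 V.
    return 1.0 if y == N - 1 else 0.0

# By symmetry the exact answer at the center is 0.25 V.
print(walk_potential(N, (5, 5), top_plate))
```

Each walk is independent, which is one reason this family of solvers lends itself to running on many CPUs at once.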
The field solver is much more detailed and accurate than the full-chip extraction tool. It’s intended for small circuits and high precision; its results are abstracted for use on a larger scale by the full-chip tool. That said, they claim good correlation between QRC and FS, so not much is lost in the abstraction.
They’ve also simplified the FinFET model, cutting the size of the circuit in half and increasing analysis speed by 2.5x.
While QRC is intended for the entire chip, it can also be used incrementally, in which case it runs up to three times faster still. Both the Encounter digital implementation tool and the Tempus timing analysis tool can take advantage of this incremental capability to do real-time extraction as the tools make decisions. It's also integrated into the Virtuoso analog/custom tool.
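Cadence doesn't describe its incremental mechanism. The general idea behind incremental extraction, though, is to cache results and redo only what an edit touched. Here is a minimal sketch of that pattern. The class, the per-net `extract_fn`, and the rule that an edit also dirties neighboring nets (because coupling changes) are all illustrative assumptions, not the Quantus API.

```python
class IncrementalExtractor:
    """Caches per-net results and re-extracts only nets marked dirty."""

    def __init__(self, extract_fn):
        self.extract_fn = extract_fn  # hypothetical per-net extraction routine
        self.cache = {}
        self.dirty = set()
        self.calls = 0  # how many times we actually ran extraction

    def edit(self, net, neighbors=()):
        # An implementation step (e.g. rerouting) touched `net`. Its coupling
        # to nearby nets may have changed too, so mark those dirty as well.
        self.dirty.add(net)
        self.dirty.update(neighbors)

    def results(self, nets):
        for net in nets:
            if net in self.dirty or net not in self.cache:
                self.cache[net] = self.extract_fn(net)
                self.calls += 1
                self.dirty.discard(net)
        return {net: self.cache[net] for net in nets}

# Toy "extraction": the length of the net name stands in for a real result.
ex = IncrementalExtractor(extract_fn=len)
ex.results(["clk", "d0", "rst"])      # full run: 3 extractions
ex.edit("d0", neighbors=["clk"])
ex.results(["clk", "d0", "rst"])      # incremental run: only clk and d0
print(ex.calls)  # 5
```

The speedup depends entirely on how small the dirty set stays relative to the whole design, which is why the gain is largest when a tool makes many small, local changes.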
As to accuracy, they say they match all of TSMC's golden FinFET data, that they achieve consistent results between single- and multi-corner analysis, and that they've been certified by TSMC for the 16-nm node.
Their fundamental capabilities are summarized in the following figure, although this coverage is unchanged from the prior tools.
Image courtesy Cadence
You can read more in their announcement.
posted by Bryon Moyer
3D has been tossed about quite a bit over the last few years. We can ignore the 3D TV craze that came and went like an evanescent avatar. But the two IC manifestations have been 3D transistors (i.e., FinFETs) and 3D package integration – stacking chips.
The latter is a more-than-Moore technology that lets you combine multiple chips, each built on the process best suited to it, and leverage high-volume off-the-shelf dice like memories instead of designing them from scratch.
But what if you want to scale like circuits vertically? That is, circuits that aren't available off the shelf and that all require the same process? Either you have to build them laterally on a single chip or build multiple chips and stack them.
Well, Leti is working on another option: monolithic 3D integration. What this amounts to is building a standard chip and then growing a new layer of silicon (or something) above it and building more circuits. Sounds pretty straightforward in concept, but it’s easier to visualize than it is to accomplish. They presented their status at the recent Semicon West gathering.
Image courtesy Leti
The biggest concern that always arises with these sorts of ideas is thermal. For the bottom layer, you build your transistors, implant your dopants, and then “activate” them using heat to get them moving to where they’re supposed to be. After that, you want them to stay there. They’ll keep moving if you keep the heat on, so once they’re set, you don’t want any more heat.
There are also apparently worries about the contact salicide stability in the presence of extra heat.
And where might the extra heat come from?
Well, when you build the next layer of transistors, you need to dope and activate them again. If your bottom transistors are already where you want them, the extra activation will screw them up. Do you try to under-activate the bottom ones, hoping that the second activation will bring them in line?
That's not the approach Leti is taking. They're experimenting with a "crème brûlée" technique: use a broiler for the second-layer activation. That is, heat from the top, and do it quickly enough that only the top layer gets activated before the heat can diffuse down and mess up the lower transistors.
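Why does "quickly" help? A standard rule of thumb is that heat spreads a characteristic distance L ≈ √(αt) in time t, where α is the thermal diffusivity. The sketch below compares a few heating durations. It assumes silicon's room-temperature diffusivity of about 0.88 cm²/s, and that value falls at high temperature, so treat the numbers as order-of-magnitude only. The durations are illustrative, not Leti's actual process parameters.

```python
import math

# Assumed room-temperature thermal diffusivity of silicon (~0.88 cm^2/s).
# It drops at high temperature, so results are rough estimates only.
SI_DIFFUSIVITY_M2_PER_S = 8.8e-5

def thermal_diffusion_length(diffusivity, duration_s):
    """Characteristic depth heat spreads in a given time: L ~ sqrt(alpha * t)."""
    return math.sqrt(diffusivity * duration_s)

# Hypothetical heating durations, from a very short pulse to a slow anneal.
for label, t in [("100 ns pulse", 100e-9), ("1 ms flash", 1e-3), ("10 s anneal", 10.0)]:
    depth_um = thermal_diffusion_length(SI_DIFFUSIVITY_M2_PER_S, t) * 1e6
    print(f"{label:>12}: ~{depth_um:,.0f} um")
```

Because L grows only as the square root of time, cutting the heating time by 100x cuts the heated depth by just 10x. That's why the pulses have to be very short to keep the heat near the top.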
Compatibility with existing processes is another consideration. You have to be able to connect the upper and lower transistors, and no existing interconnect level is defined for that purpose. Rather than define new interconnect, they're leveraging the local interconnect (LI) for that piece.
Finally, a big question: how to build and arrange the transistors and CMOS pairs – and other elements like NEMS devices that might want to ride along on the same chip? They’re playing with three different configurations.
The first is "CMOS over CMOS." In other words, you build both N and P types on each layer (top and bottom). They list FinFET over FinFET, Trigate/nanowire over Trigate/nanowire (all SOI), or FDSOI over FDSOI. But they also have a drawing showing an FDSOI transistor over a FinFET. They claim that two layers of 14-nm technology provide the scaling of a single layer of 10-nm technology.
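The back-of-the-envelope arithmetic behind that claim is simple if you make two generous assumptions: that node names are literal linear dimensions (they haven't been for a while), and that the vias between tiers cost no area. Under those assumptions, a 14-to-10-nm shrink scales area by (10/14)², while splitting the same logic over two tiers halves the footprint.

```python
def relative_area(old_nm, new_nm):
    """Ideal area per function when shrinking from old_nm to new_nm.

    Assumes node names are true linear feature sizes, which is a simplification.
    """
    return (new_nm / old_nm) ** 2

shrink_to_10nm = relative_area(14, 10)  # one layer at 10 nm
two_layers_14nm = 1 / 2                 # same logic split over two 14-nm tiers (no via overhead)
print(f"10-nm shrink:    {shrink_to_10nm:.2f}x footprint")
print(f"two 14-nm tiers: {two_layers_14nm:.2f}x footprint")
```

Both land at roughly half the original footprint. Real results will differ once inter-tier connections and non-ideal scaling are counted.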
The second option is to optimize the transistors by having N and P types on different layers. So, whereas the first option has CMOS pairs built laterally, they’re built vertically in this second option. This allows them to use different materials on the two layers. They’ve already tried germanium (Ge) for P over silicon for N. And they’ve leveraged different crystal orientations, with silicon  for P over silicon  for N. Next up they’ll try InGaAs for N over Ge for P.
The third option involves integrating NEMS over CMOS. We looked at their M&NEMS program last year (and that work continues).
They've already done some FPGA work just to see what kinds of improvements they can get. They used two stacked FDSOI layers and two levels of tungsten LI. They improved area by 55% (not surprising), but they also improved performance by 23% and power by 12%. Win win win. Apparently going local matters.
We’ll update as we see new results.
posted by Bryon Moyer
We've covered sensors a lot here before, and in the vast majority of cases, a sensor consists of a MEMS (or other) sensing element, an ASIC to clean up and digitize the signal, and then a series of registers where all the relevant data gets placed.
An outside entity, like a sensor hub, can then read those registers over a bus connection – typically I2C or SPI. What could be simpler?
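For a sense of what that register read looks like from the hub's side, here's a minimal sketch. Many sensors split each 16-bit reading across a low-byte and a high-byte register. The register addresses and byte order below are hypothetical, and a dict stands in for the bus. On real hardware, the `read_register` callback would perform an actual I2C or SPI transaction (for example, via `read_byte_data` in the Linux smbus2 package).

```python
def to_int16(lo, hi):
    """Combine two register bytes into a signed 16-bit (two's complement) value."""
    raw = (hi << 8) | lo
    return raw - 0x10000 if raw & 0x8000 else raw

def read_axis(read_register, lo_addr):
    """Read one axis stored little-endian at lo_addr (low byte) and lo_addr + 1."""
    return to_int16(read_register(lo_addr), read_register(lo_addr + 1))

# Hypothetical register contents standing in for a real bus transaction.
fake_registers = {0x10: 0x34, 0x11: 0x12, 0x12: 0xFF, 0x13: 0xFF}
print(read_axis(fake_registers.__getitem__, 0x10))  # 0x1234 -> 4660
print(read_axis(fake_registers.__getitem__, 0x12))  # 0xFFFF -> -1
```

Passing the bus read in as a callback keeps the decoding logic independent of whether the bytes arrive over I2C, SPI, or something else.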
Well, I guess an analog output could be simpler: you eliminate all of that messy digital stuff. But it seems to me that running an analog signal halfway across town to the analog inputs of a microcontroller (aka MCU, or whatever hub is used) would risk seriously degrading the analog value in a way that wouldn't happen with a digital signal.
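To put a rough number on that worry, compare the size of one ADC step with the noise a long analog run might pick up. The sketch below assumes a 12-bit ADC with a 3.3-V reference and 5 mV of pickup. These are illustrative figures, not Freescale specs.

```python
def lsb_volts(vref, bits):
    """Voltage step represented by one ADC code (least significant bit)."""
    return vref / (2 ** bits)

lsb = lsb_volts(3.3, 12)  # assumed 12-bit ADC with a 3.3-V reference
pickup = 5e-3             # assumed 5 mV of noise picked up along the wire
print(f"1 LSB = {lsb * 1e3:.3f} mV; {pickup * 1e3:.0f} mV of pickup = {pickup / lsb:.1f} LSB")
```

A few millivolts of noise already spans several codes, whereas a digital line tolerates far more noise before a bit flips.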
Image courtesy Freescale.
I asked Freescale about this, and they justify it based on the wide variety of digital interfaces in use, in particular in industrial settings. Heck, they say that even CAN bus is leaving the confines of vehicles and moving into other applications.
Freescale makes lots of microcontrollers. This variety of MCUs partly reflects the diversity of interfaces they may talk to: rather than having one large unit with all possible interfaces, they offer different devices. And yes, they're assuming (or at least hoping) that you'll be using their MCU.
So the idea goes thusly: first off, you simply don't run the analog signals halfway across town. In these applications, an MCU is likely to be right nearby. (If not, then you want to move it so that it is nearby.) The MCU you choose will then reflect whatever bus you're using, and that's where you go digital. They prefer this, obviously, to having to offer a bunch of different versions of the sensor to suit the various digital protocols.
There's one other convenient thing about digital registers, however: they're good at storing values while the rest of the system sleeps to save power. Well, apparently these analog outputs can manage the same trick. The internal electronics shut down between samples, but the output holds its value in the meantime. This decouples the rate at which the MCU samples the analog outputs from the rate at which the sensor samples the system, and it allows supply current as low as 200 µA when running.
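That decoupling is classic sample-and-hold behavior, and a toy model makes it concrete. In this sketch, the sensor wakes every `period_ms` to take a sample and then holds that value on its output, so whenever the MCU reads, it sees the most recent sample. The class, the 10-ms period, and the test signal are hypothetical.

```python
class HeldAnalogOutput:
    """Toy sample-and-hold model of a duty-cycled analog sensor output."""

    def __init__(self, signal, period_ms):
        self.signal = signal        # hypothetical function: time (ms) -> sensed value
        self.period_ms = period_ms  # how often the sensor wakes to sample

    def read(self, t_ms):
        # Between wake-ups the output holds the most recent internal sample.
        last_sample_time = (t_ms // self.period_ms) * self.period_ms
        return self.signal(last_sample_time)

out = HeldAnalogOutput(signal=lambda t: t * 2, period_ms=10)
# The MCU polls at its own rate (every 7 ms here). Readings change only
# when the sensor has taken a new internal sample.
print([out.read(t) for t in range(0, 50, 7)])  # [0, 0, 20, 40, 40, 60, 80, 80]
```

The MCU can poll faster or slower than the sensor samples without either side having to know the other's schedule.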
That’s how they see it; if you see it differently, then your comments are encouraged below.