feature article
Subscribe Now

To Err Is Universal

Listening to Alexander Pope (and even Seneca and Cicero long before), you would think that erring is a particularly human trait. Taken out of the original context (which places human fallibility in contrast to the divinity of forgiveness or the diabolicalness of persisting in erring), you might feel singled out by nature as a particularly unreliable entity.

In fact, we are surrounded by errors. They happen all the time. We are but a wee part of a huge, interconnected, error-prone system. Our very existence is a testament to the ability of living forms to survive any number of errors that might occur. The more we study genetics and the geological history of our planet, the more we learn about the range of redundant systems and adaptations that have evolved to bring us this far. (Or, looked at from an anthropic perspective, that had to evolve to get us this far.)

With a few exceptions, however, the machines we humans create have no such adaptability. Whereas the natural world abides variation and randomness, we create deterministic systems that assume a relatively narrow range of operating conditions such that a given input will guarantee a correct output.

And, so far, this has worked pretty well. But, in reality, what we’re doing is approximating the real world and, basically, rounding off the error as insignificant. The problem is, this error is becoming more significant.

Reliability engineers will talk in terms of Failures In Time (FITs); one FIT equals one failure in 109 hours – roughly 115,000 years. Long enough to where a lawyer will have a hard time pinning a failure on you and penalizing your progeny.

So how many failures can we tolerate? As described by Synopsys’s Yervant Zorian, for most of the world, this has been about 1000 FITs – especially for networking. But for certain critical areas, it’s much lower: medical systems require 50-FIT levels, and automotive demands a miniscule 0.1 FITs. If you wonder why medical isn’t the lowest, I suppose we can interpret the lower automotive rate as the proverbial ounce of prevention to medical’s pound of cure.

Note that these are system failure rates. The system fails when any one (or more) of its components fails. And the one component that has gotten more attention than any other has been the memory.

I suppose you could debate whether memory is more important than any other component, although if we think about it in our own terms, losing our own memories can eventually be more debilitating than the loss of some other functions (with the exception of the ability to forget that particular night during that particular pub crawl; such loss can perhaps be considered yet another of nature’s error-coping mechanisms: in this case, one of mercy).

The use of an error-correcting code (ECC) in memory is nothing new, and there’s a fair bit of IP available. But much of the focus of ECC has been for handling manufacturing errors and, typically, single-bit errors. What’s changing is that soft errors in the field, while always possible, are becoming more of an issue, and they affect more than one bit at a time. There are three contributors to this:

  • voltages are coming down, reducing the level of margin;
  • the amount of charge stored in a memory cell is lower, meaning less energy is required to upset the cell;
  • and the size of cells is smaller, meaning that an alpha particle can disturb more than one cell at a time.

Couple this with the huge increases in memory going into systems, and it starts to get very hard to keep FIT rates within manageable levels.

Synopsys’s approach to this is manifest in the form of their newly released DesignWare STAR ECC product (courtesy of their acquisition of Virage). This is essentially a memory compiler that allows engineers to build in a level of correction through additional bits and proper coding and then evaluate the FIT rates that would result. The goal is to achieve the desired FIT rate using the smallest area possible.

It’s not just a matter of adding bits to the bytes, however. Scrambling – that is, mixing up which physical cells belong to which logical bytes – becomes more important so that, when multiple cells are disturbed by a single event, those cells will be less likely to cohabitate within a single byte or word. Multiple bytes with single-bit errors are easier to correct than a single byte with multiple-bit errors. The STAR ECC compiler allows you to enter information about the physical cell adjacencies using a language called MASIS; this lets the tool figure out how to scramble the bits.

Using this methodology, you start by designing your memory; then you use the STAR ECC compiler to create the field-correction redundancy. You can then insert the STAR wrapper for handling manufacturing and testing issues.

Meanwhile, in a completely different corner of the continent, another company is taking a completely different approach to ECC – and to logic. As startup Lyric Semiconductor’s Dave Reynolds says, “2+2=4 is a solved problem.” There are numerous other problems where there isn’t necessarily a right or wrong answer – there’s a “most likely” answer (with, of course, a possibility of error).

Lyric refers to this as “probability processing.” They point out a number of applications – gene sequencing, search, and ECC to name some – that are essentially probability problems to be solved.

And, while they have a general-purpose probability processor in the works for the future, ECC is the first problem they are attacking with a purpose-built piece of IP for handling high-end ECC (for things like solid-state disks) to compete with traditional digital approaches.

And, while they are not disclosing details about their circuits at this point, they are decidedly not digital. In fact, in the general case, they take an analog input and provide a digital output. They don’t use “bits”; they use “pbits” (probability bits). They don’t use Boolean gates, they use Bayesian gates. They describe the pbits as flowing multi-directionally through the Bayesian gates such that every variable talks to every other variable. The effect is that of highly parallel computing.

Each cell resolves its input to a 1 or 0 in an iterative fashion; each iteration takes 3-8 cycles. While this introduces some latency, as long as the latency is less than the time it takes to fetch the next word, it is hidden.

To write programs for this kind of processor, they have developed a language they call PSBL (Probability Synthesis to Bayesian Logic). Looking at some snippets of code, it appears rather opaque, but they acknowledge this initial impression and say that, once you’ve learned the syntax, it’s actually easy to program in.

They provide some comparisons to the standard digital approach (for an ECC of similar quality) to drive home the benefits they see. From a circuit size standpoint, they claim to be 30X smaller at 1 Gbps and 70X smaller at 6 Gbps. They claim that power is reduced by 12X. (And yes, for any pedants, I’m aware of the theoretically-confusing concept of “nX smaller”… file it under “Y” for “You know what I mean.”)

The other comparison they provide is that their circuit at 180 nm is smaller and lower-power than the equivalent digital circuit at 45 nm.

And so we have two very different ways of approaching ECC; it will presumably be some time before a winner is decided. But, based on these efforts, we may have to provide a variation on our opening theme (with apologies to Pope and the Romans), as “To err is too much of a pain in the butt; to correct is essential.”

Yeah, you’re right… not sure they’ll be quoting that one centuries from now…

 

More info:

Synopsys DesignWare STAR ECC

Lyric Semiconductor

Leave a Reply

featured blogs
Oct 23, 2020
Processing a component onto a PCB used to be fairly straightforward. Through hole products, a single or double row surface mount with a larger center-line rarely offer unique challenges obtaining a proper solder joint. However, as electronics continue to get smaller and conne...
Oct 23, 2020
[From the last episode: We noted that some inventions, like in-memory compute, aren'€™t intuitive, being driven instead by the math.] We have one more addition to add to our in-memory compute system. Remember that, when we use a regular memory, what goes in is an address '...
Oct 23, 2020
Any suggestions for a 4x4 keypad in which the keys aren'€™t wobbly and you don'€™t have to strike a key dead center for it to make contact?...
Oct 23, 2020
At 11:10am Korean time this morning, Cadence's Elias Fallon delivered one of the keynotes at ISOCC (International System On Chip Conference). It was titled EDA and Machine Learning: The Next Leap... [[ Click on the title to access the full blog on the Cadence Community ...

featured video

Demo: Low-Power Machine Learning Inference with DesignWare ARC EM9D Processor IP

Sponsored by Synopsys

Applications that require sensing on a continuous basis are always on and often battery operated. In this video, the low-power ARC EM9D Processors run a handwriting character recognition neural network graph to infer the letter that is written.

Click here for more information about DesignWare ARC EM9D / EM11D Processors

featured Paper

New package technology improves EMI and thermal performance with smaller solution size

Sponsored by Texas Instruments

Power supply designers have a new tool in their effort to achieve balance between efficiency, size, and thermal performance with DC/DC power modules. The Enhanced HotRod™ QFN package technology from Texas Instruments enables engineers to address design challenges with an easy-to-use footprint that resembles a standard QFN. This new package type combines the advantages of flip-chip-on-lead with the improved thermal performance presented by a large thermal die attach pad (DAP).

Click here to download the whitepaper

Featured Chalk Talk

Embedded Display Applications Innovation

Sponsored by Mouser Electronics and Texas Instruments

DLP technology can add a whole new dimension to your embedded design. If you considered DLP in the past, but were put off by the cost, you need to watch this episode of Chalk Talk where Amelia Dalton chats with Philippe Dollo of Texas Instruments about the DLP LightCrafter 2000 EVM. This new kit makes DLP more accessible and less expensive to design in, and could have a dramatic impact on your next embedded design.

Click here for more information about Texas Instruments DLP2000 Digital Micromirror Device (DMD)