
Solving the Big Secret

Synopsys Attacks SEUs in FPGAs

A few years ago, one FPGA vendor, Actel, was quietly shouting in the corner. “Hey! Single event upsets (SEUs) are a big problem for FPGAs!”

The other FPGA companies replied with a thoughtful technical analysis of the situation: “Hey, Actel – SHUT UP!” 

OK, maybe that’s not exactly the way it went down, but the idea is basically right. You see, Actel’s history is in super-high-reliability FPGAs for use in space. Up in space, there are lots of tiny particles flying around with a lot of energy. When one of those particles hits a vulnerable part of an IC (like a storage element of some kind), it can flip the bit from one to zero or zero to one. As your razor-sharp digital design mind might be telling you right now, this is really bad.

Now, all digital design technologies contain some forms of memory elements – registers, flip flops, RAM… so you might wonder why this problem is particularly bad for FPGAs. In addition to all those “normal” uses of storage elements, FPGAs also use memory-like cells to store basic configuration information like ROUTING.

Uh oh.

So, in addition to possibly modifying your data, an SEU in an FPGA can randomly alter your design itself. This is very, very bad news. For the regular memory elements like registers, there were already techniques in use that could mitigate the errors. Triple modular redundancy (TMR), for example, uses three memory elements to store one bit of information, with voting circuitry that detects and corrects errors when they occur. If one copy gets hit with an SEU, the other two out-vote it and your design continues without issue. Regular memory, of course, can be protected with established error-correcting code (ECC) techniques. State machines can be protected from bit-flipping in state registers by choosing fault-tolerant encoding schemes.
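To make the voting idea concrete, here is a minimal software sketch of a TMR register. It is a toy model, not anything a synthesis tool would emit: the class name `TmrRegister` and its methods are made up for illustration, and a real implementation lives in hardware as three flip-flops feeding a voter.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote across three copies of a value."""
    return (a & b) | (a & c) | (b & c)


class TmrRegister:
    """Toy model (hypothetical name) of a triplicated register with a voter."""

    def __init__(self, value: int = 0) -> None:
        self.copies = [value, value, value]

    def write(self, value: int) -> None:
        # A write updates all three copies at once.
        self.copies = [value, value, value]

    def upset(self, copy_index: int, bit: int) -> None:
        # Simulate an SEU by flipping a single bit in a single copy.
        self.copies[copy_index] ^= 1 << bit

    def read(self) -> int:
        # The voter masks any error that is confined to one copy.
        return majority(*self.copies)

    def repair(self) -> None:
        # Write the voted value back so a later hit on another copy
        # does not combine with the first one and win the vote.
        self.write(self.read())


reg = TmrRegister(0b1011)
reg.upset(copy_index=1, bit=2)   # one copy now holds 0b1111
voted = reg.read()               # the other two copies out-vote it
reg.repair()
```

Note what the sketch does not protect against: if the same bit flips in two of the three copies before a repair, the bad value wins the vote.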

However, for FPGAs, the vast majority of vulnerable memory cells are used in the configuration logic for things like routing and look-up table programming. The reason Actel was shouting loudly about this is that their two FPGA technologies, antifuse and flash, have configuration elements that are basically not vulnerable to SEUs. When asked about this, the larger FPGA companies – whose FPGAs use SRAM-like cells for configuration – quickly replied “Hey, did you know that our FPGAs have 11.2% more equivalent look-up tables than our competitors’?” 

The big FPGA companies didn’t ignore the problem completely, however. Xilinx, in particular, has product offerings for space, mil/aero, and other high-reliability applications. They even had FPGAs in the Mars rovers. Without much public noise, they were taking their FPGAs to the neutron firing range in New Mexico and blasting them with all kinds of particles to see what shook loose. They were then developing techniques to help mitigate these radiation effects, including a “readback” technique for finding and correcting configuration errors. In this technique, the device’s configuration is regularly read back and compared against a reference and the device is re-programmed any time a discrepancy is found. 
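The readback idea can be sketched in a few lines. This is a simulation under stated assumptions, not a vendor API: the configuration is modeled as a list of byte "frames", and the function names `find_upsets` and `scrub_pass` are invented for the example. How a real device exposes readback and re-programming is not covered here.

```python
from typing import List


def find_upsets(readback: List[bytes], golden: List[bytes]) -> List[int]:
    """Return the indices of configuration frames that differ from the reference."""
    return [i for i, (got, want) in enumerate(zip(readback, golden)) if got != want]


def scrub_pass(device: List[bytearray], golden: List[bytes]) -> List[int]:
    """One readback pass: compare against the reference, re-program on any mismatch."""
    upsets = find_upsets([bytes(frame) for frame in device], golden)
    if upsets:
        # Re-program the whole configuration from the reference copy.
        for frame, want in zip(device, golden):
            frame[:] = want
    return upsets


# Hypothetical three-frame configuration and its reference copy.
golden = [bytes([0x00, 0xFF]), bytes([0xA5, 0x5A]), bytes([0x12, 0x34])]
device = [bytearray(frame) for frame in golden]

device[1][0] ^= 0x08                   # simulate an SEU in frame 1
repaired = scrub_pass(device, golden)  # reports frame 1 and restores the device
```

The catch, and one reason this is far from perfect, is the window between passes: a flipped routing bit can misbehave until the next readback notices it.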

While far from perfect, the radiation mitigation in conventional SRAM FPGAs has proven good enough for space use in many cases. There is a large number of SRAM FPGAs in the radiation-intensive environment of space at this very moment. Xilinx followed with special FPGAs designed specifically for radiation tolerance, so their confidence in their ability to curtail radiation effects in space seems high.

But what about down here on the ground?

On the ground, of course, there is much less radiation. This is a good thing, because if the atmosphere didn’t protect us, we’d be in all kinds of trouble – and not just with our electronic devices. However, even a small change in elevation can dramatically increase susceptibility to radiation effects. It turns out that an elevation of just a few thousand feet – like, say, that of Denver, Colorado – can 4x your chances of radiation-induced errors. Furthermore, it turns out that materials used in chip packaging can emit particles that can cause errors. Yep, your own solder blob can irradiate your device and cause a logic error. Best be careful with that lead-based solder!

Xilinx actually quietly studied this phenomenon in detail. They did a series of experiments starting somewhere around 2005 called “Rosetta” which consisted of putting a bunch of FPGAs on a big board and letting it run continuously for a long time – watching for radiation-induced errors. Based on their results, it looks like the worst generation for firm-error susceptibility in Xilinx FPGAs was the 130nm generation, and that progress has been made since then in mitigating these effects. No data is available yet for the 28nm products, as these tests take a lot of time. 

Without hard work in layout and IC design tricks, each generation of ICs should be more susceptible to SEUs. Lower voltages mean that less energy is required to flip a bit, and increased density means there are more targets to hit in configuration logic. It’s a constant battle between IC design and technology progress to keep SEUs at ground level under control. With each generation, if the fabs and the FPGA designers can’t come up with a breakthrough, your SEU susceptibility could go through the roof without warning. My guess is that you don’t check for SEU tolerance before you design in an FPGA, and if you’re using the latest process generation, the FPGA company may not have any meaningful data to share with you anyway. You’re on your own.

Ahem. Hey Kevin, you buried the headline.

Right! I did. Sorry. If you’re concerned about reliability of systems at ground level, new hope is here from Synopsys. The most recent version (2012.03) of their Synplify Premier FPGA synthesis tool includes robust support for high-reliability design, including SEU mitigation. Up until now, even though there were established techniques for SEU mitigation, it was VERY hard to get them into your FPGA design. If you wanted TMR for a register, for example, you had to design it explicitly in your HDL. Then an overly helpful synthesis tool might come along and say “Hey, look at all these extra gates that don’t do anything for the logic. Let’s optimize them away.” Oops.

Synopsys high-rel support includes three things: First, it does automatic generation of TMR (when you want it) and suppression of the tool’s desire to automatically optimize the TMR away. Second, it will infer error-correcting (ECC) RAM. Finally, it can generate fault-tolerant finite state machines (FSMs) so your design won’t wander off down a dusty path if an SEU hits a state register at an inopportune time. The tool generates Hamming-3 encoded FSMs, which will provide a much safer landing if a state bit happens to flip.
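Here is a rough sketch of why that kind of encoding helps, assuming "Hamming-3" means that every pair of state codes differs in at least three bits. The four-state machine, its 5-bit codes, and the function names below are all hypothetical, and the sketch says nothing about how Synplify Premier actually builds or recovers its state machines.

```python
from itertools import combinations
from typing import Dict, Optional


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


# Hypothetical 4-state machine; every pair of codes differs in >= 3 bits.
STATE_CODES: Dict[str, int] = {
    "IDLE": 0b00000,
    "LOAD": 0b00111,
    "RUN":  0b11001,
    "DONE": 0b11110,
}


def min_distance(codes: Dict[str, int]) -> int:
    """Smallest pairwise Hamming distance in the encoding."""
    return min(hamming_distance(a, b) for a, b in combinations(codes.values(), 2))


def recover_state(register_value: int, codes: Dict[str, int]) -> Optional[str]:
    """Return the state within one bit flip of the register value, else None.

    With a minimum distance of 3, at most one valid code can be that close,
    so a single flipped bit always maps back to the state it came from.
    """
    for name, code in codes.items():
        if hamming_distance(register_value, code) <= 1:
            return name
    return None


hit = STATE_CODES["RUN"] ^ 0b00100        # SEU flips one bit of the state register
state = recover_state(hit, STATE_CODES)   # still identifiable as "RUN"
```

Compare that with a plain binary encoding, where one flipped bit turns one valid state into another valid state and nothing looks wrong. The price is extra state bits: five here, where two would have been enough to count to four.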

Synopsys is not the first FPGA tool vendor to supply this sort of capability. Mentor Graphics rolled out similar features a couple of years ago in Precision Hi-Rel, a product targeted at the high-rel Mil/Aero market. Synopsys is now bringing these capabilities to the broader market with Synplify Premier.

There is a cost to all this safety, of course. TMR, for example, is a very gate-intensive solution. It more than triples the amount of logic for a register, so you don’t want to go using it unless you have a serious concern about SEUs in your design. For the record, if you’re designing anything that goes in an airplane in which I’m flying, I vote for you using TMR, and the other high-rel features too. Thanks.

8 thoughts on “Solving the Big Secret”

  1. FPGA companies don’t talk too much about SEUs, and with good reason. They don’t happen very often, but when they do they’re kinda scary. Luckily, there are now design tools that can help us mitigate the effects of radiation in our FPGA designs.

    What do you think?

  2. SEUs don’t happen very often? Maybe, but most space companies in Russia still use Virtex-4 and sometimes Virtex-2(!!!).
    I have no chance of bringing in the new Zynq family with ARM in place of Virtex-4 and PPC.
    The reason? SEUs!!!

