April 22, 2014

Toward Ten TeraFLOPS

Altera Kicks Up Floating Point

by Kevin Morris

The Cray-2, the world’s fastest computer until about 1990, was capable of almost 2 GigaFLOPS (Billion Floating Point Operations per Second) – at an inflation-adjusted price of over $30 million. A decade later, ASCI Red – selling for a cool $70 million or so – topped one teraFLOPS (Trillion Floating Point Operations per Second). The machine was more than twice as expensive, but the price per performance had dropped from ~$15M/GFLOPS (Cray) to ~$70K/GFLOPS (ASCI Red). That’s a shocking improvement. Moore’s Law would have us believe in a ~32x gain over the course of a decade, but real-world supercomputers delivered over 200x in just ten years. Take that, Dr. Moore!

Sometime in 2015, according to Altera, we will have a single FPGA (yep, that’s right, one chip) – designed by Altera and manufactured by Intel – capable of approximately TEN teraFLOPS. Let’s do some math on that, shall we? We don’t know exactly what a Stratix 10 FPGA will cost, but it almost doesn’t matter. This device should put us in the realm of $1/GFLOPS. Or, compared to ASCI Red, an additional 70,000x improvement in cost per performance. Compared to the 1990s Cray-2 (a quarter century earlier), that’s a 15,000,000x improvement – in a time span when an optimistic interpretation of Moore’s Law says we should have less than 10,000x improvement. This is all very fuzzy math, but it appears that high-performance computing will have outpaced Moore’s Law by some 1,500x since 1990.
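The back-of-the-envelope numbers above can be reproduced in a few lines of Python. This is only a sketch of the fuzzy math: the prices and FLOPS figures are the rounded ones quoted in the text, the $1/GFLOPS Stratix 10 figure is this article's guess rather than a published price, and the two-year doubling period is an assumption chosen because it matches the ~32x-per-decade figure.

```python
# Rounded figures quoted in the article (not precise historical data).
CRAY2_PRICE_USD = 30e6            # inflation-adjusted
CRAY2_GFLOPS = 2.0
ASCI_RED_PRICE_USD = 70e6
ASCI_RED_GFLOPS = 1000.0          # one teraFLOPS
STRATIX10_USD_PER_GFLOPS = 1.0    # the article's guess, not a real price
MOORE_CEILING_25_YEARS = 10_000   # the article's "optimistic" upper bound


def usd_per_gflops(price_usd, gflops):
    """Cost per unit of performance."""
    return price_usd / gflops


def moore_gain(years, doubling_period_years=2.0):
    """Expected gain if capability doubles every doubling_period_years (assumed)."""
    return 2.0 ** (years / doubling_period_years)


cray = usd_per_gflops(CRAY2_PRICE_USD, CRAY2_GFLOPS)
asci = usd_per_gflops(ASCI_RED_PRICE_USD, ASCI_RED_GFLOPS)

print(f"Cray-2:   ${cray:,.0f}/GFLOPS")
print(f"ASCI Red: ${asci:,.0f}/GFLOPS")
print(f"Cray-2 to ASCI Red gain:   {cray / asci:,.0f}x")
print(f"Moore's Law over 10 years: {moore_gain(10):,.0f}x")
print(f"Cray-2 to Stratix 10 gain: {cray / STRATIX10_USD_PER_GFLOPS:,.0f}x")
print(f"Moore's Law over 25 years: {moore_gain(25):,.0f}x")

# Ratio of the claimed gain to the article's 10,000x Moore's Law ceiling.
outpaced = (cray / STRATIX10_USD_PER_GFLOPS) / MOORE_CEILING_25_YEARS
print(f"Outpaced Moore's Law by about {outpaced:,.0f}x")
```

With a strict two-year doubling, 25 years gives a gain well under the 10,000x ceiling, so the 1,500x figure is, if anything, on the conservative side of this particular fuzzy math.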

Whoa!

Now, before you all pull out your slide-rules and start shouting about everything from our underlying assumptions to Altera’s marketing “techniques,” let’s see what’s changed to make that possible. We all know that the underlying technology – semiconductors – has tracked pretty straight with Moore’s Law (if you live in a bizarre logarithmic land where you count a 50-year exponential as “straight”). That means our computing hardware has made some serious gains in places other than the number of transistors packed onto a single die.

What kinds of engineering innovation give us this extra three orders of magnitude of “goodness”? The case we’re examining – the most recent innovation, announced just this week – is Altera’s hardening of their floating point arithmetic units. IEEE 754 Single Precision Floating Point is now fully supported in optimized hardware – in the DSP blocks of both the current Arria 10 and the upcoming Stratix 10 FPGAs. This brings a major performance boost to floating point applications targeting FPGAs.

Hey there Horace, haven’t hardware multipliers been around for at least three decades?

Yes, they have. Even back in the 1980s, the venerable 8086 shamelessly rode the coattails of its lesser-known but harder-working sibling, the 8087, to floating point fame and fortune. What Altera has done, however, is to combine the fine-grained massively-parallel capabilities of a modern FPGA with a very large number of floating-point-capable DSP blocks. While FPGAs have been routing von Neumann processors for years on fixed-point datapath throughput, their supercomputing Achilles’ heel was always their floating point architecture (or, more precisely, their lack thereof).

Modern FPGAs sometimes contain thousands of DSP units. You can construct a massively parallel datapath/controller architecture using the FPGA fabric that can significantly outperform even the fastest DSP processors in big math-crunching algorithms. Even more significant is the extreme power savings of an FPGA-based implementation compared with a software solution executed by conventional processors. Numerous benchmarks have demonstrated the superiority of FPGAs compared to DSPs, conventional processors, and even GPUs for datapath-oriented computing – both in raw performance and in computational power efficiency.

However, there have always been two major barriers to the adoption of FPGAs for high-performance computing. First is the difficulty of programming. Where a conventional processor or a DSP requires software expertise in a high-level language like C++ (or, even FORTRAN, believe it or not, for some high-performance computing projects), FPGAs have always required a background in digital hardware design and fluency in a hardware description language such as VHDL or Verilog. This means that getting your algorithm running on an FPGA has historically required adding a hard-to-find hardware/FPGA guru to your team and a few months to your schedule, and those are two luxuries that many teams do not have.

Altera’s solution to the programming challenge is an elegant one. Since the emergence of GPUs as high-performance computing platforms and the explosion of languages like Nvidia’s CUDA or the Apple-developed (but now open) OpenCL, software engineers have been moving closer to the task of defining explicit parallelism in their code. Altera met those OpenCL programmers more than halfway by providing a design flow that maps OpenCL directly to hardware on Altera FPGAs. If you’re already writing OpenCL implementations of your algorithm to run on GPUs, you can take that same code and target it to FPGAs – with reportedly outstanding results.

The caveat on that OpenCL flow (until now) has been floating-point math. Since the DSP blocks on FPGAs have always been fixed-point, floating point arithmetic required going outside the DSP blocks and implementing the logic in FPGA LUT fabric. While this was still a “hardware” implementation, it was much less power- and logic-efficient than a custom-designed hardware floating-point unit. With this announcement, Altera has plugged that gap – bringing fully-optimized hardened single-precision floating point to their DSP blocks.
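To get a feel for why floating point is so much more expensive than fixed point when built from general-purpose logic, here is a minimal Python sketch of the steps hiding inside one single-precision multiply. It is a simplification for illustration, not Altera’s implementation: it handles normal numbers only, truncates instead of rounding to nearest, and the function names are made up for this example.

```python
import struct


def decode_fp32(x):
    """Split a value into IEEE 754 single-precision (sign, exponent, fraction) fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def encode_fp32(sign, exponent, fraction):
    """Pack the three fields back into a single-precision value."""
    bits = (sign << 31) | (exponent << 23) | fraction
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp32_mul_sketch(a, b):
    """Multiply two normal single-precision numbers using only integer operations."""
    sa, ea, fa = decode_fp32(a)
    sb, eb, fb = decode_fp32(b)
    if ea in (0, 255) or eb in (0, 255):
        raise ValueError("zeros, subnormals, infinities and NaNs are not handled")

    sign = sa ^ sb                      # sign logic: one XOR
    exponent = ea + eb - 127            # exponent adder, removing one bias
    ma = fa | (1 << 23)                 # restore the implicit leading 1
    mb = fb | (1 << 23)
    product = ma * mb                   # 24 x 24 bit integer multiply (the fixed-point part)

    if product >> 47:                   # normalise: product of significands is in [1, 4)
        product >>= 1
        exponent += 1
    if not 1 <= exponent <= 254:
        raise OverflowError("result exponent out of range for this sketch")

    fraction = (product >> 23) & 0x7FFFFF   # truncate (real hardware rounds)
    return encode_fp32(sign, exponent, fraction)


print(fp32_mul_sketch(1.5, 2.5))    # 3.75
print(fp32_mul_sketch(-2.0, 0.5))   # -1.0
```

Only the integer multiply in the middle maps naturally onto a fixed-point DSP block. The unpacking, exponent arithmetic, normalisation, and (in a full implementation) rounding and special-case handling are the parts that previously had to be built in LUT fabric.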

Apparently, these nifty hardened floating point units have already been hiding in Altera’s Arria 10 FPGAs – just waiting for support in the design tools. Now, when design tool support is turned on, Altera’s 20nm, TSMC-fabbed Arria 10 FPGAs will suddenly be capable of up to 1,500 GFLOPS. This performance can be tapped via the OpenCL flow, the DSPBuilder flow, or even old-school with “FP Megafunctions” instantiated in your HDL code.

Where this gets really interesting, however, is with Altera’s upcoming Stratix 10 family – based on Intel’s 14nm Tri-Gate (FinFET) process. With Stratix 10, Altera claims they’ll have up to ten teraFLOPS performance in a single FPGA. That’s staggering by any standard, and we should have it sometime in 2015.
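Peak figures like these are typically obtained by multiplying the number of floating-point units by the operations each one completes per clock and by the clock rate. The sketch below shows that arithmetic. The two operations per block per cycle (one multiply plus one add) and the clock rates are illustrative assumptions, not Altera specifications, and the resulting block counts are merely what those assumptions imply.

```python
import math


def peak_gflops(num_blocks, flops_per_block_per_cycle, clock_ghz):
    """Theoretical peak: every block busy on every cycle."""
    return num_blocks * flops_per_block_per_cycle * clock_ghz


def blocks_needed(target_gflops, flops_per_block_per_cycle, clock_ghz):
    """How many blocks a given peak figure implies under the same assumptions."""
    return math.ceil(target_gflops / (flops_per_block_per_cycle * clock_ghz))


# Assumed: each block does one multiply and one add per cycle (2 FLOPs).
FLOPS_PER_BLOCK = 2

# Hypothetical clock rates, chosen only to make the arithmetic concrete.
print(blocks_needed(1_500, FLOPS_PER_BLOCK, clock_ghz=0.5))    # 1,500 GFLOPS claim
print(blocks_needed(10_000, FLOPS_PER_BLOCK, clock_ghz=1.0))   # 10 teraFLOPS claim
```

A peak number assumes every block is fed with operands on every cycle, so what a real design sustains depends on how well the surrounding datapath and memory system keep the blocks busy.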

It is perhaps appropriate at this point to debunk some of the derisive rumors being manufactured and spread by one of the industry’s less-reputable pay-for-play shill blogs. There is absolutely no evidence to support rumors of Altera leaving Intel and going back to TSMC for Stratix 10. On the contrary, at this moment, Altera has working test chips in house that were fabricated with Intel’s 14nm Tri-Gate process. Altera is using these test chips to validate high-speed transceivers, digital logic, and hard-IP blocks (perhaps, even hardened floating-point DSP blocks, although the company hasn’t shared that specifically). Now, maybe this is all innocent and the bloggers in question were simply “confused” because Altera is still very actively partnering with TSMC as well – on the aforementioned 20nm Arria 10 line. Or, perhaps, Altera and Intel didn’t pony up for protection from the blogger mob, so they got kneecapped with some vicious and baseless rumors. As of this writing, however, Altera and Intel are still working hard together on Stratix 10 with 14nm Tri-Gate technology – and apparently it is coming along quite nicely.

Hardening the floating point processing has the obvious advantages one would expect, plus some less-obvious ones. Of course, optimized floating-point hardware is much faster than floating-point processors built from FPGA LUT fabric. Also of course, power consumption is greatly reduced. Less obvious is the fact that, since Altera has just freed up all those FPGA logic cells that were doing floating point before (a great number of them, it turns out), we are suddenly gifted with a huge helping of extra FPGA fabric. In other words, if you were using your old FPGA for floating point, that FPGA just got a whole lot bigger.

Following onto that advantage, the old floating-point modules were some of the most difficult parts of many designs to successfully route and bring to timing closure. Now, with these hardened floating point blocks, those routes no longer need to be routed and those paths no longer need to suffer the agony of timing closure. Your design tool drama and runtimes just took a big turn in the right direction.

There is an industry significance to this announcement that is also not obvious. For decades now, FPGA companies have dueled it out for their slices of the lucrative communications infrastructure pie. While that market has always been the leading revenue generator for FPGAs, the technology is clearly applicable in many other markets and application areas. However, the requirement to have an FPGA expert on the team has thrown a wet blanket on many of those new-market opportunities. High-performance computing is clearly one of those under-served, high-potential applications for FPGAs. If FPGAs can get past a critical proof-point, a whole new market opens up. When a software engineer can write code in a programming language like OpenCL, target that code to an FPGA with equal ease to targeting that same code to something like a GPU, and get some combination of faster performance, lower cost, and lower power consumption, then we have reached our proof-point, FPGAs have a new market, and Altera is then competing with companies like Nvidia rather than their traditional rivals.

You can get started designing now with Arria 10 using any of Altera’s supported design flows. Today, your code will map to soft-core floating-point units implemented in the FPGA fabric. In the second half of this year, when Altera turns on hardened floating point support, your same design should automatically re-map to take advantage of the new hardware. Then, when Stratix 10 comes out next year, you’ll be ready to really turn up the boost. Altera says they have pin-compatible versions of Arria 10 and Stratix 10, so that migration step should be pretty seamless as well.

21 thoughts on “Toward Ten TeraFLOPS”

kevin says:

April 22, 2014 at 10:52 am

Altera has added hardened floating-point to their DSP blocks in both the current Arria 10 and upcoming Stratix 10 FPGAs and SoCs. They claim that brings a Stratix 10 device up to 10 teraFLOPS territory. Do you think this will finally break FPGAs into HPC in a major way?

TotallyLost says:

April 22, 2014 at 11:18 am

It certainly will, especially if they fine-tune C and SystemC synthesis to include OpenMP optimizations around the DSP blocks and memory blocks. If the application is developed on a conventional processor with a coding style targeting FPGA C synthesis, then real application profile data becomes a valuable input to OpenMP synthesis to allocate/prioritize logic to balance and improve overall performance.

The really cool part of using FPGAs for HPC applications is the distributed memory blocks that FPGAs offer. In conventional processors the memory channel is often the dominant performance bottleneck, which certainly isn’t breaking Moore’s Law at the same pace. Having many independent memories can scale system level performance significantly.

On a completely different approach, doing pipelined bit serial floating point can solve some problems significantly better than even floating point DSP blocks … and frequently at a lower power and higher computational density without needing a lot of memory for intermediate terms. What would be really cool, would be to provide a huge number of floating point bit serial DSP blocks in an FPGA.

The downside to using FPGAs for HPC is that there is some significant sensitivity to Single Event Upsets, which Altera has taken significant steps to mitigate with their dynamic configuration RAM scrubbing, although other static memory cells in the FPGA are still at risk of data corruption. While for a single FPGA the failure rates are relatively small, once you start putting several thousand of these into a system, the system-level time between failures drops to days/weeks, especially above sea level in places like Colorado or Sandia National Labs. However, a very large aquarium as a system enclosure can provide a very nice visual presentation for the system, along with necessary shielding and thermal sink for unintended shutdowns of cooling.

jjussel says:

April 22, 2014 at 12:07 pm

This is good news and has been a long time coming. Having viable floating point processing on FPGAs will open up new markets for HPC applications. To really compete in those markets Altera will need to increase the domain-specific content in their OpenCL programming flow. But most importantly, there needs to be an economic driver to get FPGAs into computing hardware. In that respect, Altera is competing with GPUs, which, driven by gaming and video, are already inside the hardware. That pre-existing volume gives GPUs an order of magnitude price advantage over the big FPGA devices. Once Altera (and Intel) find an application with the necessary volume, then every Cloud server will include FPGAs to deliver super-computing processing speeds to the masses. It will be fun to see what applications pop up then!

eorenstain says:

April 23, 2014 at 8:01 am

If this article is supposed to describe the mid to far future then in my mind it fulfills its target.
This article DOESN’T describe the present. In the present it isn’t hard to find a hardware engineer who can write the code needed. It is, though, very hard to write optimized code for the Stratix in OpenCL. You have to use the libraries from Altera, and the programmer has to optimize the code (here is the fun part) by understanding the structure of the Stratix on one hand and obeying the optimization instructions from Altera on the other. Good luck finding those programmers. I’ve seen a demo given by Altera when they introduced their new family of devices. In this demo they wanted to show that using OpenCL you get better results utilizing the devices than using HDL. You do get better results, but the programming isn’t trivial. I agree that the future (not the near future) will be using a high-level language to program FPGAs. This future isn’t around the corner, unlike the opinion expressed in the article.

70billy says:

April 23, 2014 at 9:39 am

Is there some more detailed info on architecture and interface of such blocks?

Shall we take a conservative factor of 1 over 100 or more in real use? so to say that Altera claims 10 TF/s but in real application maybe 10-50 GF/s, if embedded processors must be mapped.

The real question is how to handle TF/s and I/O.

Next steps will be to embed ADCs, then integer-to-float conversions, to arrive at … an FP multicore ASIC-like!

kevin says:

April 23, 2014 at 12:21 pm

@TotallyLost, I agree. I should have gone into some detail on the memory bandwidth issue in the article. Memory access is not only a dominant performance bottleneck, it is a dominant factor in power consumption as well. For many HPC applications, power is the ultimate limitation. You can always stack more processors in a rack/room – until you can’t get the heat out anymore.

I don’t agree that SEUs are a major issue with current FPGAs. Yes, there is a finite risk of an errant neutron flipping a bit, but any system with any non-error-correcting storage element has that risk (although not at the same probability as the configuration logic in an FPGA). Also, I don’t think any practical amount of shielding (even though the aquarium idea sounds cool) can mitigate the SEU risk. Design techniques like TMR, safe state machines, etc. can mitigate the risk in FPGA-based systems that are highly vulnerable (in orbit, for example).

@jjussel, Also agreed, and it would be intriguing to see Intel server blades with FPGAs on board as compute accelerators.

@eorenstain, It is difficult to write “optimized” code for anything. Trying to optimize OpenCL for a GPU also requires knowledge of the target hardware, and goes well beyond just describing your algorithm in a high-level language. In fact, I’ll wager that the more “optimized” the code is for a particular GPU, the worse it will perform mapped to an FPGA. Any time you have to explicitly specify parallelism in your programming language, you are faced with the task of scheduling and resource allocation – for a known, fixed set of resources (number and type of processors in a GPU, for example). The good thing about an FPGA is that the architecture can be altered to adapt to the software, rather than the software needing to be adapted to fixed hardware.

Regarding the future/present nature of the technology – there are already a number of different flows that can produce excellent results using FPGAs in this manner – custom RTL (of course), model-based design starting from tools like Matlab and Simulink, high-level synthesis from C/C++/SystemC, and mapped languages like OpenCL. All of these approaches have strengths and weaknesses, and most of them require some knowledge of the underlying hardware and of hardware design in general.

@70billy, Yes, there is more detailed info on the architecture available. I’ll see what I can come up with.

TotallyLost says:

April 23, 2014 at 1:09 pm

@Kevin,

The big mistake is to assume you really want to cool a multi-megawatt system with air. At some point the energy cost to cool/move the air, is significantly more than the electronics itself — reflected not only in direct energy costs, but increased building size as well. Add to that repair costs (labor + parts) to maintain many hundreds/thousands of fans.

Reducing the core computational engine, and memories, into a high density stack with active liquid cooling, also significantly reduces interconnect latencies. Done well, this means the core HPC system is only a few cubic meters. I’ve proposed development of such systems before, where backplane gigabit ethernet is the primary interconnect, rather than using copper/fiber cables with traditional air cooled blade designs.

High energy neutrons can be easily shielded with a meter or two of water … thus a computational core that is a few cubic meters in size is easily shielded with a several-thousand-gallon aquarium, with the computational core at the center. A practical extension of the Cray-1 physical design.

We can “agree to disagree” about the impacts of both SEU and SET errors, with very large, high speed systems that have significant risk points besides the configuration memories. When major simulation models take weeks/months to complete, having to run the model multiple times to validate results are free of SEU/SET errors can be expensive. Especially during periods of high energy solar flares.

beercandyman says:

April 25, 2014 at 1:13 pm

@70Billy FPGAs have the ability to implement the data flow graph directly in hardware. This means that the 10 Teraflops for the Stratix 10 is very usable. Currently the top FPGAs without hardened floating point are rated at 500 GF and they get 350 GF. The 10 TF you hear about is only from the DSP units; there are millions of logic blocks that could be added to that, so I suspect that you will get 10 TF if the system you put around the FPGA can deliver the data.

@eorenstain The Altera OpenCL compiler also lets you add Verilog HDL as a kind of “assembly” although I find I have not needed to do this yet. There is a new flow for the OpenCL compiler that will surprise people in Altera’s upcoming 14.0 release.

The Stratix 10 will enable some designs to run at up to 1 GHz. That together with the floating point and OpenCL brings the future into the very near future and the present (with Arria 10).

ajjcoppola says:

April 25, 2014 at 4:28 pm

One key subtle point, but absolutely necessary to get performance on applications, is the need for the hard DSP to support the floating point Multiply-Accumulate (MAC) unit…as in how many times fast can you say “dot-product”…
The diagram I’ve seen in another article seems to indicate that is the case…hopefully so!

The other subtle point is integrating the mapping and scheduling into the software stack in a way that allows all those pesky libraries to just-be-there for performance and usability, as they are for other device families.

beercandyman says:

April 29, 2014 at 11:19 am

It’s all about the dot. We have a low latency recursive mode embedded in the circuitry that does the job. Just like the old DSP blocks, the new floating point DSP blocks can pass results from one DSP to the next, and this makes floating point DSP blocks run very fast on dot products.

The high-end frequency of the new DSP blocks is higher than that of the DSP blocks in Stratix V, the current generation of FPGAs, even while doing floating point MACs and dot products.

The DSP builder tool makes it easy to use these features for the hardware programmer and there will be direct support in the OpenCL compiler so you won’t ever need worry about these features, they will just get used.

TotallyLost says:

April 29, 2014 at 2:09 pm

@beercandyman

The HPC uses for this new architecture look awesome.

The OpenCL optimization for this architecture is totally awesome. Any chance the same optimization effort can be applied to OpenMP C-based design entry too so more applications/algorithms are easily portable to this architecture?

M20K memories have ECC support in wide mode for adjacent bit flips, which should handle some/many SEU/SET data corruptions in these memories. Memory constructed from MLABs appears at risk. Configuration memory driving mux functions appears protected.

Any comments about SEU/SET error rates on other logic/memory cells in your Stratix 10 family during peak solar flares? Has there been any significant SET testing?

Using high performance serial HMC appears to consume the fast serdes resources pretty quickly, while limiting HPC chip to chip bisection bandwidths at the same time. Any comments about this contention for HPC uses?

beercandyman says:

April 29, 2014 at 5:39 pm

When it comes to SEU type errors we have done some work but quite frankly TMR is the only way to be sure. Banks currently have three or four processors work on the same problem and vote. In a normal processor they have ECC ram but no one has an ECC multiplier or ALU. There are transistors all over the place that can flip at any time. So in FPGAs we are working on mitigation and constant checking of our soft spots (so to speak).

I don’t think that HMC will be used for all memory. It has a place in a system architecture along with DDR and QDR memories. The good thing about FPGAs is they can adopt new memory architectures very quickly. OpenCL can address heterogeneous memory subsystems and you can tell the compiler to use certain memories over others (if they are in your system).

I think it’s totally possible to have a C + OpenMP + hardware MPI compiler. We currently do OpenCL tasks which are basically C routines (NDrange(1,1,1)). So the capability is there but I don’t think it’s currently on the roadmap.

TotallyLost says:

May 20, 2014 at 6:22 am

It’s much more of a problem than many people are aware of, as SEU/SET failures have a lot of weird symptoms that are not readily obvious unless you are actually looking for them.

After the C8.3 flare on 5/14/14 last week I saw several of my customers’ Canopy radios either lock up or watchdog reboot, which is actually pretty common each year with a couple hundred deployed in the Colorado mountains, and easy to monitor with the bigger solar flares.

http://www.tesis.lebedev.ru/en/sun_flares.html?m=5&d=15&y=2014……

We also took two hard drive “failures” from this flare, out of about 70 that were spinning at the time – one in a NAS RAID 5 array with 350+ sectors that instantly went “bad”, the other in a mail server with 1200+ sectors that instantly went bad. Both drives actually had the positioner servo go active while the write enable gate was set on a head, causing a regularly spaced arc of corrupted sectors between the start and ending cylinders traversed by the errant positioner motion. This is obvious when you look at the cyl range and corrupted sector spacing.

After clearing the remapped sector tables and retesting the drives, both are actually fine, and none of the corrupted sectors actually have media flaws.

For Linux systems, after fsck -c collects all the corrupted sectors into the bad block list inode, and resolves duplicates by reallocation, the sectors involved become obvious. In the mail server they were luckily almost entirely contained in the journal log in this case. But what really tells the tale is the uniform cyl to cyl corrupted sector spacing that clearly identifies that a head’s write enable was briefly latched on for a bit while the positioner was active.

Some might find this fun, and interesting. It’s really worth doing your homework here, when not so random failures occur in a few hours after a major solar flare.

John

————— FYI ———————

Running additional passes to resolve blocks claimed by more than one inode…
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 8: 2076 2080 2107 2111 2140 2173 2204 2238 2304 2340 2373 2389 2392 2398 2400 2432 2447 2459 2511 2529 2537 2672 2717 2723 2757 2767 2776 2783 2804 2848 2968 3019 3082 3096 3143 3169 3233 3544 3550 3552 3560 3563 3591 3682 3723 3728 3827 3953 3976 4096 4123 4147 4209 4216 4218 4234 4256 4260 4271 4303 4316 4319 4338 4368 4373 4407 4408 4440 4469 4492 4510 4512 4517 4523 4526 4528 4529 4551 4553 4569 4602 4607 4620 4632 4653 4703 4706 4715 4743 4754 4757 4760 4774 4776 4789 4793 4850 4869 4881 4886 4952 4957 5007 5016 5022 5024 5055 5064 5073 5152 5167 5188 5190 5192 5210 5256 5277 5286 5288 5329 5369 5375 5383 5401 5420 5439 5441 5463 5488 5495 5518 5520 5536 5545 5555 5564 5622 5624 5643 5661 5670 5672 5707 5728 5738 5771 5775 5787 5802 5864 5896 5902 5904 5909 5912 5914 5921 5982 5984 6030 6032 6044 6055 6068 6075 6092 6113 6187 6217 6235 6265 6280 6302 6304 6314 6315 6316 6322 6357 6375 6378 6387 6397 6399 6403 6412 6418 6427 6442 6455 6476 6500 6548 6552 6569 6644 6660 6694 6696 6707 6775 6811 6838 6839 6853 6863 6882 6918 6920 6976 7019 7106 7107 7115 7133 7140 7158 7160 7208 7232 7253 7264 7280 7281 7324 7343 7350 7352 7355 7402 7417 7419 7438 7440 7459 7525 7547 7611 7696 7698 7736 7738 7755 7835 7859 7872 7887 7897 7929 7958 7960 8023 8108 8166 8168 8178 8179 8312 8374 8376 8405 8410 8417 8419 8484 8508 8565 8584 8600 8688 8694 9784 9810 9872 9892 9917 9918 9920 9971 9987 10004 10008 10042 10045 10048 10049 10051 10067 10072 10110 10112 10147 10192 10206 10208 10213 10255 10270 10272 10301 10347 10353 10362 10395 10407 10413 10416 10435 10516 10540 10643 10682 10691 10718 10720 10744 10759 10767 10791 10796 10802 10825 10842 10851 10857 10864 10879 10880 10894 10896 10931 10943 10946 10983 11007 11045 11066 11096 11103 11126 11128 11165 11197 11207 11227 11260 11267 11278 11280 11313 11319 11328 11341 11348 11371 11380 11394 11402 11403 11407 11415 11424 11434 11469 11480 11541 11556 11561 11566 11568 11604 11613 11620 
11623 11626 11630 11632 11635 11669 11681 11686 11688 11693 11716 11720 11728 11730 11752 11757 11761 11777 11798 11800 11805 11811 11817 11823 11832 11839 11844 11851 11870 11872 11886 11888 11902 11904 11907 11913 11924 11931 11933 11936 11938 11953 11968 11977 11982 11984 11991 12048 12066 12114 12118 12120 12134 12136 12140 12144 12150 12152 12158 12160 12167 12178 12184 12210 12242 12248 12262 12264 12277 12286 12288 12307 12310 12312 12333 12337 12352 12385 12422 12424 12460 12466 12473 12479 12488 12519 12541 12563 12744 12763 12944 12970 13014 13015 13083 13149 13217 13350 13480 13520 14664 14676 14800 14854 14920 14982 14984 14986 15112 15151 15274 15277 16488 16527 16570 16888 16935 17012 17048 17168 17190 17256 17268 17374 17440 17442 17504 17519 17572 17575 17590 17592 17632 17752 17787 17831 17832 17849 17911 17939 17946 17948 17973 17984 18069 18090 18121 18165 18251 18317 18321 18329 18332 18359 18495 18498 18522 18531 18534 18536 18542 18544 18577 18669 18677 18697 18707 18730 18734 18736 18745 18766 18768 18781 18790 18792 18817 18833 18837 18845 18860 18888 18924 18927 18961 18986 19001 19006 19008 19034 19052 19059 19089 19096 19159 19219 19246 19248 19276 19277 19307 19359 19445 19573 19586 19712 19723 19751 19765 19784 19904 19931 20086 20088 20144 20168 20204 20206 20208 20233 20236 20241 20245 20253 20259 20276 20319 20454 20520 20544 20564 20576 20588 20667 20670 20672 20682 20734 20736 20761 20765 20776 20791 20798 20800 20819 20821 20830 20832 20858 20872 20884 21072 21096 21105 21129 21147 21183 21202 21209 21241 21424 21446 21448 21474 21475 21597 21629 21653 21694 21696 21728 21749 21783 21809 21840 21885 21954 21957 21960 22044 22086 22088 22101 22124 22192 22217 22229 22254 22256 22286 22288 22304 22338 22375 22390 22456 22501 22548 22554 22603 22758 22760 22785 24248 24296 24299 24315 24318 24320 24327 24348 24351 24406 24408 24412 24415 24418 24434 24449 24475 24485 24494 24496 24503 24512 24537 24546 24574 24576 24583 24607 24643 
24667 24682 24688 24704 24710 24712 24713 24723 24732 24750 24752 24753 24775 24842 24858 24886 24888 24940 24982 24984 24995 25015 25051 25078 25080 25093 25118 25120 25136 25151 25200 25209 25221 25239 25242 25274 25280 25334 25336 25376 25383 25409 25456 25474 25536 25576 25611 25631 25653 25675 25726 25728 25753 25793 25827 25845 25878 25880 25898 25913 25935 25944 25950 25952 26003 26015 26024 26027 26054 26056 26063 26065 26082 26101 26116 26160 26176 26217 26328 26337 26358 26360 26422 26424 26444 26452 26466 26528 26536 26583 26721 26752 26774 26776 26788 26819 26904 26940 26999 27050 27059 27095 27122 27248 27250 27271 27442 27504 27550 27552 27602 27627 27663 27696 27804 27816 27896 27918 27920 27951 27992 28106 28112 28139 28181 28560 28744 28777 28815 28992 28996 29184 29186 29312 29362 29410 29428 29482 29509 29607 29667 29848 29907 29953 29958 29960 29997 30006 30072 30101 30122 30159 30167 30192 30206 30208 30236 30261 30292 30305 30320 30329 30335 30345 30353 30359 30409 30448 30466 30485 30490 30500 30518 30520 30548 30554 30590 30592 30610 30613 30655 30684 30724 30738 30786 30804 30813 30853 30859 30880 30929 30953 30966 30968 30975 30976 30982 30984 30987 30993 30999 31037 31079 31106 31143 31150 31216 31219 31230 31232 31234 31268 31298 31320 31356 31373 31397 31418 31430 31496 31522 31561 31596 31630 31632 31663 31816 31936 31996 32030 32096 32111 32140 32147 32158 32160 32190 32192 32195 32214 32216 32233 32242 32279 32283 32301 32313 32406 32472 32485 32557 32597 32599 32696 34864 34873 34952 35037 35116 35240 35274 35295 35296 35326 35328 35332 35584 35615 35694 35824 35854 35920 35969 35975 35980 35990 36120 36171 36206 36208 36221 36536 36554 36596 36606 36608 36615 36624 36636 36637 36645 36662 36664 36674 36700 36703 36765 36780 36788 36802 36858 36882

kevin says:

May 20, 2014 at 8:57 am

@TotallyLost

That sounds like some very impressive failure analysis right there. Fascinating!

In FPGA-based systems, an SEU failure could literally look like anything – since the routing and/or logic functions themselves could be randomly altered. It seems like that would make such failures nearly impossible to diagnose.
