40nm Altera Stratix IV

Bigger and Cooler than we Expected

New process nodes have a predictable rhythm.  Until about 90nm, we knew before anybody announced anything that we’d get double the density, half the power (dynamic, of course), and 50% more speed than we had in the previous generation.  Of course, that made waiting for the announcements from semiconductor companies a little less than suspenseful.  Our Moore’s Law alarm clock would beep on its two-year cycle.  We’d check to see if anybody had announced the thing we were expecting yet, and then we’d hit the one-month snooze button and fade back off into our dazed delirium.

This week, Altera became the first to announce an FPGA family on the 40nm process node.  (Editor’s note:  FPGA Journal was actually the first to announce 45nm – see “45nm Chicken,” but Altera outfoxed us by chipping off 5 more nanometers and turning their amp down to “40.”)  The result is a future family that surprised us a bit and challenges classical definitions of the boundaries of programmable logic.

We didn’t know exactly what to expect at this process node.  Our predict-o-meter lost its punch at about 90nm, where at least a modicum of drama crept into the scenario.  We’d watched the supply voltages step down from the 5V to the 1V range.  This meant that the voltage swings shrank with each node, and the obligatory dynamic power savings came along pretty much for free.  Even though we were clocking more gates faster, the total power stayed the same or even dropped a bit due to the process technology gains.  While we weren’t paying attention, however, those transistors got leakier as they got smaller.
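The “free” dynamic power savings follow from the standard CMOS switching-power relation, in which power scales with switched capacitance, the square of the supply voltage, and clock frequency.  A minimal sketch of that arithmetic follows.  The capacitance, clock rate, and activity factor are illustrative assumptions, not figures from any datasheet.

```python
def dynamic_power(c_farads, vdd_volts, freq_hz, activity=1.0):
    """Classic CMOS switching power: P = activity * C * Vdd^2 * f."""
    return activity * c_farads * vdd_volts ** 2 * freq_hz

# Hypothetical switched capacitance and clock rate, held constant so that
# only the supply voltage differs between the two cases.
C = 1e-9    # 1 nF of total switched capacitance (assumed)
F = 100e6   # 100 MHz clock (assumed)

p_5v = dynamic_power(C, 5.0, F)
p_1v = dynamic_power(C, 1.0, F)
ratio = p_5v / p_1v  # comes entirely from the Vdd^2 term

print(f"5V: {p_5v:.2f} W   1V: {p_1v:.2f} W   ratio: {ratio:.0f}x")
```

Because voltage enters as a square, each step down in supply bought back more than enough power to pay for the extra gates and the faster clocks.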

It was no big deal at first, but over time we began to see static power consumption due to leakage account for a measurable part of the total power.  At 90nm, this effect officially hit the map.  For programmable logic, it hit hard.  All those configuration transistors that complete the routing and define the LUT functions were not-so-quietly sucking up beaucoup current, raining on our power parade in a big way.  Other types of devices with metal-based fixed interconnect didn’t have this bloat, and therefore they could wait a generation or two before the static power problem hit like a tidal wave.

FPGA companies were working hard to keep static power under control.  They moved to the lowest-power processes offered by their respective fabs, started designing leakage-reducing features into their architectures, and began to compromise on other axes like performance to keep the static dragon at bay.  Through 90nm and 65nm, the results were impressive.  FPGA companies managed not only to hold static power in check, but actually to make some gains compared with previous nodes.  They had to.  If they doubled the density while the static power per gate stayed static, that component of total power would double.
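The “they had to” argument is simple multiplication.  Here is a rough sketch, assuming total static power is just gate count times average leakage per gate; the gate counts and leakage values are invented for illustration.

```python
def static_power(gate_count, leakage_per_gate_w):
    """Total static power as gate count times average leakage per gate."""
    return gate_count * leakage_per_gate_w

# Hypothetical previous-node device.
old = static_power(1_000_000, 1e-7)

# Next node: density doubles, per-gate leakage unchanged.
same_leak = static_power(2_000_000, 1e-7)

# Next node: density doubles, per-gate leakage cut in half.
halved_leak = static_power(2_000_000, 0.5e-7)

print(f"old: {old:.3f} W")
print(f"2x gates, same leakage: {same_leak:.3f} W")
print(f"2x gates, half leakage: {halved_leak:.3f} W")
```

Under this model, per-gate leakage has to fall by half at each density doubling just to break even, and by more than half to show a gain.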

Now, fast forward to the current announcement.

Stratix IV is Big.  With up to 680,000 logic elements (What exactly are these? We’ll get to that in a bit.), 22.4 Mbits of memory, 1360 18X18 multipliers, and 48 multi-gigabit SerDes transceivers, Altera can safely claim the title of “World’s largest FPGA not yet in production.”  OK, there really isn’t a title like that, but the point is – once we get these buggers, they’ll be huge.

What about the power consumption?  Yeah, we knew you were gonna ask that.  Altera claims to have compromised on speed at the transistor level in order to reduce leakage.  These compromises include increasing Vt, increasing channel lengths, thickening gate oxide, and decreasing Vcc.  Next, they worked to gain back enough of that speed via other means to assure that Stratix IV is still faster than its predecessor (65nm Stratix III).  Altera says the net result is that Stratix IV has an average of 30% lower total power consumption compared with similar designs on Stratix III.

Altera rolled out an innovative architecture with Stratix III that’s still around in this generation, which gives design tools the flexibility to trade off performance for power at the individual logic-cell level.  Each cell can be programmed to be high-speed or low-power by programmable back-biasing.  Cells on the critical path can be cranked up to the needed performance, and those off the freeway can throttle back and sip current at a leisurely pace.  The result is a big savings in overall power without a loss of critical path performance.  
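To make the idea concrete, here is a toy sketch of slack-driven mode assignment.  It is not Altera’s algorithm or anything from Quartus II: the delay and power figures, the path model, and the greedy strategy are all assumptions chosen for illustration.  Every cell starts in high-speed mode, and a cell is switched to low-power mode only if every path still meets the clock period.

```python
FAST, SLOW = "high_speed", "low_power"

# Hypothetical per-mode figures: cell delay in ns, relative static power.
DELAY = {FAST: 1.0, SLOW: 1.4}
POWER = {FAST: 1.0, SLOW: 0.5}

def path_delay(path, mode):
    """Sum of cell delays along one timing path."""
    return sum(DELAY[mode[cell]] for cell in path)

def assign_modes(paths, period_ns):
    """Greedily move cells to low-power mode while all paths meet timing."""
    cells = sorted({cell for path in paths for cell in path})
    mode = {cell: FAST for cell in cells}
    for cell in cells:
        mode[cell] = SLOW
        if any(path_delay(p, mode) > period_ns for p in paths):
            mode[cell] = FAST  # this cell is timing-critical; revert
    return mode

# Two made-up paths: a long critical one and a short one with lots of slack.
paths = [["a", "b", "c", "d"], ["e", "f"]]
mode = assign_modes(paths, period_ns=4.2)

total_power = sum(POWER[m] for m in mode.values())
all_fast_power = len(mode) * POWER[FAST]
print(mode)
print(f"relative power: {total_power} vs {all_fast_power} all-fast")
```

In this example the four cells on the long path stay fast, while the two cells on the short path throttle back, so the critical path loses nothing and total power drops.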

With power under control, what happens to speed?  Altera claims “over 600MHz logic performance” – faster than Stratix III, but not the full performance gain we saw back in the “good old days” of Moore’s Law.  However, most FPGA users no longer want a frequency doubling with every node.  FPGAs have long since passed the point where performance is “good enough” for most applications, and other factors like functionality, density, I/O capacity and power consumption have taken center stage.

Functionality-wise, Altera has dumped a boatload of memory, more multipliers than most of us could conceive of using (that’s only those of us that aren’t doing high-performance signal processing applications like video, radar, etc. – those folks will be jumping for joy at the unprecedented 1360 multipliers), and 680,000 of — something. 

OK, here we go.  In the old days, FPGA companies described the density of their devices in “system gates.”  These had absolutely no basis in anything measurable.  It took about a zillion system gates to equal 500K ASIC gates.  As a result, we made fun of them – a lot.  Then, they went to a more realistic measure – the number of 4-input lookup tables (LUTs).  That works, right?  Nope.  Marketing came in and started inflating the LUT counts based on perceived architectural advantages of one LUT structure over another.  Pretty soon, we were talking about “effective logic elements” which was the number of 4-input LUTs times a marketing fudge factor.

Now, at least, we could settle in, right?  Wrong again, Roger.  FPGA companies went to wider logic elements – 6-ish input look up tables.  They couldn’t just suddenly change their units, and there was nothing left on the device to count.  They semi-settled on what we have today, which is a “logic elements” number that’s equal to the number of 6-input LUTs multiplied by a factor deemed appropriate for the conversion from 4-input LUTs, then multiplied by another “our marketing is better than your marketing” factor in order to make it bigger than the other guy’s.
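For the arithmetically inclined, the recipe above boils down to two multiplications.  In this sketch, both factors and the LUT count are made-up placeholders; no vendor publishes its numbers in this form.

```python
def headline_logic_elements(lut6_count, lut4_conversion, marketing_factor):
    """Headline 'logic elements' = physical 6-input LUTs x two fudge factors."""
    return lut6_count * lut4_conversion * marketing_factor

# All three inputs are hypothetical, chosen only to show the inflation.
physical_luts = 100_000
headline = headline_logic_elements(
    physical_luts,
    lut4_conversion=1.6,    # assumed 6-input to 4-input equivalence
    marketing_factor=1.1,   # assumed "ours is better" multiplier
)

print(f"{physical_luts:,} physical LUTs become {round(headline):,} logic elements")
```

The point is not the particular values but that the headline number is two multipliers removed from anything you could count on the die.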

With Stratix IV, all of this fudge-factoring is not really an issue because 680K is so much bigger (a little more than double the 330K Xilinx claims for their current largest Virtex-5 LX device) than anything else on the market that all the marketing factors in the world won’t bridge the gap.  These devices are big enough to handle the demands of a great many high-end ASIC users, which brings us to another important topic – HardCopy.  In the same announcement with Stratix IV, Altera is announcing the matching HardCopy family.  HardCopy takes your FPGA design and converts it directly to an ASIC, saving significantly on unit cost at high volume.

HardCopy does have an NRE, but it’s an order of magnitude lower than that of a similar-complexity standard-cell ASIC.  It does take a few weeks to spin your design, but the spin is far faster than a “normal” ASIC.  Thanks to the 40nm technology, a lot of the inefficiency of the 1:1 correlation with the FPGA architecture is eliminated when comparing with a 65nm or particularly a 90nm ASIC.  In short, Stratix IV to HardCopy is an extremely attractive strategy for getting a high-performance, high-density ASIC design at 40nm.  Unit costs are still higher than a full-boat ASIC, but by the time you amortize the NRE savings and get your device to market faster, much of that difference is also erased.
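The amortization argument can be sketched as a break-even calculation.  The dollar figures below are invented for illustration (the only constraints taken from the text are an NRE roughly ten times lower and a higher unit cost for the structured part), and the model ignores time-to-market revenue entirely.

```python
def total_cost(nre, unit_cost, volume):
    """Total program cost: one-time NRE plus per-unit cost times volume."""
    return nre + unit_cost * volume

def break_even_volume(nre_a, unit_a, nre_b, unit_b):
    """Volume at which option A (low NRE, high unit cost) equals option B."""
    return (nre_b - nre_a) / (unit_a - unit_b)

# Hypothetical numbers, not vendor pricing.
HC_NRE, HC_UNIT = 1_000_000, 50.0       # HardCopy-style structured ASIC
ASIC_NRE, ASIC_UNIT = 10_000_000, 20.0  # standard-cell ASIC

crossover = break_even_volume(HC_NRE, HC_UNIT, ASIC_NRE, ASIC_UNIT)
print(f"break-even at {crossover:,.0f} units")

for volume in (50_000, 300_000, 1_000_000):
    hc = total_cost(HC_NRE, HC_UNIT, volume)
    asic = total_cost(ASIC_NRE, ASIC_UNIT, volume)
    print(f"{volume:>9,} units: structured ${hc:,.0f}  standard-cell ${asic:,.0f}")
```

With these made-up inputs the lower-NRE route is cheaper for every volume below the crossover, which is why the higher unit cost matters less than it first appears.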

Altera says that customers can start designing with the 8.0 release of Quartus II (also announced this week) and can expect engineering samples of the first devices in the fourth quarter of this year.  Volume production will likely commence in phases beginning in 2009, and customer tapeouts for HardCopy IV ASICs will start in Q3 2009.  That gives you just about enough time to get your product up and working with the Stratix IV FPGA version, win some market share, and then go to profit by cost reducing with HardCopy IV.  It’s a nice picture.
