Swimming Upstream – Fast

When it comes to FPGAs and power consumption, just about all the forces are working against us. No matter how much we want to be “green,” or how much we want to have longer battery life, or smaller heat sinks, or fewer cooling fans, or faster clock speeds, or less weight, or…, power consumption is pulling us in the opposite direction. In processing, power is already the supervillain that killed the monolithic processor and forced us into the compromise of multi-core. In FPGAs, we haven’t yet given up, and we are just now in the heat of the struggle.

Power, as we have discussed many times, can be split into two buckets – dynamic and static. Dynamic power is the power we use doing actual “work” – like moving the packets that belong to the skateboarding bulldog video to a different place from the packets that belong to the social network post lamenting the fact that the scalpers were over-charging for that concert last night. Static power is the power we use just sitting there – doing nothing useful or productive at all.

Dynamic power (in our normal digital CMOS world) depends on how much switching we do. The more transistors we toggle, and the faster we toggle them, the more dynamic power we burn. Unfortunately, Moore’s Law continually drives us to do more with more. We have more transistors, and we toggle them faster with each generation. Worse, as we have more transistors available, we are also sometimes more lax in our design style – leading to less efficient design for the same functionality. If we double the transistors every couple of years, use them less efficiently, AND run them twice as fast, we’re just asking for power troubles. We’re also putting them closer together, which makes dissipating the heat more difficult. The one tiny sliver of bright light here (and it’s an LED, so power consumption is low) is that smaller geometries can be operated at lower voltages, which helps to reduce dynamic power somewhat.
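To put rough numbers on that argument, here is a minimal sketch using the textbook CMOS switching-power estimate, P = a × C × V² × f (activity factor, switched capacitance, supply voltage, clock frequency). The function name and every figure below are hypothetical, chosen only to show the scaling and not taken from any real device.

```python
def dynamic_power(activity, capacitance_f, voltage_v, freq_hz):
    """Textbook CMOS switching-power estimate: P = a * C * V^2 * f (watts)."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz

# Hypothetical baseline: 10% activity, 1 nF switched capacitance, 1.2 V, 100 MHz.
base = dynamic_power(0.1, 1e-9, 1.2, 100e6)

# Twice the switched capacitance (more transistors) at twice the clock rate.
scaled = dynamic_power(0.1, 2e-9, 1.2, 200e6)

# Same design with the supply lowered from 1.2 V to 1.0 V.
lower_v = dynamic_power(0.1, 2e-9, 1.0, 200e6)

print(f"scaled / base    = {scaled / base:.2f}")
print(f"lower_v / scaled = {lower_v / scaled:.2f}")
```

Under these assumptions, doubling both the transistor count and the clock rate quadruples dynamic power, while the voltage drop claws back only about 30% of it. Voltage helps because it enters as a square, but it cannot keep pace with the other two terms on its own.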

Although dynamic power has always been out there wreaking havoc, static power has gotten most of the attention in recent FPGA generations. As we make our transistors smaller, they tend to leak more current when they’re sitting still. Since we’re doubling the number of transistors every generation and each transistor leaks more, leakage has been following an exponential curve just like Moore’s Law – only with a much less pleasing outcome. FPGAs are doubly vulnerable, because the majority of FPGA transistors are used for configuration, and therefore most of the time they ARE just sitting there – leaking current. At about 90nm (Remember back then? When you had to carry your development board 9 miles through six feet of snow?) we reached the point where static power and leakage current could no longer be ignored. They had earned a spot right alongside dynamic power in causing us design heartburn.

Now, we have survived from 90 to 65 to 45 and we’re talking about 28nm. Power issues have gotten tougher each generation, but the FPGA vendors have stepped up their game each time with a potent cocktail of process choice, architectural improvement, and design tool innovation. The net result is that today’s biggest FPGAs running at full speed are much more power-efficient than those from previous generations, despite the seemingly overwhelming forces pushing the other way.

Recently, in their ISE Design Suite 12 announcement, Xilinx rolled out another big victory for the coulomb counters. The headline feature in their release is clock-gating technology that the company claims can reduce power “up to 30%”. While that may sound like a lot of marketing speak, the reality behind the claims is strong.

Clock gating, as you probably know, is really just what your mom always told you. “Turn the lights out when you leave the room.”

When we’ve got a huge circuit – often composed of over a billion transistors – the fact is that most of them spend most of their time not doing anything useful. If we can stop them from toggling needlessly – from marking time during those intervals when they’re not used – we can save a lot of power.

The unusual thing about Xilinx’s clock gating is the granularity. In most strategies, large blocks of the circuit are turned off or put in an idle state when they’re not in use. Xilinx, instead, looks at logic paths and transition logic, identifies sequential elements whose transitions don’t affect downstream logic, and disables the clocks on the elements that won’t need to change. This approach suggests that Xilinx will be able to shut down more of the circuit more of the time and save more power as a result.
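The principle behind fine-grained gating can be illustrated with a toy model. This is only a sketch of the general idea, not Xilinx’s algorithm: it counts how many clock edges reach a single register, given a hypothetical per-cycle flag saying whether that register’s value could change and matter downstream. The function name and the one-in-four activity pattern are assumptions made up for illustration.

```python
def clock_edges_delivered(enables, gated):
    """Count clock edges reaching one register over a run of cycles.

    enables[i] is True when the register may need to change in cycle i.
    Ungated, the register is clocked every cycle regardless of need.
    Gated, the clock is passed through only on cycles where enable is True.
    """
    if gated:
        return sum(1 for en in enables if en)
    return len(enables)

# Hypothetical activity: the register is needed one cycle in four.
enables = [cycle % 4 == 0 for cycle in range(1000)]

ungated = clock_edges_delivered(enables, gated=False)
gated = clock_edges_delivered(enables, gated=True)
saving = 1 - gated / ungated

print(f"ungated edges: {ungated}, gated edges: {gated}, saving: {saving:.0%}")
```

The saving here applies only to clock activity at that one register under the assumed pattern. Whole-chip results depend on how many registers can be gated and how often, which is why the vendor’s figure is stated as “up to” 30%.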

In addition to the power optimization algorithms, Xilinx has improved their process for partial reconfiguration, making it so simple that even a rocket scientist could do it. OK, that’s not fair – a rocket scientist probably couldn’t do it, but a reasonably capable FPGA-savvy designer could, which is a big improvement over previous generations of partial reconfiguration flows. Xilinx has brought the PlanAhead tool more front-and-center in the partial reconfiguration process and contained the complexity so that doing a partial reconfiguration isn’t that much different from doing incremental design work.

Partial reconfiguration has become a more heated topic recently, with even long-time skeptic Altera finally coming around and embracing the philosophy. While partial reconfiguration is no panacea, it can stretch the effective density of your FPGA – enabling a significant class of applications where modal operation requires large sections of task-specific logic.

Xilinx also claims the usual but remarkable set of performance improvements in version 12, including 2x faster logic synthesis and 1.2x faster implementation (place-and-route) runtimes. While we’ve come to expect these improvements like spoiled children, it requires remarkable effort to continue to make significant gains optimizing the performance of complex algorithms like synthesis and layout.

Finally, Xilinx is announcing the upcoming support for the new AMBA 4 AXI4 interconnect protocol. As they pre-announced last year, Xilinx has been partnering with ARM to develop the next AMBA standard, and FPGA footprints are all over it. In addition to the “normal” AXI4, we’ll have AXI4-Lite and AXI4-Stream flavors that should bring us a lot closer to a standardized instantiate-and-play interface for most FPGA-ready IP.

Combined with the company’s announcement of a future ARM/FPGA flexible computing platform, the partnership between Xilinx and ARM appears poised to bear significant fruit over the next couple of design seasons.

ISE Design Suite 12 continues the Xilinx philosophy of domain-specific design tool bundles, tailored for specific types of design work. The new features will be rolled out incrementally throughout the various sub-releases over the coming year – with the current (12.1) release including clock gating, the next (12.2) release including partial reconfiguration support for Virtex-6, and the 12.3 release adding AXI4 support.
