
Swimming Upstream – Fast

Xilinx ISE 12 Tackles Power

When it comes to FPGAs and power consumption, just about every force is working against us. No matter how much we want to be “green,” or want longer battery life, smaller heat sinks, fewer cooling fans, faster clock speeds, less weight, or…, power consumption pulls us in the opposite direction. In processors, power is already the supervillain that killed the monolithic processor and forced us into the compromise of multi-core. In FPGAs, we haven’t given up yet; we are right in the heat of the struggle.

Power, as we have discussed many times, can be split into two buckets – dynamic and static. Dynamic power is the power we use doing actual “work” – like moving the packets that belong to the skateboarding bulldog video to a different place from the packets that belong to the social network post lamenting the fact that the scalpers were over-charging for that concert last night. Static power is the power we use just sitting there – doing nothing useful or productive at all. 

Dynamic power (in our normal digital CMOS world) depends on how much switching we do. The more transistors we toggle, and the faster we toggle them, the more dynamic power we burn. Unfortunately, Moore’s Law continually drives us to do more with more: we have more transistors, and we toggle them faster with each generation. Worse, with more transistors available, we are sometimes more lax in our design style, producing less efficient designs for the same functionality. If we double the transistors every couple of years, use them less efficiently, AND run them twice as fast, we’re just asking for power trouble. We’re also packing them closer together, which makes the heat harder to dissipate. The one tiny sliver of bright light here (and it’s an LED, so power consumption is low) is that smaller geometries can be operated at lower supply voltages. Because dynamic power scales with the square of the supply voltage, even a modest voltage reduction helps somewhat.
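The switching argument above is usually summarized by the first-order CMOS relation P_dyn = α·C·V²·f, where α is the fraction of nodes toggling per cycle, C is the switched capacitance, V is the supply voltage, and f is the clock frequency. The short Python sketch below is an illustration, not a model of any real FPGA. It plugs in a hypothetical generation step and assumes that doubling the transistor count doubles total switched capacitance (ignoring the fact that each smaller transistor has less capacitance), that the clock rate doubles, and that the supply voltage drops by 10%.

```python
def dynamic_power(alpha: float, capacitance: float, voltage: float, frequency: float) -> float:
    """First-order CMOS switching power: P = alpha * C * V^2 * f."""
    return alpha * capacitance * voltage ** 2 * frequency


def generation_scaling(cap_scale: float, voltage_scale: float, freq_scale: float) -> float:
    """Ratio of new to old dynamic power when C, V and f are each scaled
    by the given factors (activity factor alpha held fixed)."""
    baseline = dynamic_power(1.0, 1.0, 1.0, 1.0)
    return dynamic_power(1.0, cap_scale, voltage_scale, freq_scale) / baseline


# Hypothetical generation step (assumed numbers, not vendor data):
# 2x switched capacitance, 2x clock rate, 10% lower supply voltage.
ratio = generation_scaling(cap_scale=2.0, voltage_scale=0.9, freq_scale=2.0)
print(f"dynamic power grows by {ratio:.2f}x")  # dynamic power grows by 3.24x
```

Doubling capacitance and frequency alone would give 4x; the assumed 10% voltage cut pulls that down to about 3.24x, which is the article’s “sliver of bright light” in numeric form.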

Although dynamic power has always been out there wreaking havoc, static power has gotten most of the attention in recent FPGA generations. As we make our transistors smaller, they tend to leak more current when they’re sitting still. Since we’re doubling the number of transistors every generation and each transistor leaks more, leakage has been following an exponential curve just like Moore’s Law, only with a much less pleasing outcome. FPGAs are doubly vulnerable, because most FPGA transistors are used for configuration, so most of the time they ARE just sitting there, leaking current. At about 90nm (Remember back then? When you had to carry your development board 9 miles through six feet of snow?) we reached the point where static power and leakage current could no longer be ignored. They had earned a spot right alongside dynamic power in causing us design heartburn.
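The compounding described above can be made explicit with a toy calculation. If the transistor count doubles each generation and each transistor leaks k times more than its predecessor, total leakage grows by a factor of (2k) per generation. In the sketch below, the per-transistor growth factor of 1.5 is a purely hypothetical assumption chosen for illustration, not a measured process figure.

```python
def relative_leakage(generations: int, per_transistor_growth: float) -> float:
    """Total leakage relative to generation 0, assuming the transistor count
    doubles every generation and each transistor leaks
    `per_transistor_growth` times more than in the previous generation."""
    return (2.0 * per_transistor_growth) ** generations


# Hypothetical: each transistor leaks 1.5x more per node step (assumed value).
for gen in range(4):
    print(gen, relative_leakage(gen, per_transistor_growth=1.5))
```

Even with these made-up numbers, three node steps multiply total leakage by 27, which is why leakage could no longer be treated as a rounding error.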

Now, we have survived from 90nm to 65nm to 45nm, and we’re talking about 28nm. Power issues have gotten tougher each generation, but the FPGA vendors have stepped up their game each time with a potent cocktail of process choice, architectural improvement, and design tool innovation. The net result is that today’s biggest FPGAs running at full speed are much more power-efficient than those from previous generations, despite the seemingly overwhelming forces pushing the other way.

Recently, with its ISE Design Suite 12 announcement, Xilinx rolled out another big victory for the coulomb counters. The headline feature of the release is clock-gating technology that the company claims can reduce dynamic power by “up to 30%.” While that may sound like marketing speak, the reality behind the claim is strong.

Clock gating, as you probably know, is really just what your mom always told you. “Turn the lights out when you leave the room.”  

When we’ve got a huge circuit, often composed of over a billion transistors, most of those transistors spend most of their time not doing anything useful. If we can stop them from toggling needlessly, marking time during the intervals when they’re not in use, we can save a lot of power.

The unusual thing about Xilinx’s clock gating is its granularity. In most strategies, large blocks of the circuit are turned off or put into an idle state when they’re not in use. Xilinx instead analyzes logic paths and transition logic, identifies sequential elements whose transitions don’t affect downstream logic, and disables the clocks on the elements that won’t need to change. Working at this fine grain should let Xilinx shut down more of the circuit more of the time, yielding a larger net power reduction.
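To see why suppressing clock edges saves switching activity, here is a generic toy model of clock gating for a single register. It is not Xilinx’s analysis: in the flow described above the tool would derive the enable from the surrounding logic, whereas here the enable pattern, the data values, and the names `Register` and `run` are hypothetical and hard-coded. The model compares a free-running register that reloads its own output when it doesn’t need to change with a gated register that simply receives no clock edge in those cycles.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Register:
    """Toy D flip-flop that counts how many clock edges reach it."""
    q: int = 0
    clock_edges: int = 0

    def clock(self, d: int) -> None:
        self.clock_edges += 1
        self.q = d


def run(enables: List[int], data: List[int], gated: bool) -> Register:
    """Drive one register for len(enables) cycles, with or without clock gating."""
    reg = Register()
    for en, d in zip(enables, data):
        if gated:
            # Gated clock: when the register doesn't need to change,
            # the edge is suppressed and the register sees no clock at all.
            if en:
                reg.clock(d)
        else:
            # Ungated: the register is clocked every cycle and, when en is low,
            # reloads its own output through a feedback path.
            reg.clock(d if en else reg.q)
    return reg


# Hypothetical activity pattern: the register only needs new data every 4th cycle.
enables = [1, 0, 0, 0, 1, 0, 0, 0]
data = [5, 6, 7, 8, 9, 10, 11, 12]
free_running = run(enables, data, gated=False)
gated = run(enables, data, gated=True)
print(free_running.q, free_running.clock_edges)  # 9 8
print(gated.q, gated.clock_edges)                # 9 2
```

Both versions end with the same stored value, but the gated register receives a quarter of the clock edges. Applied to thousands of registers that are idle most of the time, that difference is where clock-gating savings come from.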

In addition to the power optimization algorithms, Xilinx has improved its partial reconfiguration flow, making it so simple that even a rocket scientist could do it. OK, that’s not fair. A rocket scientist probably couldn’t do it, but a reasonably capable FPGA-savvy designer could, which is a big improvement over previous generations of partial reconfiguration flows. Xilinx has moved the PlanAhead tool front and center in the partial reconfiguration process and contained the complexity, so doing a partial reconfiguration isn’t much different from doing incremental design work.

Partial reconfiguration has become a hotter topic recently, with even long-time skeptic Altera finally coming around and embracing the philosophy. While partial reconfiguration is no panacea, it can stretch the effective density of your FPGA, enabling a significant class of applications where modal operation requires large sections of task-specific logic.
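The “effective density” argument is simple arithmetic: if a design’s operating modes are mutually exclusive, a reconfigurable region only has to be as large as the biggest mode, rather than as large as all modes combined. The sketch below uses made-up LUT counts and a hypothetical helper, `luts_required`, and deliberately ignores real-world constraints such as region placement granularity, reconfiguration time, and interface overhead.

```python
from typing import Sequence


def luts_required(static_luts: int, mode_luts: Sequence[int], partial_reconfig: bool) -> int:
    """Rough LUT budget for a design with mutually exclusive operating modes."""
    if partial_reconfig:
        # Only one mode is loaded at a time, so the reconfigurable region
        # only has to fit the largest mode.
        return static_luts + max(mode_luts)
    # Without partial reconfiguration, every mode must be resident at once.
    return static_luts + sum(mode_luts)


# Hypothetical design: 20k LUTs of shared logic plus three task-specific modes.
modes = [30_000, 25_000, 40_000]
print(luts_required(20_000, modes, partial_reconfig=False))  # 115000
print(luts_required(20_000, modes, partial_reconfig=True))   # 60000
```

Under these assumed numbers, the same functionality fits in roughly half the logic, which is the sense in which partial reconfiguration stretches a device’s effective density.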

Xilinx also claims the usual but remarkable set of performance improvements in version 12, including 2x faster logic synthesis and 1.2x faster implementation (place-and-route) runtimes. While we’ve come to expect these improvements like spoiled children, it requires remarkable effort to continue to make significant gains optimizing the performance of complex algorithms like synthesis and layout.  
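How much the quoted 2x synthesis and 1.2x implementation speedups help end to end depends on how a given design’s runtime splits between the two steps. The sketch below applies the standard weighted-time (Amdahl-style) calculation under the simplifying assumption that the flow consists only of synthesis and place-and-route; the synthesis fractions are hypothetical inputs, not Xilinx figures.

```python
def overall_speedup(synth_fraction: float,
                    synth_speedup: float = 2.0,
                    impl_speedup: float = 1.2) -> float:
    """End-to-end speedup when synthesis takes `synth_fraction` of the old
    runtime and implementation (place-and-route) takes the rest."""
    new_time = synth_fraction / synth_speedup + (1.0 - synth_fraction) / impl_speedup
    return 1.0 / new_time


for fraction in (0.1, 0.25, 0.5):
    print(f"synthesis = {fraction:.0%} of runtime -> {overall_speedup(fraction):.2f}x overall")
```

For example, a flow that spends a quarter of its time in synthesis would finish about 1.33x faster overall; the heavier the synthesis share, the closer the result moves toward 2x.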

Finally, Xilinx announced upcoming support for the new AMBA 4 AXI4 interconnect protocol. As pre-announced last year, Xilinx has been partnering with ARM to develop the next AMBA standard, and FPGA footprints are all over it. In addition to the “normal” AXI4, there will be AXI4-Lite and AXI4-Stream flavors that should bring us much closer to a standardized instantiate-and-play interface for most FPGA-ready IP.
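Part of what makes AXI4-Stream attractive for plug-together IP is its minimal handshake: a data beat transfers only in a clock cycle where the source asserts TVALID and the sink asserts TREADY, and a source that has asserted TVALID holds its data until it is accepted. The cycle-level Python model below sketches just that rule for a single link. The function name, the beat labels, and the back-pressure pattern are hypothetical, and sideband signals such as TLAST and TKEEP are ignored.

```python
from collections import deque
from typing import List, Sequence, Tuple


def simulate_stream(beats: Sequence[str], tready: Sequence[bool]) -> Tuple[List[Tuple[int, str]], int]:
    """Cycle-by-cycle model of one AXI4-Stream link (simplified).

    The source keeps TVALID high and holds the same TDATA until that beat
    is accepted; a transfer happens only in a cycle where TVALID and TREADY
    are both high. Returns (cycle, data) for each transfer and the number
    of beats still waiting when the simulation ends.
    """
    pending = deque(beats)
    transfers: List[Tuple[int, str]] = []
    for cycle, ready in enumerate(tready):
        tvalid = bool(pending)  # source has data to offer this cycle
        if tvalid and ready:
            transfers.append((cycle, pending.popleft()))
    return transfers, len(pending)


# Hypothetical sink that applies back-pressure (TREADY low) on cycles 0 and 2.
done, left = simulate_stream(["A", "B", "C"], [False, True, False, True, True])
print(done, left)  # [(1, 'A'), (3, 'B'), (4, 'C')] 0
```

Because neither side needs to know anything about the other beyond these two signals, blocks from different sources can be chained without custom glue logic, which is the “instantiate-and-play” promise.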

Combined with the company’s announcement of a future ARM/FPGA flexible computing platform, the partnership between Xilinx and ARM appears poised to bear significant fruit over the next couple of design seasons.  

ISE Design Suite 12 continues the Xilinx philosophy of domain-specific design tool bundles, each tailored to a particular type of design work. The new features will roll out incrementally through sub-releases over the coming year: the current 12.1 release includes clock gating, the next (12.2) release adds partial reconfiguration support for Virtex-6, and 12.3 brings AXI4 support.
