Power has become a key design consideration for SoCs in pretty much any application. We’ve looked at some ways of reducing power in past articles, largely at a high level. We continue here with a specific look at some techniques that can be identified by a new tool from Sequence called PowerArtist. The tool takes ten specific steps to identify ways to reduce power, although only a couple of them are implemented automatically. Most require some engineering evaluation to decide whether to implement them and, if so, exactly how. Those are the so-called “guided” techniques: the tool guides the engineer towards power-saving opportunities.
The focus of the power-saving techniques was directed by data that Sequence gathered on where power is typically consumed in an SoC. The top three consumers were the clocks (30-60%), memories (20-50%), and the datapath (~20%). These are therefore the areas that PowerArtist considers. Four of its “PowerBots” (the name Sequence uses for its analysis engines) address clock power, three address memories, and three cover the datapath.
When it comes to clocks, it’s all about the enables. It is generally accepted that using clock enables can reduce power. What’s less well understood is that for active signals that would rarely be disabled, adding a clock enable can actually increase power, since the power added by additional enabling circuitry overwhelms any potential minor power reduction. So a key aspect of what PowerArtist does is to determine which clocks would actually benefit from having a clock enable.
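The break-even argument can be made concrete with a toy model. The sketch below is ours, not Sequence's: it assumes clock power scales linearly with the fraction of cycles the clock toggles and that the gating circuitry adds a fixed cost. The function name and all of the numbers are hypothetical.

```python
def net_gating_savings(clock_power_mw, idle_fraction, gating_overhead_mw):
    """Return net power saved (mW) by gating a clock; negative means gating costs power.

    Assumed toy model: the clock burns power only in cycles where it toggles,
    and the added enable circuitry costs a fixed amount regardless of activity.
    """
    if not 0.0 <= idle_fraction <= 1.0:
        raise ValueError("idle_fraction must be between 0 and 1")
    saved = clock_power_mw * idle_fraction
    return saved - gating_overhead_mw


# Hypothetical numbers, for illustration only.
mostly_idle = net_gating_savings(clock_power_mw=2.0, idle_fraction=0.75,
                                 gating_overhead_mw=0.25)
rarely_idle = net_gating_savings(clock_power_mw=2.0, idle_fraction=0.0625,
                                 gating_overhead_mw=0.25)
print(mostly_idle, rarely_idle)  # 1.25 -0.125
```

In this model the rarely idle clock comes out negative: the enable costs more than it saves, which is the case an activity-aware analysis is meant to catch.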
Of course, simply adding gating to a clock signal may disrupt the timing of that clock and certainly will add skew and unbalance the clock tree. Synthesis engines already have the capability of recognizing certain styles of logic as opportunities for clock gating and can implement the gating, taking into account all of the timing considerations. Rather than duplicate this effort, PowerArtist doesn’t explicitly gate the clock signals; instead, it creates logic that will be recognized by the synthesis tool as a clock-gating opportunity and lets the synthesis engine take care of the implementation details. But synthesis engines don’t take signal activity into account and therefore can’t tell when clock gating would actually increase power; PowerArtist does this analysis and guides the synthesis tool accordingly.
There are four steps to clock power reduction. The first is to find additional opportunities to make use of existing enables. This is a guided step: the tool provides the designer with information on clocks without enables as well as the enable signals that exist, and the designer can apply those enables to additional registers if that makes sense for the function. Because reusing existing enable signals burns negligible additional power, this is the preferred way to reduce power. For any remaining non-enabled clocks that couldn’t use existing enables for whatever reason, the second step is for PowerArtist to generate edge-detection logic that creates new enable signals, but only when this will result in power savings.
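One way to picture a generated enable is a register that is clocked only in cycles where its input differs from what it already holds. The sketch below simulates that idea; whether PowerArtist's edge-detection logic works this way is our assumption, and `count_clocked_cycles` is a hypothetical name.

```python
def count_clocked_cycles(d_values, initial_q=0):
    """Compare an always-clocked register with one enabled only when D != Q.

    Returns (ungated_clock_events, gated_clock_events, final_q).
    """
    q = initial_q
    gated = 0
    for d in d_values:
        enable = d != q  # assumed derived enable: input differs from stored value
        if enable:
            gated += 1
            q = d  # the register captures new data only when enabled
    return len(d_values), gated, q


# Hypothetical input trace with long runs of unchanged data.
ungated, gated, final_q = count_clocked_cycles([0, 0, 5, 5, 5, 7, 7, 0])
print(ungated, gated, final_q)  # 8 3 0
```

The register ends up with the same contents either way, but in this trace it needs only three clock events instead of eight.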
The third step is to make use of the built-in analysis engine within PowerArtist to look over the enable logic for all of the registers. The tool provides detailed power information on both the logic cone fanning into the register and the logic fanning out of the register. The designer can use this information to look for alternative enabling signals or strategies that might consume less power.
Finally, PowerArtist generates a “master list” of clock enables to be created by the synthesis tool. This is essentially a series of constraints that will ensure that clock enabling is done only for those scenarios that will result in lower power. All enabled clocks are considered in this analysis, whether generated by the user or PowerArtist. However, because the latter are created only if power will be reduced, they will always be part of the master list. Any user-created enables that won’t reduce power are left off the list.
The next three steps relate to memory. The first automatically applies gates to memory clocks to inhibit clocking when the memory is inactive; this is entirely analogous to the main clock gating applied above, except that it typically involves the memory select signals as part of the clock gating.
The second step is a check to find opportunities to split wide memories. The idea here is that if the entire address field isn’t changing – a reasonably common situation, especially when running through a contiguous address range – then by having two half-memories instead of one full memory, half the memory can essentially be static (presumably the MSB half) while the other half cycles through the addresses. This is a guided operation and is not automatically implemented by the tool.
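To see why a split can help, consider which half of the memory each access actually touches. The sketch below assumes the memory is split on the most significant address bit into a low and a high bank; that split and the helper name are our assumptions, not a description of what the tool proposes.

```python
def bank_accesses(addresses, address_bits):
    """Count accesses per bank when a memory is split on the address MSB.

    Returns [low_bank_count, high_bank_count].
    """
    msb_shift = address_bits - 1
    counts = [0, 0]
    for addr in addresses:
        if not 0 <= addr < (1 << address_bits):
            raise ValueError(f"address {addr} does not fit in {address_bits} bits")
        counts[addr >> msb_shift] += 1  # MSB selects the bank
    return counts


# Hypothetical contiguous sweep through the first 100 words of a 256-word memory.
low, high = bank_accesses(range(0, 100), address_bits=8)
print(low, high)  # 100 0
```

In this trace the high bank is never selected, so it could sit idle for the whole sweep.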
The third step aims at reducing signal switching on the data lines of the memory. Data inputs to a memory are useful only if the memory is being written. If the memory isn’t in write mode, then switching data inputs are simply wasting power. With a shared bus, the inputs and outputs use the same lines, but this step is focused on the cone of logic moving towards the memory and identifying logic to quiet the signals when not needed. It’s a guided step, so the tool will point out those opportunities, and the designer decides whether and how to implement any changes.
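The waste is easy to quantify on a small trace. The sketch below counts bit flips on the data lines with and without logic that holds the last written value while the memory isn't being written. The hold-last-value scheme is just one possible quieting strategy, and the names and data are hypothetical.

```python
def count_bit_toggles(values):
    """Total number of bit flips between consecutive values."""
    return sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))


def isolate_when_idle(data, write_enable):
    """Hold the last written value on the data lines whenever write_enable is low."""
    held = 0
    quieted = []
    for value, we in zip(data, write_enable):
        if we:
            held = value
        quieted.append(held)
    return quieted


# Hypothetical 4-bit data bus; only two of the six cycles are writes.
data = [0b0000, 0b1111, 0b0000, 0b1010, 0b0101, 0b1010]
write_enable = [True, False, False, True, False, False]
raw = count_bit_toggles(data)
quiet = count_bit_toggles(isolate_when_idle(data, write_enable))
print(raw, quiet)  # 18 2
```

The hold logic itself costs power and area, which is one reason the decision is left to the designer.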
Finally, there are three steps dedicated to reducing power in the datapath. The first recognizes that many multiplexers may spend a reasonable amount of time with a particular input selected. One such example would be multiplexers used to implement some sort of switching of test or diagnostic circuitry that is usually not used in standard operation; another would be any multiplexers used in the switching of the many “modes” now associated with complex circuits. Any input switching on an unselected multiplexer input is wasted power, so this step identifies such occurrences; implementation is left to the designer.
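A two-input multiplexer makes the point. The sketch below reports how often input A is selected and how many bit flips land on whichever input is not selected. The trace is hypothetical, and the reporting function is our own illustration rather than anything the tool exposes.

```python
def mux_waste_report(select_b, input_a, input_b):
    """Return (fraction of cycles input A is selected, bit flips on the unselected input)."""
    cycles = len(select_b)
    a_selected = sum(1 for s in select_b if not s)
    wasted = 0
    for i in range(1, cycles):
        # Flips arriving on the input that the mux ignores in cycle i are wasted.
        unselected = input_a if select_b[i] else input_b
        wasted += bin(unselected[i - 1] ^ unselected[i]).count("1")
    return a_selected / cycles, wasted


# Hypothetical trace: functional data on A, a free-running test pattern on B.
select_b = [False, False, False, False]
functional = [1, 2, 3, 4]
test_pattern = [0b0000, 0b1111, 0b0000, 0b1111]
dwell_a, wasted = mux_waste_report(select_b, functional, test_pattern)
print(dwell_a, wasted)  # 1.0 12
```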
The second datapath step recognizes that logic switching when a clock is disabled may be wasted activity, and it identifies opportunities to cut down the amount of logic switching by using existing enable signals. Again, the decision as to whether to do this and the implementation are handled by the designer.
The third step is similar to some of the ones we’ve already seen, but with a subtle difference. All prior attempts to quiet unnecessary activity have focused on signals switching over a series of clock cycles when those inputs will have no impact. The last datapath technique looks for activity within a clock cycle. Such activity is usually caused by intermediate logic states and hazards, and, while it poses no logic problem if properly timed, it does increase the amount of switching and therefore the power. This final step identifies opportunities to quiet such transient switching.
One of the things Sequence seems to have tried to do is to make it easier to identify the impact of any changes. The GUI provides power information at a relatively high level of precision. A list of all potential RTL changes is provided, along with the available power savings (in absolute and percentage terms) and, for automatically implemented changes, the area overhead of making the change. Those automatic changes can be accepted or rejected. The display allows simultaneous viewing and cross-probing between this list, the RTL, and a block schematic view. Sequence also uses an OpenAccess database for all of the analysis results, allowing scripting for getting additional information that might not be part of the standard display.
The whole PowerArtist approach, being a collection of techniques that vary widely in how they work, lends itself to future additions of new engines and refinements to the existing ones. Sequence hasn’t made any specific promises to that effect, but it seems as though the foundation has been laid. It’s not a great leap of faith to think that more power savings could be on the horizon.
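A single XOR gate is enough to show the effect. In the sketch below both inputs change from 0 to 1 within one cycle, so the settled output is unchanged, but if the inputs arrive at different times the output pulses on the way. The zero-delay gate model, the event format, and the function name are simplifying assumptions of ours.

```python
from itertools import groupby


def count_output_transitions(initial_a, initial_b, events):
    """Count XOR-output transitions as timed input events arrive within one cycle.

    events: iterable of (time, name, value), where name is "a" or "b".
    Events sharing a timestamp are applied together before the output is evaluated.
    """
    state = {"a": initial_a, "b": initial_b}
    out = state["a"] ^ state["b"]
    transitions = 0
    ordered = sorted(events, key=lambda e: e[0])
    for _, group in groupby(ordered, key=lambda e: e[0]):
        for _, name, value in group:
            state[name] = value
        new_out = state["a"] ^ state["b"]
        if new_out != out:
            transitions += 1
            out = new_out
    return transitions


# Hypothetical arrival times: skewed inputs glitch the output, aligned inputs do not.
skewed = count_output_transitions(0, 0, [(1, "a", 1), (3, "b", 1)])
aligned = count_output_transitions(0, 0, [(2, "a", 1), (2, "b", 1)])
print(skewed, aligned)  # 2 0
```

Both cases settle to the same value; the skewed one simply spends two extra output transitions getting there.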