Getting Your Clock Cleaned

Power reduction continues to ride high on the hit parade, with companies of all shapes and sizes attacking it from numerous angles. We’ve taken a sampling of various techniques before, at both the front and back ends of the design process. And activity continues at both ends. At the front end, Mentor has just announced that their Vista ESL platform addresses power at the transaction level. At the back end, the element that seems to be getting the most continued attention is the clock. Which makes sense, of course, since the clock(s) contribute(s) or generate(s) the vast majority of transitions that cause the chip to use power in a more productive way than just sitting there leaking.*

Most of the approaches to the clock’s impact on power involve clock gating. This technique contributes several of the ten steps that Sequence introduced in their PowerArtist tool last year. Sequence has, in fact, come back this year automating a number of the steps that were originally “guided,” reflecting their realization that many (although not all) designers have become comfortable with tools making RTL changes automatically. Which should make more than just Sequence happy, as we’ll see.

Calypto is also driving the clock gating concept with their PowerPro CG product. Much of what happens with both Sequence and Calypto relates to finding as many opportunities as possible for making clock gating as efficient as possible. Given a design that has some clock gating already inserted by the designer, it means identifying additional clocks that can be gated, optimizing by combining the logic used for clock gating in various and sundry clever ways, and then competing over whose ways are the cleverest.

A clock-gated register ends up with two logic cones: the one we traditionally think of that feeds the D input and a new logic cone that comprises the clock gating logic. The techniques that Sequence and Calypto use focus on the clock gating cone, leaving the D cone alone. So while these two companies are off preparing for the inevitable bake-offs to demonstrate that one’s technology can outstrip the other’s, there’s another approach that’s been quietly brewing off in Scandinavia.
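For anyone who likes to see the two cones in action, here is a toy cycle-level sketch in Python. It is not how Sequence or Calypto do anything; the function names and the one-register model are invented for illustration. It compares a register whose clock fires every cycle (with a mux recirculating the old value when the enable is low) against the same register with the enable moved into the clock path, and counts how many clock pulses each one actually receives.

```python
def run_mux_enable(data, enables, q0=0):
    """Ungated register: the clock fires every cycle and a mux in the
    D cone recirculates Q whenever the enable is low."""
    q, trace, pulses = q0, [], 0
    for d, en in zip(data, enables):
        pulses += 1              # clock pin toggles regardless of enable
        q = d if en else q       # mux: new data or hold
        trace.append(q)
    return trace, pulses


def run_clock_gated(data, enables, q0=0):
    """Gated register: the enable lives in the clock-gating cone, so the
    clock pin only sees a pulse when there is something new to capture."""
    q, trace, pulses = q0, [], 0
    for d, en in zip(data, enables):
        if en:
            pulses += 1          # clock reaches the register
            q = d
        trace.append(q)          # otherwise Q simply holds
    return trace, pulses


# Made-up stimulus: eight cycles, enable high in only three of them.
DATA = [1, 0, 1, 1, 0, 1, 0, 0]
ENABLES = [1, 0, 0, 1, 0, 0, 0, 1]

mux_trace, mux_pulses = run_mux_enable(DATA, ENABLES)
gated_trace, gated_pulses = run_clock_gated(DATA, ENABLES)
print("same behavior:", mux_trace == gated_trace)
print("clock pulses, ungated vs gated:", mux_pulses, gated_pulses)
```

The register contents are identical either way; what changes is how often the clock pin wiggles, which is the whole point. The D cone is the `d` argument, left untouched in both versions.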

This new methodology reflects some nuance that has crept into the problem, as observed by all the players: it’s not only about the total energy consumed in the usage of a chip. It’s also the peak power, or, as Sequence’s Preeti Gupta points out, an even more nuanced “sustained peak” power, which has thermal consequences. This means it’s not enough to worry about how much power you’re using. No, consistent with that apparent corollary to the second law of thermo, which states that the number of things you have to stress over must necessarily increase unceasingly over time, now you must also worry about when that power is burned.
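To make the distinction concrete, here is a small Python sketch with three made-up power profiles. One loud assumption: "sustained peak" is modeled here as the highest average power over any window of consecutive samples, which is a plausible reading rather than a definition taken from Sequence. The profile values and window length are likewise invented.

```python
def total_energy(power, dt=1.0):
    """Energy is power accumulated over time: sum(P[i]) * dt."""
    return sum(power) * dt


def peak_power(power):
    """Highest instantaneous sample in the profile."""
    return max(power)


def sustained_peak(power, window):
    """Assumed definition: the highest average power over any `window`
    consecutive samples (a simple sliding-window mean)."""
    if not 0 < window <= len(power):
        raise ValueError("window must be between 1 and len(power)")
    running = sum(power[:window])
    best = running
    for i in range(window, len(power)):
        running += power[i] - power[i - window]   # slide the window by one
        best = max(best, running)
    return best / window


# Three hypothetical profiles (arbitrary units, one sample per time step).
PROFILES = {
    "burst": [8, 8, 8, 8, 0, 0, 0, 0],
    "spiky": [8, 0, 8, 0, 8, 0, 8, 0],
    "flat":  [4, 4, 4, 4, 4, 4, 4, 4],
}

for name, profile in PROFILES.items():
    print(name,
          total_energy(profile),
          peak_power(profile),
          sustained_peak(profile, window=4))
```

All three profiles burn the same total energy. The burst and spiky profiles share the same instantaneous peak, but only the burst keeps the power high for long enough to show up as a sustained peak, which is the case with thermal consequences.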

And, by messing about with power timing, we find that, even though we thought we were talking about power consumption, we have, as if through a magic wormhole, appeared in the beguiling land of power integrity, where this combination of power and noise is being tackled by Danish newcomer TeklaTech. They’re transcending the temporal domain, making additional use of the frequency domain to understand the noise content and its sources.
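As a rough illustration of what the frequency domain can tell you (and emphatically not a description of TeklaTech's analysis), the following Python sketch runs a naive discrete Fourier transform over an invented current waveform: one unit of current drawn every fourth time step, as if by a clock edge. The function name and the sample values are hypothetical.

```python
import cmath


def dft_magnitudes(samples):
    """Magnitude of each bin of a naive O(n^2) discrete Fourier transform."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n)
    ]


# Hypothetical supply current: a pulse on every fourth sample.
CURRENT = [1, 0, 0, 0, 1, 0, 0, 0]

for k, mag in enumerate(dft_magnitudes(CURRENT)):
    print(k, round(mag, 6))
```

The nonzero bins land only at multiples of the pulse repetition rate, which is why current that is tightly synchronized to a clock shows up as sharp spectral lines rather than as a diffuse smear.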

The first elements of TeklaTech’s methodology are partitioning and block analysis, providing an idea of how various chunks of the circuit affect the power and noise of other chunks. This can be done in the time or frequency domain. The floorplan has a big impact on such power dependencies (a rather overloaded phrase), with the obvious example being the need to keep noisy circuits away from analog blocks. TeklaTech’s FloorDirector lets the designer use the results of the analysis to guide a traditional floorplanner in a manner that reduces power noise.

But the more interesting aspect is an activity called “power-shaping.” And, right at the outset, you have to admit that such an activity, if nothing else, sounds incredibly cool. Machiavelli is drooling in his grave. Nietzsche wishes he could rise from the dead, kick their – oh, I’ll say derrieres here for the sake of propriety, although he wouldn’t be caught dead using such an effeminate word – steal the phrase, and, using his proprietary copy of Machtrosoft Wort®, reclaim it in his writings as if it had been there the whole time.

Within the digital blocks of the circuit, TeklaTech can actually re-synthesize the clock tree (or, strictly speaking, provide guidance to the native clock-tree synthesis engine), moving clock edges around so that, basically, you have fewer of them firing at the same time. Unlike the clock gating techniques we’ve already seen, this affects the D logic cone of the registers instead of the clock enable logic cone. They modify the data path to compress the timing window and then re-schedule the transitions, using timing constraints as a guide to ensure that the resulting circuit still performs as required.
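The basic intuition, that fewer edges firing together means a lower peak, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not TeklaTech's algorithm: every clock branch is assumed to draw the same invented current pulse, the offsets are chosen by hand rather than derived from timing constraints, and nothing here touches the data path.

```python
def current_profile(offsets, pulse, period):
    """Sum one identical current pulse per clock branch, each shifted by
    that branch's offset (in time steps), over a single clock period."""
    profile = [0] * period
    for off in offsets:
        for k, amp in enumerate(pulse):
            profile[(off + k) % period] += amp
    return profile


PULSE = [3, 2, 1]   # hypothetical current drawn after a clock edge
PERIOD = 12         # time steps per clock period (made up)

aligned = current_profile([0, 0, 0, 0], PULSE, PERIOD)     # all edges together
staggered = current_profile([0, 1, 2, 3], PULSE, PERIOD)   # edges spread out

print("aligned:  ", aligned, "peak", max(aligned))
print("staggered:", staggered, "peak", max(staggered))
print("same total charge:", sum(aligned) == sum(staggered))
```

The total charge delivered per period is unchanged; only its distribution in time moves. In a real flow the offsets would be limited by the timing constraints mentioned above, which is exactly the part this sketch leaves out.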

This process can be done automatically and, if you think about it, more or less rips up the data path that you so carelessly tossed together without a thought for the noise you’d be imposing on your neighbors through that thin wall and replaces it with a good one. OK, that’s a bit harsh, but it kind of does have that feel of pulling out one and putting in another. Now… if you’re one of those designers not yet comfortable with someone else smudging up your sparkling RTL – in particular, a soulless tool using a mindless automatic algorithm – then this might feel even more horrific than automatic clock gating. Formal techniques could presumably be put to use to confirm the correctness of the new circuit, but we’ll defer discussion of such uses of formal analysis to a future time.

The amount of logic in the overall chip may go up slightly due to these modifications, but they’ve found that, with less than 1% additional logic, they have achieved reductions of 50% in peak current. (This 1% number is also consistent with the typical upper limit of added circuitry claimed by Sequence’s and Calypto’s techniques.)

It might even be interesting to use both clock gating and retiming at the same time. Or rather, perhaps, do clock gating first, then analyze the new current signature and reschedule transitions appropriately. This is, of course, pure speculation, which is the supreme editorial prerogative, and actual proof of efficacy is deemed a pedantic detail and left as an exercise for the reader.

*If you’re not too squeamish, take a quick look at what free translation makes of the Dutch phrase for “leakage current,” lekstroom, as a result of mis-parsing it into lekst+room instead of the correct lek+stroom.