feature article
Subscribe Now

Skew This!

Azuro Touts Clock Concurrent Optimization for Aggressive Nodes

The concept of clocking a register is pretty simple. It’s Logic Design 101 stuff. Having an entire system controlled by a uniform clock makes accessible that which would otherwise be an intractable problem. It’s like adding traffic lights downtown to keep traffic from getting completely chaotic.

A whole discipline has grown out of this very basic concept: that of synchronous design. An entire ecosystem of tools and techniques has been built around some very fundamental assumptions of how to design such circuits. And as the circuits have gotten bigger, clock tree synthesis (CTS) has become an art in its own right. But, according to the team at Azuro, the foundation on which it’s built has some cracks in it, and that foundation needs to be replaced.

The nitty gritty details of clock timing can get a bit cumbersome if you want to spell things out in a manner that will withstand the objections of a clock specialist, but for the rest of us, things come down to some relatively straightforward concepts. The way we do things traditionally involves 1) designing a logic path that will provide the timing needed to achieve performance and then 2) creating a clock network. We rely on the elimination of clock skew to make this possible.

What that means is that we try to create a clock tree such that the paths from the clock source to all registers have the same delay – they have zero skew. That means sometimes adding delays here or there to balance the network, which is ok. As long as the paths are equal, the circuit will work.

But Azuro argues that as we approach small dimensions (starting around the 65-nm node), this idealization of zero skew must necessarily fail for three reasons.

One reason is the increased use of clock gating. A gate adds a delay in the clock path, and a gated path will be different from an ungated path. One way of addressing this is to push all clock gates to the bottom of the tree so that the clock paths are equal for as long as possible, diverging only at the very end, thereby keeping the skews minimal. But this means creating many more gates than are actually necessary, with accompanying complicated logic to control them. The result is greater area used and, ironically, greater power consumption – ironic because the whole purpose of clock gating is to reduce power. So not pushing the gates to the bottom of the network means that you’re introducing skew.

A second reason is simply the complexity of advanced SoCs. Start with the fact that IP blocks are inserted into the chips intact, with their pre-designed clock networks attached to the SoC clock network. Add test clocking schemes for accelerating or simplifying test sequences: these require clock muxes here and there that interfere with the normal clock networks. Add such further complicating factors as multiple clock domains and adaptive capabilities like dynamic voltage and/or frequency scaling (DVFS), and you’ve pretty much eliminated any pretense at creating a balanced clock network.

But overlaying both of these is one much more fundamental trump against ever having a balanced network: manufacturing variations that become more significant with each process node. Even if you have what looks like a simple, perfectly-designed clock tree, on any given die, on any given day, different paths will have different delays, and it’s unpredictable and it’s unsystematic and it will change with each die. And the variation has become significant enough that you can never margin your way out of it.

Add this variation on top of the clock gating and other complexities, each of which will itself have variation, and you find yourself unable to assume anything about skew.

Robbing Peter to pay Paul

With traditional clock design, a given registered path or pipeline will have a critical path – the one logic path along the chain of registers that acts as the rate-limiting step and determines the clock frequency. Designers try to get the critical path small enough to where they can guarantee the required performance and then insert a balanced clock tree.

Azuro argues that, since a balanced tree is no longer possible, a different way of designing is required. Instead of doing separate logic and clock path design steps, the logic and clock paths must be optimized concurrently – hence the moniker “concurrent clock optimization.” But here’s the deal: the clock paths can now be tweaked around a bit by tweaking the logic paths. This means that along a “chain” the timing of various stages may be different.

But there’s no free lunch here. If the logic path delays are monkeyed with, you’re essentially borrowing time from a prior or later stage, so someone has to pay up at the end: the overall chain must be in balance. No borrowing from the lottery, no borrowing from the education fund, no borrowing from Social Security. If you borrow more time than is available in the overall chain, you lose. The chain must end up with net positive slack overall after all the borrowing and repaying have been done.

Because each stage of a chain may have a different delay, you lose the concept of the critical path. Instead, it’s replaced with the concept of the critical chain: this is the chain with the least overall net positive slack, and it is the chain that determines the max frequency for the clock driving that chain and any others in the same network.

Of course, it’s one thing to come up with a clean theoretical new foundation to replace the old cracked foundation; it’s quite something else to implement a commercial tool that really works in the real world. Earlier this year, Azuro launched Rubix, which they claim as the first tool to combine the optimization of the logic and clock paths. They boast results as much as 20% faster than using traditional clock synthesis, but, perhaps more significantly, they also claim much faster design completion. This is due to the elimination of numerous iterations of the optimize-logic/build-clock-tree/hope-they-converge loop. If the predictions that the traditional flow completely breaks down around the 32-nm area are true, then it means not just faster time to market, but, in fact, it means getting to market versus not.

If accurate, that’s a pretty powerful promise. Maybe even enough to make you stop skewing around with the old way of doing things.

Link: Azuro Rubix

Leave a Reply

featured blogs
Apr 16, 2024
In today's semiconductor era, every minute, you always look for the opportunity to enhance your skills and learning growth and want to keep up to date with the technology. This could mean you would also like to get hold of the small concepts behind the complex chip desig...
Apr 11, 2024
See how Achronix used our physical verification tools to accelerate the SoC design and verification flow, boosting chip design productivity w/ cloud-based EDA.The post Achronix Achieves 5X Faster Physical Verification for Full SoC Within Budget with Synopsys Cloud appeared ...
Mar 30, 2024
Join me on a brief stream-of-consciousness tour to see what it's like to live inside (what I laughingly call) my mind...

featured video

MaxLinear Integrates Analog & Digital Design in One Chip with Cadence 3D Solvers

Sponsored by Cadence Design Systems

MaxLinear has the unique capability of integrating analog and digital design on the same chip. Because of this, the team developed some interesting technology in the communication space. In the optical infrastructure domain, they created the first fully integrated 5nm CMOS PAM4 DSP. All their products solve critical communication and high-frequency analysis challenges.

Learn more about how MaxLinear is using Cadence’s Clarity 3D Solver and EMX Planar 3D Solver in their design process.

featured chalk talk

Medical Grade Power
Sponsored by Mouser Electronics and RECOM
In this episode of Chalk Talk, Amelia Dalton and Louis Bouche from RECOM explore the various design requirements for medical grade power supplies. They also examine the role that isolation and leakage current play in this arena and the solutions that RECOM offers in terms of medical grade power supplies.
Nov 9, 2023
20,476 views