
Skew This!

Azuro Touts Clock Concurrent Optimization for Aggressive Nodes

The concept of clocking a register is pretty simple. It’s Logic Design 101 stuff. Having an entire system controlled by a uniform clock turns what would otherwise be an intractable problem into a manageable one. It’s like adding traffic lights downtown to keep traffic from getting completely chaotic.

A whole discipline has grown out of this very basic concept: that of synchronous design. An entire ecosystem of tools and techniques has been built around some very fundamental assumptions of how to design such circuits. And as the circuits have gotten bigger, clock tree synthesis (CTS) has become an art in its own right. But, according to the team at Azuro, the foundation on which it’s built has some cracks in it, and that foundation needs to be replaced.

The nitty-gritty details of clock timing can get a bit cumbersome if you want to spell things out in a manner that will withstand the objections of a clock specialist, but for the rest of us, things come down to some relatively straightforward concepts. The traditional way of doing things involves 1) designing the logic paths to provide the timing needed to achieve performance and then 2) creating a clock network. We rely on the elimination of clock skew to make this possible.

What that means is that we try to create a clock tree such that the paths from the clock source to all registers have the same delay – that is, they have zero skew. That sometimes means adding delays here or there to balance the network, which is OK. As long as the paths are equal, the circuit will work.
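As a rough illustration of what balancing means, here is a minimal sketch. The register names and insertion delays are made up, and a real CTS tool works on buffers and wires rather than a table of numbers. Skew is taken to be the spread between the latest and earliest clock arrival, and balancing pads every faster path up to the slowest one.

```python
# Hypothetical clock insertion delays (ps) from the clock source to each register.
insertion_delay_ps = {"reg_a": 1200, "reg_b": 950, "reg_c": 1100}

def skew_ps(delays):
    """Skew is the spread between the latest and earliest clock arrival."""
    return max(delays.values()) - min(delays.values())

def balancing_padding_ps(delays):
    """Delay to add on each path so that it matches the slowest path."""
    target = max(delays.values())
    return {reg: target - d for reg, d in delays.items()}

padding_ps = balancing_padding_ps(insertion_delay_ps)
balanced_ps = {reg: d + padding_ps[reg] for reg, d in insertion_delay_ps.items()}

print(skew_ps(insertion_delay_ps))  # skew before balancing
print(padding_ps)                   # delay added per path
print(skew_ps(balanced_ps))         # skew after balancing
```

Note that balancing only ever adds delay: every path ends up as slow as the slowest one.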

But Azuro argues that as we approach small dimensions (starting around the 65-nm node), this idealization of zero skew must necessarily fail for three reasons.

One reason is the increased use of clock gating. A gate adds a delay in the clock path, and a gated path will be different from an ungated path. One way of addressing this is to push all clock gates to the bottom of the tree so that the clock paths are equal for as long as possible, diverging only at the very end, thereby keeping the skews minimal. But this means creating many more gates than are actually necessary, with accompanying complicated logic to control them. The result is greater area used and, ironically, greater power consumption – ironic because the whole purpose of clock gating is to reduce power. So not pushing the gates to the bottom of the network means that you’re introducing skew.

A second reason is simply the complexity of advanced SoCs. Start with the fact that IP blocks are inserted into the chips intact, with their pre-designed clock networks attached to the SoC clock network. Add test clocking schemes for accelerating or simplifying test sequences: these require clock muxes here and there that interfere with the normal clock networks. Add such further complicating factors as multiple clock domains and adaptive capabilities like dynamic voltage and/or frequency scaling (DVFS), and you’ve pretty much eliminated any pretense at creating a balanced clock network.

But overlaying both of these is one much more fundamental trump against ever having a balanced network: manufacturing variations that become more significant with each process node. Even if you have what looks like a simple, perfectly-designed clock tree, on any given die, on any given day, different paths will have different delays, and it’s unpredictable and it’s unsystematic and it will change with each die. And the variation has become significant enough that you can never margin your way out of it.

Add this variation on top of the clock gating and other complexities, each of which will itself have variation, and you find yourself unable to assume anything about skew.
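To get a feel for the variation argument, consider a toy Monte Carlo sketch. All the numbers here are assumptions chosen for illustration, not process data: eight clock branches that are perfectly balanced on paper at 1200 ps each, with every branch independently off by up to 5% in either direction on any given die.

```python
import random

NOMINAL_DELAY_PS = 1200.0  # hypothetical: identical on every branch by design
VARIATION = 0.05           # hypothetical: +/-5% random variation per branch
BRANCHES = 8

def die_skew_ps(rng):
    """Skew of one manufactured die whose branches each vary independently."""
    delays = [NOMINAL_DELAY_PS * (1 + rng.uniform(-VARIATION, VARIATION))
              for _ in range(BRANCHES)]
    return max(delays) - min(delays)

rng = random.Random(2021)  # fixed seed so the run is repeatable
skews_ps = [die_skew_ps(rng) for _ in range(5)]

# Worst case: one branch at +5% and another at -5%.
worst_case_skew_ps = 2 * VARIATION * NOMINAL_DELAY_PS

for die, s in enumerate(skews_ps):
    print(f"die {die}: skew {s:.1f} ps (bound {worst_case_skew_ps:.0f} ps)")
```

In this toy model the zero-skew tree comes out with a different, nonzero skew on every die, and the only safe margin is the full worst-case bound, which grows with both the variation and the depth of the tree.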

Robbing Peter to pay Paul

With traditional clock design, a given registered path or pipeline will have a critical path – the one logic path along the chain of registers that acts as the rate-limiting step and determines the clock frequency. Designers try to get the critical path short enough that they can guarantee the required performance and then insert a balanced clock tree.
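In the traditional picture, the arithmetic is simple. The sketch below assumes zero skew and ignores register overhead such as setup time and clock-to-Q delay; the stage names and delays are hypothetical.

```python
# Hypothetical register-to-register logic delays (ps) along one pipeline.
stage_delay_ps = {"stage_1": 800, "stage_2": 1000, "stage_3": 600}

def critical_path(stages):
    """With zero skew, the slowest stage is the critical path."""
    return max(stages, key=stages.get)

def min_period_ps(stages):
    """Every stage must fit in one period, so the slowest stage sets it."""
    return max(stages.values())

def fmax_mhz(stages):
    """Convert the minimum period in ps to a maximum frequency in MHz."""
    return 1e6 / min_period_ps(stages)

print(critical_path(stage_delay_ps))  # the rate-limiting stage
print(min_period_ps(stage_delay_ps))  # minimum clock period in ps
print(fmax_mhz(stage_delay_ps))       # maximum clock frequency in MHz
```

Here the 800 ps and 600 ps stages finish early and simply wait, which is the unused time that the borrowing idea tries to put to work.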

Azuro argues that, since a balanced tree is no longer possible, a different way of designing is required. Instead of designing the logic and clock paths in separate steps, the two must be optimized concurrently, hence the moniker “clock concurrent optimization.” But here’s the deal: the clock paths and the logic paths can now be tweaked against each other. This means that along a “chain” of registers, the timing of the various stages may be different.

But there’s no free lunch here. If the logic path delays are monkeyed with, you’re essentially borrowing time from a prior or later stage, so someone has to pay up at the end: the overall chain must be in balance. No borrowing from the lottery, no borrowing from the education fund, no borrowing from Social Security. If you borrow more time than is available in the overall chain, you lose. The chain must end up with net positive slack overall after all the borrowing and repaying have been done.
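The borrowing rule can be sketched with a toy model. This is an illustration under stated assumptions, not Azuro's algorithm: a single chain of four registers whose first and last clock arrivals are fixed, no setup, clock-to-Q, or hold checks, and made-up stage delays. Under those assumptions a stage's slack is the period, plus the capture clock offset, minus the launch clock offset, minus the logic delay.

```python
# Hypothetical logic delays (ps) for the three stages between four registers.
STAGE_DELAYS_PS = [900, 1000, 800]

def stage_slacks_ps(delays, period, offsets):
    """Slack per stage, given the clock arrival offset at each register."""
    return [period + offsets[i + 1] - offsets[i] - d
            for i, d in enumerate(delays)]

def chain_net_slack_ps(delays, period):
    """Total slack in the chain when both endpoint offsets are equal."""
    return len(delays) * period - sum(delays)

def borrowing_offsets_ps(delays, period):
    """Shift each capture clock just enough to give its stage zero slack."""
    offsets = [0]
    for d in delays:
        offsets.append(offsets[-1] + d - period)
    return offsets

PERIOD_PS = 900
balanced = stage_slacks_ps(STAGE_DELAYS_PS, PERIOD_PS, [0, 0, 0, 0])
skewed_offsets = borrowing_offsets_ps(STAGE_DELAYS_PS, PERIOD_PS)
skewed = stage_slacks_ps(STAGE_DELAYS_PS, PERIOD_PS, skewed_offsets)

print(balanced)        # zero skew: the 1000 ps stage fails by 100 ps
print(skewed_offsets)  # third register clocked 100 ps late
print(skewed)          # every stage now meets timing
print(chain_net_slack_ps(STAGE_DELAYS_PS, 850))  # negative: cannot be repaid
```

Shifting the intermediate clocks moves slack between stages but never changes the sum, which is why the chain as a whole has to balance. At 850 ps the net slack is negative, and the last offset computed by `borrowing_offsets_ps` no longer returns to zero: the debt would have to be pushed onto a register that is not allowed to move.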

Because each stage of a chain may have a different delay, you lose the concept of the critical path. Instead, it’s replaced with the concept of the critical chain: this is the chain with the least overall net positive slack, and it is the chain that determines the max frequency for the clock driving that chain and any others in the same network.
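A small sketch can show how a critical chain differs from a critical path. It uses the same simplifications as any back-of-the-envelope model (fixed chain endpoints, no setup or hold overhead), and the chain names and delays are hypothetical.

```python
import math

# Hypothetical chains: each list holds the logic delays (ps) of one register chain.
CHAINS_PS = {
    "fetch": [900, 1000, 800],
    "decode": [700, 1100, 750],
    "execute": [950, 950, 920],
}

def net_slack_ps(delays, period):
    """Slack left in a chain after all borrowing and repaying at this period."""
    return len(delays) * period - sum(delays)

def min_period_ps(delays):
    """Smallest whole-ps period at which the chain's net slack is non-negative."""
    return math.ceil(sum(delays) / len(delays))

def critical_chain(chains, period):
    """The chain with the least net slack at the given period."""
    return min(chains, key=lambda name: net_slack_ps(chains[name], period))

TARGET_PERIOD_PS = 1000
slacks = {name: net_slack_ps(d, TARGET_PERIOD_PS) for name, d in CHAINS_PS.items()}
worst = critical_chain(CHAINS_PS, TARGET_PERIOD_PS)
network_min_period_ps = max(min_period_ps(d) for d in CHAINS_PS.values())
longest_stage_ps = max(max(d) for d in CHAINS_PS.values())

print(slacks)                 # net slack per chain at the target period
print(worst)                  # the critical chain
print(network_min_period_ps)  # period limit with borrowing
print(longest_stage_ps)       # period limit with zero skew
```

In this made-up data the slowest single stage sits in one chain, while the chain that limits the clock is a different one whose stages are all moderately slow.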

Of course, it’s one thing to come up with a clean theoretical new foundation to replace the old cracked foundation; it’s quite something else to implement a commercial tool that really works in the real world. Earlier this year, Azuro launched Rubix, which they claim is the first tool to combine the optimization of the logic and clock paths. They boast results as much as 20% faster than with traditional clock synthesis, but, perhaps more significantly, they also claim much faster design completion. This is due to the elimination of numerous iterations of the optimize-logic/build-clock-tree/hope-they-converge loop. If the predictions that the traditional flow completely breaks down around the 32-nm node are true, then it means not just faster time to market, but, in fact, getting to market versus not.

If accurate, that’s a pretty powerful promise. Maybe even enough to make you stop skewing around with the old way of doing things.

Link: Azuro Rubix
