editor's blog
Subscribe Now

SystemC HLS Optimizes Power

Forte occupies what you might call a middle level in logic synthesis. We’ve talked about the positioning before, but a concise way of looking at it might be as follows:

  • ANSI C/C++ provides an unstructured, untimed description of the design.
  • SystemC provides a structured, untimed description of the design.
  • RTL provides a structured, timed description of the design.

The middle one isn’t quite that simple: the interfaces are timed, either at the transaction or pin level. But the timing of what goes on inside is a product of synthesis and is subject to tradeoffs.

In an update conversation at DAC, Forte noted that one of the big improvements to their latest high-level synthesis (HLS) release, Cynthesizer 5, is the ability to include power in the tradeoffs in addition to performance and area. This actually required a complete redo of the underlying infrastructure, so much of the code is brand new.

One of the outcomes of that rework was to change how scheduling and allocation are done. For a given microarchitecture, scheduling refers to the process of assigning an event to a particular clock edge. For example, if two streams of logic converge and one needs eight clock cycles to complete and the other only three, then you could have the short-chain logic start early and then wait (“eager”) or start just-in-time to arrive with the long logic chain (“lazy”). Allocation assigns resources.

Their tools used to do scheduling first and then allocation. Now they happen at the same time, which means they can be co-optimized.

They can also do more design space exploration, with Monte Carlo capabilities. An example of this would be in the selection of a multiplier. In the past, they had one multiplier architecture; now they have several, with different performance/power/area tradeoffs. After manually dialing in the number of choices to get close, you can use Monte Carlo analysis to figure out which is best. (The manual part is just to keep the design space from being too enormous.) A half hour or so typically allows the tool to sort through thousands of different configurations to find the optimal one(s).

Optimizing for power brings one new consideration into play: state machine encoding. You generally want to minimize the number of bits switching (and even gate clocks to hit only the register that’s going to change). But one-hot, which is the extreme example, requires too many flip-flops. So they have a statistical algorithm that determines, short of one-hot, what the lowest-power encoding scheme would be.

Finally, they’ve put an algorithm viewer into the tool to allow the guys doing the implementation, who likely received it from the guy who wrote the algorithm, to get a better feel for what’s going on in the algorithm itself.

You can find more about their latest update in their announcement.

Leave a Reply

featured blogs
Jan 21, 2022
Here are a few teasers for what you'll find in this week's round-up of CFD news and notes. How AI can be trained to identify more objects than are in its learning dataset. Will GPUs really... [[ Click on the title to access the full blog on the Cadence Community si...
Jan 20, 2022
High performance computing continues to expand & evolve; our team shares their 2022 HPC predictions including new HPC applications and processor architectures. The post The Future of High-Performance Computing (HPC): Key Predictions for 2022 appeared first on From Silico...
Jan 20, 2022
As Josh Wardle famously said about his creation: "It's not trying to do anything shady with your data or your eyeballs ... It's just a game that's fun.'...

featured video

Synopsys & Samtec: Successful 112G PAM-4 System Interoperability

Sponsored by Synopsys

This Supercomputing Conference demo shows a seamless interoperability between Synopsys' DesignWare 112G Ethernet PHY IP and Samtec's NovaRay IO and cable assembly. The demo shows excellent performance, BER at 1e-08 and total insertion loss of 37dB. Synopsys and Samtec are enabling the industry with a complete 112G PAM-4 system, which is essential for high-performance computing.

Click here for more information about DesignWare Ethernet IP Solutions

featured paper

MAX22005 Universal Analog Input Enables Flexible Industrial Control Systems

Sponsored by Analog Devices

This application note provides information to help system engineers develop extremely precise, highly configurable, multi-channel industrial analog input front-ends by utilizing the MAX22005.

Click here to read more

featured chalk talk

Machine-Learning Optimized Chip Design -- Cadence Design Systems

Sponsored by Cadence Design Systems

New applications and technology are driving demand for even more compute and functionality in the devices we use every day. System on chip (SoC) designs are quickly migrating to new process nodes, and rapidly growing in size and complexity. In this episode of Chalk Talk, Amelia Dalton chats with Rod Metcalfe about how machine learning combined with distributed computing offers new capabilities to automate and scale RTL to GDS chip implementation flows, enabling design teams to support more, and increasingly complex, SoC projects.

Click here for more information about Cerebrus Intelligent Chip Explorer