feature article
Subscribe Now

Optimization Moves Up a Level

Mentor’s RealTime Designer Rises to RTL

There are a lot of reasons why we can create so much circuitry on a single piece of silicon. Obvious ones include hard work developing processes that make it theoretically doable. But someone still has to do the design. So if I had to pick one word to describe why we can do this, it would be “abstraction.” And that’s all about the tools.

In fact, my first job out of college came courtesy of abstraction. Prior to that, using programmable logic involved figuring out the behavior you wanted, establishing (and re-establishing) Boolean equations that described the desired behavior, optimizing and minimizing those equations manually, and then figuring out which individual fuses needed to be blown in order to implement those equations. From that fuse map, a programmer (the hardware kind) could configure a device, which you could then use to figure out… that it’s not working quite like you wanted, allowing you to throw the one-time-programmable device away and try again.

I got to Monolithic Memories right as their PAL devices were taking off as the first high-volume programmable logic chips. There were two keys to their success: the fixed-OR architecture, which provided higher speed at lower cost, and their new PALASM software. That software allowed users to specify their designs not at the fuse map level, but at the Boolean level, with the software figuring out the fuse map. Logic manipulations and minimization were still done by hand.

That is, until software tools improved to include minimization. And then, to my amazement, I watched newcomer Altera’s MAXplus software convert between D, T, and J/K flip-flops automatically to minimize resource usage and allow functions that I thought wouldn’t fit to work quite nicely.

So, first came the new abstraction level – Boolean – and then came the ability to optimize at that level.

This has been going on for years with chip design. Like a Russian nesting doll: each layer of abstraction eventually gets abstracted itself into something yet higher. But each such transformation involves a modicum of friction. That’s because each new abstraction comes about thanks to newly-minted algorithms that can push current computing technology to its limits.

So it might take hours on beefy machines to achieve the conversion from one level to another, but that beats weeks of error-prone manual work. Even so, especially early on, you have to check your work. The algorithms have bugs or holes or whatever, and sometimes you have to take the result and dink with it manually to get a correct or more efficient result. So optimizations typically happen at lower levels. Which is unfortunate, since it breaks the link between the lower-level design and the original abstracted version.

But here’s the thing: high-level optimizations will be much more effective than lower-level ones. A better algorithm will save way more silicon than tweaking gate widths will, as suggested by the following anti-pyramid. Those highest-level optimizations tend to be done manually; below some line, the tools can automate it.

From a tools standpoint, you also reduce the number of logical objects you have to work with at higher levels, which creates efficiencies in memory and processing; that’s what’s behind the “#Objects” scale on the left.

Anti-pyramid_500.png 

Image courtesy Mentor Graphics

So chip designers, who have benefitted from automated optimization at the physical and gate levels, might look forward to that automated-optimization line moving yet higher, from the gate level to the RTL level.

And that’s exactly what Mentor claims they have now with their new RealTime Designer tool. What lets them do this today, when it wasn’t a thing before? Well, looked at simplistically, it’s because we’ve had lots of time since the emergence of basic RTL synthesis. In that time, algorithms have improved, and computing capability has ballooned enormously. So what would have been ridiculous to consider back then becomes tenable now.

We’ve already seen it with timing analysis, which used to be a big batch process that you ran all at once. Afterwards, you sorted through the results to see what needed fixing. Today, timing analysis engines run in real time, communicating with compilation tools so that timing-related decisions can be made by feeding smaller bits to the timing engine and making choices according to the results.

You may recall Oasys, a start-up company that made a name through much faster synthesis – and was then bought by Mentor. Just as happened before with timing analysis, this has enabled Mentor to speed up synthesis. Then, by keeping a placer running in the background (in the same spirit as keeping the timing engine handy), they can perform optimizations at the RTL level by seeing what their impact is during synthesis, rather than having to wait until synthesis is complete.

Synthesis_flow_combined_1000.png 

Image elements courtesy Mentor Graphics

It’s not just about pure synthesis, however; there are a number of things going on here, as described by Mentor. They characterize the existing process as one of doing basic synthesis first, using the results to cobble together a floorplan, and then using the floorplan to do “physical synthesis,” which is synthesis that takes into account more of the physical characteristics of the resulting layout. All with lots of iteration.

With their new flow, floorplanning and physical synthesis come together to deliver a result to the place-and-route engines that should provide far better results, ostensibly with no (or at least reduced) iteration. Mentor refers to this as a “place-first” approach.

Overall_flow_combined_1000.png 

Image elements courtesy Mentor Graphics

Other tricks that improve results include automated dataflow-driven floorplanning and design-space exploration, both of which overlay the core synthesis.

If you haven’t run across the latter before, it’s a brute-force technique that acknowledges the reality that not all optimization can be done algorithmically. Sometimes you simply need to try a bunch of things and see which one ends up providing the best result. That means running a bunch of variations and then graphing the results to select the best one. This can be used for a lot of different optimization problems, but, in all cases, it relies on having fast enough processing to be able to run through all the versions in a reasonable time.

 Design_space_exploration_mod_1000.png

Image courtesy Mentor Graphics

“WNS” = Worst Negative Slack

The other caveat with design-space exploration is that the notion of “best” may not be well-enough specified for a computer to select the winner. If, for instance, speed is the only consideration, then the highest frequency (or lowest delay) wins, and a computer can do that (perhaps with the help of some tie-breaker rules). But if, as in the case of synthesis, you’re balancing power, performance, and area (PPA), then it may require a human to determine the best balance and make the final decision – at which point the processing can continue automatically.

This is indeed how Mentor implements their design-space exploration: it runs the variations, presents you with the results, and you pick one.

Put this all together, and they claim as much as 10 times faster runs and the ability to handle 100-million-gate designs (which they say might be exemplified by a big Tegra graphics engine from nVidia or a highly-integrated network processor). They say that they’ve run this on production designs at the 28-, 20-, and 14-nm process nodes.

Looks like our abstraction doll may have just been enclosed by yet another one.

 

More info:

Mentor Graphics’ RealTime Designer

One thought on “Optimization Moves Up a Level”

Leave a Reply

featured blogs
May 18, 2021
'Virtuoso Meets Maxwell' is a blog series aimed at exploring the capabilities and potential of Virtuoso RF Solution and Virtuoso MultiTech. So, how does Virtuoso meets Maxwell? Now, the... [[ Click on the title to access the full blog on the Cadence Community site....
May 13, 2021
Samtec will attend the PCI-SIG Virtual Developers Conference on Tuesday, May 25th through Wednesday, May 26th, 2021. This is a free event for the 800+ member companies that develop and bring to market new products utilizing PCI Express technology. Attendee Registration is sti...
May 13, 2021
Our new IC design tool, PrimeSim Continuum, enables the next generation of hyper-convergent IC designs. Learn more from eeNews, Electronic Design & EE Times. The post Synopsys Makes Headlines with PrimeSim Continuum, an Innovative Circuit Simulation Solution appeared fi...
May 13, 2021
By Calibre Design Staff Prior to the availability of extreme ultraviolet (EUV) lithography, multi-patterning provided… The post A SAMPle of what you need to know about SAMP technology appeared first on Design with Calibre....

featured video

Industry’s First USB4 Silicon Success

Sponsored by Synopsys

USB4 offers up to 40Gbps speeds for incredibly fast connections. Join Synopsys to see the first demonstration of USB4 IP in silicon, along with real TX eyes for DesignWare USB4, DisplayPort, and USB 3.x IP.

Click here for more information about DesignWare USB4 IP

featured paper

Optimizing an OpenCL AI Kernel for the data center using Silexica’s SLX FPGA

Sponsored by Silexica

AI applications are increasingly contributing to FPGAs being used as co-processors in data centers. Silexica's newest application note shows how SLX FPGA accelerates an AI-related face detection design example, leveraging the bottom-up flow of Xilinx’s Vitis 2020.2 and Alveo U280 accelerator card.

Click to read

featured chalk talk

High-Performance Test to 70 GHz

Sponsored by Samtec

Today’s high-speed serial interfaces with PAM4 present serious challenges when it comes to test. Eval boards can end up huge, and signal integrity of the test point system is always a concern. In this episode of Chalk Talk, Amelia Dalton chats with Matthew Burns of Samtec about the Bullseye test point system, which can maintain signal integrity up to 70 GHz with a compact test point footprint.

Click here for more information about Samtec’s Bulls Eye® Test System