feature article
Subscribe Now

Optimization Moves Up a Level

Mentor’s RealTime Designer Rises to RTL

There are a lot of reasons why we can create so much circuitry on a single piece of silicon. Obvious ones include hard work developing processes that make it theoretically doable. But someone still has to do the design. So if I had to pick one word to describe why we can do this, it would be “abstraction.” And that’s all about the tools.

In fact, my first job out of college came courtesy of abstraction. Prior to that, using programmable logic involved figuring out the behavior you wanted, establishing (and re-establishing) Boolean equations that described the desired behavior, optimizing and minimizing those equations manually, and then figuring out which individual fuses needed to be blown in order to implement those equations. From that fuse map, a programmer (the hardware kind) could configure a device, which you could then use to figure out… that it’s not working quite like you wanted, allowing you to throw the one-time-programmable device away and try again.

I got to Monolithic Memories right as their PAL devices were taking off as the first high-volume programmable logic chips. There were two keys to their success: the fixed-OR architecture, which provided higher speed at lower cost, and their new PALASM software. That software allowed users to specify their designs not at the fuse map level, but at the Boolean level, with the software figuring out the fuse map. Logic manipulations and minimization were still done by hand.

That is, until software tools improved to include minimization. And then, to my amazement, I watched newcomer Altera’s MAXplus software convert between D, T, and J/K flip-flops automatically to minimize resource usage and allow functions that I thought wouldn’t fit to work quite nicely.

So, first came the new abstraction level – Boolean – and then came the ability to optimize at that level.

This has been going on for years with chip design. Like a Russian nesting doll: each layer of abstraction eventually gets abstracted itself into something yet higher. But each such transformation involves a modicum of friction. That’s because each new abstraction comes about thanks to newly-minted algorithms that can push current computing technology to its limits.

So it might take hours on beefy machines to achieve the conversion from one level to another, but that beats weeks of error-prone manual work. Even so, especially early on, you have to check your work. The algorithms have bugs or holes or whatever, and sometimes you have to take the result and dink with it manually to get a correct or more efficient result. So optimizations typically happen at lower levels. Which is unfortunate, since it breaks the link between the lower-level design and the original abstracted version.

But here’s the thing: high-level optimizations will be much more effective than lower-level ones. A better algorithm will save way more silicon than tweaking gate widths will, as suggested by the following anti-pyramid. Those highest-level optimizations tend to be done manually; below some line, the tools can automate it.

From a tools standpoint, you also reduce the number of logical objects you have to work with at higher levels, which creates efficiencies in memory and processing; that’s what’s behind the “#Objects” scale on the left.


Image courtesy Mentor Graphics

So chip designers, who have benefitted from automated optimization at the physical and gate levels, might look forward to that automated-optimization line moving yet higher, from the gate level to the RTL level.

And that’s exactly what Mentor claims they have now with their new RealTime Designer tool. What lets them do this today, when it wasn’t a thing before? Well, looked at simplistically, it’s because we’ve had lots of time since the emergence of basic RTL synthesis. In that time, algorithms have improved, and computing capability has ballooned enormously. So what would have been ridiculous to consider back then becomes tenable now.

We’ve already seen it with timing analysis, which used to be a big batch process that you ran all at once. Afterwards, you sorted through the results to see what needed fixing. Today, timing analysis engines run in real time, communicating with compilation tools so that timing-related decisions can be made by feeding smaller bits to the timing engine and making choices according to the results.

You may recall Oasys, a start-up company that made a name through much faster synthesis – and was then bought by Mentor. Just as happened before with timing analysis, this has enabled Mentor to speed up synthesis. Then, by keeping a placer running in the background (in the same spirit as keeping the timing engine handy), they can perform optimizations at the RTL level by seeing what their impact is during synthesis, rather than having to wait until synthesis is complete.


Image elements courtesy Mentor Graphics

It’s not just about pure synthesis, however; there are a number of things going on here, as described by Mentor. They characterize the existing process as one of doing basic synthesis first, using the results to cobble together a floorplan, and then using the floorplan to do “physical synthesis,” which is synthesis that takes into account more of the physical characteristics of the resulting layout. All with lots of iteration.

With their new flow, floorplanning and physical synthesis come together to deliver a result to the place-and-route engines that should provide far better results, ostensibly with no (or at least reduced) iteration. Mentor refers to this as a “place-first” approach.


Image elements courtesy Mentor Graphics

Other tricks that improve results include automated dataflow-driven floorplanning and design-space exploration, both of which overlay the core synthesis.

If you haven’t run across the latter before, it’s a brute-force technique that acknowledges the reality that not all optimization can be done algorithmically. Sometimes you simply need to try a bunch of things and see which one ends up providing the best result. That means running a bunch of variations and then graphing the results to select the best one. This can be used for a lot of different optimization problems, but, in all cases, it relies on having fast enough processing to be able to run through all the versions in a reasonable time.


Image courtesy Mentor Graphics

“WNS” = Worst Negative Slack

The other caveat with design-space exploration is that the notion of “best” may not be well-enough specified for a computer to select the winner. If, for instance, speed is the only consideration, then the highest frequency (or lowest delay) wins, and a computer can do that (perhaps with the help of some tie-breaker rules). But if, as in the case of synthesis, you’re balancing power, performance, and area (PPA), then it may require a human to determine the best balance and make the final decision – at which point the processing can continue automatically.

This is indeed how Mentor implements their design-space exploration: it runs the variations, presents you with the results, and you pick one.

Put this all together, and they claim as much as 10 times faster runs and the ability to handle 100-million-gate designs (which they say might be exemplified by a big Tegra graphics engine from nVidia or a highly-integrated network processor). They say that they’ve run this on production designs at the 28-, 20-, and 14-nm process nodes.

Looks like our abstraction doll may have just been enclosed by yet another one.


More info:

Mentor Graphics’ RealTime Designer

One thought on “Optimization Moves Up a Level”

Leave a Reply

featured blogs
Aug 18, 2018
Once upon a time, the Santa Clara Valley was called the Valley of Heart'€™s Delight; the main industry was growing prunes; and there were orchards filled with apricot and cherry trees all over the place. Then in 1955, a future Nobel Prize winner named William Shockley moved...
Aug 17, 2018
Samtec’s growing portfolio of high-performance Silicon-to-Silicon'„¢ Applications Solutions answer the design challenges of routing 56 Gbps signals through a system. However, finding the ideal solution in a single-click probably is an obstacle. Samtec last updated the...
Aug 17, 2018
If you read my post Who Put the Silicon in Silicon Valley? then you know my conclusion: Let's go with Shockley. He invented the transistor, came here, hired a bunch of young PhDs, and sent them out (by accident, not design) to create the companies, that created the compa...
Aug 16, 2018
All of the little details were squared up when the check-plots came out for "final" review. Those same preliminary files were shared with the fab and assembly units and, of course, the vendors have c...
Jul 30, 2018
As discussed in part 1 of this blog post, each instance of an Achronix Speedcore eFPGA in your ASIC or SoC design must be configured after the system powers up because Speedcore eFPGAs employ nonvolatile SRAM technology to store its configuration bits. The time required to pr...