Synthesizing a New Category

It started more or less like any typical press briefing. OK, slightly less typical because we were live in a conference room instead of doing things by conference call. But that’s not unheard of; it’s just hard to do on a regular basis in these far-flung times.

I was there with Oasys’s Sanjiv Kaul and Paul van Besouw. And as the conversation got started, Paul launched something on his laptop, and we continued the conversation for a few minutes.

I honestly don’t remember how many minutes it was, but it wasn’t a lot. And we went back to Paul’s computer and the thing he had started was complete.

It turns out that the thing he had started was synthesis of a 6-million-gate design using their RealTime Designer tool. About 300,000 lines of RTL code. Not the biggest design in the world, but far from a trivial demo design. And it finished in those few minutes. On his laptop.

Now… putting on my skeptic’s hat for a moment, I should be clear and honest here: I did not inspect the design; I counted neither the lines of RTL nor the gates. So I am taking them at their word that the design was of the size described. I also didn’t verify that this wasn’t some sham process that was designed to look like something was happening, maybe making some lights flash and some beeping sounds, only to finish its process of doing nothing to allow them to bring up the results files that were there all along. So I am also taking them at their word that the process was truly taking some input and manipulating it to create afresh some output.

Having laid out these caveats, I should also be clear that I haven’t lost any sleep wondering if I had been duped. Just covering my bases here.

So what gives? Well, early in the discussion, my marketing BS radar pinged a bit as they described their tool as creating a new space: “chip synthesis.” Now… it takes one to know one, and any marketer knows that one way to avoid direct competition is to define a new space and say you’re the only guy there. Bingo! No competition! Very convenient.

But that works only if it is different enough that the customer believes it to be different. And, while marketing is all about creating and controlling customer belief, when all is said and done, the customer is in charge and will reject attempts to install any beliefs that cross a certain ridiculousness boundary. Problem is, no one knows exactly where that boundary is. (In the consumer world, it can be astonishingly distant…)

And so we dance around defining new spaces in an attempt to see whether customers will at some point call BS. Will they call it on RealTime Designer? Let’s describe what they mean by it, and you can decide whether it’s truly a new and different concept or just marketing shenanigans. Or whether having a new category even matters.

Oasys describes the traditional synthesis process as having evolved from gate-level design. Optimizations are done at the gate level in order to meet design constraints, which, in today’s world, means both timing and power.

As levels of abstraction have been raised, all that’s happened is that we’ve enabled new and more efficient ways of expressing design intent. The tools then transform them back to gates, where they can be optimized.

There are a couple of problems with this approach. First, once designs get too big to synthesize all at once, they have to be synthesized in blocks. No block has a view of the entire chip, so constraints that affect multiple blocks must, more or less, be guessed at. Each block is done on its own, and then, at some lower level, things are stitched back together. Only then can full-chip performance be evaluated, meaning that some of the guesses and assumptions made at the block level will turn out to be wrong and a new round of iterations will be necessary.
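To make the guessing problem concrete, here is a toy sketch of block-level time budgeting for one path that crosses two blocks. It is not anything Oasys showed; the clock period, the even split, the delays, and the function names are all assumptions invented for illustration.

```python
# Toy illustration of block-level time budgeting.
# Every number and name here is hypothetical.
CLOCK_PERIOD_NS = 2.0


def budget_split(clock_period_ns, share_a):
    """Split the clock period between two blocks on a cross-block path."""
    budget_a = clock_period_ns * share_a
    return budget_a, clock_period_ns - budget_a


def block_slack(budget_ns, actual_delay_ns):
    """Positive slack means the block meets its guessed budget."""
    return budget_ns - actual_delay_ns


# Guess made before either block is synthesized: split the period evenly.
budget_a, budget_b = budget_split(CLOCK_PERIOD_NS, 0.5)

# Delays found after each block is synthesized on its own (made up).
delay_a, delay_b = 0.6, 1.3

slack_a = block_slack(budget_a, delay_a)  # block A has time to spare
slack_b = block_slack(budget_b, delay_b)  # block B misses its guessed budget
path_slack = CLOCK_PERIOD_NS - (delay_a + delay_b)  # whole path, seen at once

print(f"block A slack: {slack_a:+.2f} ns")
print(f"block B slack: {slack_b:+.2f} ns")
print(f"full-path slack: {path_slack:+.2f} ns")
```

With these made-up numbers, block B fails its guessed budget even though the path as a whole fits in the clock period. A tool working on block B alone would burn effort fixing a problem that exists only because the budget was a guess.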

The second problem is that of trying to optimize at a low level instead of a high level. It’s a pretty well-known fact that in many scenarios, optimizing at the highest levels of abstraction yields the most dramatic results. Reducing power is best done at the architectural level, not the gate level. Improving data throughput is best done at the architectural level, not the net level. Oasys points to the parallel with software: optimizing a loop in C is much easier than trying to identify and implement that same optimization once the software has been compiled to machine language.

This concept suggests that optimization should be done at the RTL level, not the gate level. The problem is that optimization requires some knowledge of the physical layout of things: where wires go has a dramatic impact on performance. And highly congested portions of the chip can force some nasty routing, which isn’t a good thing for timing.

This kind of physical information generally isn’t available at the RTL level; it’s available only as you move down the abstraction chain. Traditionally, a specific understanding of real performance wouldn’t be known until place and route are complete, which happens long after synthesis in a typical flow.

What Oasys has done is turn the flow somewhat on its head. Clearly, routing is the final determinant of actual timing. But routing is strongly influenced by placement, so RealTime Designer does an initial placement at the RTL level.

In order to do this, a floorplan is needed. If one is provided by the designer, it will be followed; if not, the tool will come up with one itself. The design is partitioned, honoring the hierarchy, and a trial placement is done using timing estimates and trying to avoid congestion. If a particular partition doesn’t work, another will be tried, and the tool will iterate at this level to converge on a good placement. Such iterations at the RTL level go faster than iterations at the gate level.
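The try-a-placement, score-it, try-another idea can be sketched in a few lines. This is only a toy under stated assumptions: the block names, net counts, 2x2 grid of slots, and the wirelength cost are invented here, and exhaustive search stands in for whatever RealTime Designer actually does, which Oasys did not describe at this level.

```python
from itertools import permutations

# Hypothetical top-level blocks and the number of nets between each pair.
NETS = {
    ("cpu", "cache"): 8,
    ("cache", "ddr"): 6,
    ("cpu", "io"): 1,
    ("io", "ddr"): 1,
}
BLOCKS = ["cpu", "cache", "ddr", "io"]
SLOTS = [(0, 0), (1, 0), (0, 1), (1, 1)]  # a 2x2 floorplan grid


def wirelength(placement):
    """Net-weighted Manhattan distance, a crude stand-in for timing and congestion."""
    total = 0
    for (a, b), net_count in NETS.items():
        (xa, ya), (xb, yb) = placement[a], placement[b]
        total += net_count * (abs(xa - xb) + abs(ya - yb))
    return total


def best_trial_placement():
    """Try every assignment of blocks to slots and keep the cheapest one."""
    best, best_cost = None, None
    for order in permutations(SLOTS):
        trial = dict(zip(BLOCKS, order))
        cost = wirelength(trial)
        if best_cost is None or cost < best_cost:
            best, best_cost = trial, cost
    return best, best_cost


placement, cost = best_trial_placement()
print(placement, cost)
```

The point of the sketch is the granularity: with a handful of large blocks there are few candidates to score, which is one plausible reason iterating at this level is quicker than shuffling millions of gates.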

Using this approach, the entire chip is optimized at once; it’s not a block-by-block approach. The result of this partitioning, placement, and subsequent synthesis is a pre-placed gate-level design that can be delivered to the place-and-route (P&R) tool. Some placement changes may be required during P&R, but, typically, very few. And, based on the quality of the placement, routing can generally proceed more easily. Which leads not just to faster synthesis, but also to faster P&R.

They also claim to have a completely new memory model that is much more compact, allowing synthesis on, well, a laptop, for example, with capacity of up to 100 million gates.

While timing has been the main driver, power is also on their roadmap. Today they handle multi-V_T and clock gating, but their plan is to have more comprehensive coverage based on both CPF and UPF files.

It’s kind of interesting that, unlike typical marketing presentations, they didn’t show only ideal data. Ideal data would illustrate only those designs that achieved the performance targets with less work and in less time than conventional tools. In fact, they showed a number of results that ended up with negative slack – in other words, timing was not met.

These designs were the most heinous circuits that various early-access customers had proffered in order to stump the upstart tool. But, even in those nightmare designs, slack was less negative than with conventional tools, and, more importantly, the result was achieved in less time and with fewer backend iterations. In some cases, a result was achieved where none could be achieved with older tools.
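For anyone who doesn't live in timing reports, slack is simply the time a path is allowed minus the time it actually takes. A minimal sketch, with a made-up clock period and made-up path delays (none of these figures come from the Oasys data):

```python
def slack_ns(clock_period_ns, arrival_ns, setup_ns=0.0):
    """Setup slack: time allowed minus time used. Negative means timing is not met."""
    return clock_period_ns - setup_ns - arrival_ns


def worst_slack_ns(clock_period_ns, path_arrivals_ns):
    """The worst path sets the verdict for the whole design."""
    return min(slack_ns(clock_period_ns, a) for a in path_arrivals_ns)


# Hypothetical arrival times (ns) for the same three paths from two flows.
CLOCK_PERIOD_NS = 2.0
flow_x_paths = [1.8, 2.5, 2.1]
flow_y_paths = [1.7, 2.2, 2.0]

worst_x = worst_slack_ns(CLOCK_PERIOD_NS, flow_x_paths)
worst_y = worst_slack_ns(CLOCK_PERIOD_NS, flow_y_paths)

print(f"flow X worst slack: {worst_x:+.2f} ns")
print(f"flow Y worst slack: {worst_y:+.2f} ns")
```

In this invented comparison, both flows miss timing, but flow Y's worst slack is less negative, so it has less ground to make up in the back end. That is the shape of the claim being made for the nightmare designs.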

I couldn’t leave the conversation without broaching that obvious churlish question, “If this is so great, why isn’t everyone using it?” Which elicited the obvious answer: people are slow to change; it’s a major design flow upset; it doesn’t happen overnight. Meaning that either they should see some good traction at some point or someone will call them on smoke and mirrors.

So… is chip synthesis a new category or just an Oasys mirage? In actual fact, it doesn’t really matter. If the result is as good as promised, then it’s more or less an academic discussion.
