When FPGAs flirted with the million-ASIC-gate density for the first time, a bell went off in many designers’ heads. This was not the tinny tintinnabulation of a bouncy little bicycle bell telling them, “Be alert, cyclist coming through.” No, this was the foul, foreboding clang of calamity to come. This was the thousand-ton tanker train of timing-closure nightmares turning around to bear down on them again from a new direction, threatening to bring back that all-too-recent memory of indeterminate iteration in their ASIC design process to cause chaos in their new, happy, FPGA lives.
These ex-ASIC-aficionados went right back to the solution that had saved them before – physical synthesis. EDA companies worked feverishly to adapt algorithms from the ASIC world to the FPGA problem, frustrated at the lack of flexibility offered by FPGA architectures and the resulting shortage of solutions available. In ASIC, they could resize buffers, change routing, and move modules with fine-grained precision. With FPGA, they were gridlocked. Most of the fancy tricks just didn’t apply to the fixed architecture of pre-fab parts.
FPGA physical synthesis products relied on retiming, logic restructuring, re-doing placement, and replication to squeeze out the last few nanoseconds of negative slack from pesky critical paths. The lingering problems were the topology and the design flow. The topology of the FPGA was not so smooth and linear as ASIC. Small, fine-grained adjustments were not possible, and bigger ones sent waves of unpredictable alterations through the whole design, threatening to prevent convergence on a working solution. The FPGA design flow was never meant to accommodate physical synthesis, so building it into the process was like strapping a jetpack onto a street sweeper.
Some design flows called for floorplanning followed by synthesis, or synthesis with floorplanning afterwards as a “pre-placement” step. Floorplanning, however, is a double-edged sword, often with both sharp edges pointed toward the designer. A beautiful floorplan is almost never a functional one, and the subtleties of physical effects on timing are anything but intuitive at the pre-synthesis stage. Many design teams spent weeks floorplanning only to find that they’d made things worse instead of better, or got almost all the way there with a floorplan only to have their design trashed by a last-minute change.
Next, automated systems slid onto the scene. Automation promised to take the guesswork out of physical optimization by letting the algorithms do the heavy lifting. Interactive front-ends allowed the curious and the skeptical to review the results and make their own alterations after the fact. Again, the design flow was one of the biggest challenges. Optimization before place-and-route required the tool to acquire intimate knowledge of, and correlation with, the layout process that was to follow. Optimization after place-and-route was plagued with correlation problems, as the netlist was usually altered by the layout software, and matching the model back to the logical domain was difficult.
In each case, the logical and physical synthesis systems used different timing models that needed to match up with a third model that place-and-route was using, and any placement done by the physical synthesis software had to follow rigorous design constraints imposed by the FPGA architecture. Physical synthesis could be effective only if it could make legal alterations to the placement and if it was working on the same critical paths as the place-and-route system. Otherwise, the two pieces of software would get into algorithmic tug-of-war, and your design would play the part of the rope.
Through all of this, most traditional FPGA designers stayed on the sidelines. Physical optimization was the purview of the fanatic fringe. It required an experienced hand, undying patience, and more than a little luck to coax the technology to deliver on its promise. If you had experience taming the physical synthesis beast in the ASIC world, you jumped in. Otherwise, you retained a healthy skepticism.
Now, Synplicity, the company that built its name on bringing complex technology to the masses, has bitten the bullet and done the huge homework required to bring a true, graph-based logical/physical synthesis system to market. For the first time, logical and physical optimization are performed in a single step, prior to the FPGA vendor’s place-and-route (which is reduced primarily to “route” in this design flow). Synplicity’s new Premier combines logic synthesis, placement, and global routing into one operation, eliminating or dramatically reducing the need for iteration between synthesis and place-and-route in order to achieve timing closure.
In addition to performing logic synthesis and placement, Premier includes a design planner for high-level floorplanning. The design planner allows expert guidance by designers who have specific requirements for overall layout, or by teams that are planning to partition the design into smaller subsets for parallel development. Premier also includes a UI that allows logical and physical analysis, even of a working FPGA, to provide HDL analysis capability that works in conjunction with embedded logic analyzers. Finally, Premier includes capabilities for single-chip prototyping of ASIC or structured-ASIC designs using FPGAs.
While much of Premier is a bundling of existing capabilities and technologies into a single offering, the newest (and most interesting) bit is the debut of graph-based physical synthesis. Historically, one of the problems with using ASIC-born physical synthesis techniques on FPGAs is the non-linearity of FPGA routing. In ASIC, a micron is pretty much a micron, and the same Manhattan distance in almost any direction induces the same amount of delay. That’s because in ASIC, when you want to get from point A to point B, you’re paying for an engineering crew to come along and build you a nice private freeway between the two locations, complete with lanes of the proper width and turns with the correct bank angle for the speed you intend to travel. In FPGA, however, all the city planning was done before you bought your chip, and your design will have to make do with the roads that are already there.
To carry our analogy farther, let’s say our FPGA looks like the San Francisco Bay area. Our ASIC-based placer might think it a bad idea to place the two ends of a critical path in San Jose and San Francisco. Those two points are about as far apart as you can get. But in our world, there’s a nice freeway (I-280) connecting the two, and transit time isn’t usually too bad. Our placer might think that Oakland and San Jose would be better endpoints (a little shorter distance), but the available routing (I-880) is terribly congested, and transit is usually much slower. If our placer isn’t doing global routing, it won’t know the difference between two points on a fast route and two other points connected by a slower route. If our placer were really over-achieving, it might decide that Oakland and San Mateo would make the ideal placement. Those two locations are, after all, the nearest of all – but there, bigger than a block RAM, sits the San Francisco Bay, creating a routing obstruction that makes our distance-based placement look silly.
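The gap between a distance-based estimate and a route-aware one can be sketched in a few lines of Python. This is a toy illustration, not Synplicity’s algorithm: the grid coordinates, the edge delays (in made-up nanoseconds), and the function names `manhattan` and `routed_delay` are all assumptions invented to match the analogy. The routing graph has fast links standing in for I-280, a slow one for I-880, and no direct edge across the bay.

```python
import heapq

# Hypothetical placement sites on an (x, y) grid, named after the analogy.
SITES = {
    "san_francisco": (0, 6),
    "oakland": (3, 6),
    "san_mateo": (2, 3),
    "san_jose": (5, 0),
}

# Hypothetical pre-built routes: (endpoint, endpoint, delay in ns).
# There is deliberately no direct oakland/san_mateo edge: the bay is in the way.
ROUTES = [
    ("san_francisco", "san_mateo", 1.0),  # fast segment (I-280 stand-in)
    ("san_mateo", "san_jose", 1.0),       # fast segment (I-280 stand-in)
    ("oakland", "san_jose", 4.0),         # congested segment (I-880 stand-in)
    ("oakland", "san_francisco", 2.5),    # bridge
]


def build_graph(routes):
    """Turn the undirected route list into an adjacency map."""
    graph = {}
    for a, b, delay in routes:
        graph.setdefault(a, []).append((b, delay))
        graph.setdefault(b, []).append((a, delay))
    return graph


def manhattan(a, b):
    """Distance-only estimate, as a distance-based placer would compute it."""
    (ax, ay), (bx, by) = SITES[a], SITES[b]
    return abs(ax - bx) + abs(ay - by)


def routed_delay(graph, src, dst):
    """Dijkstra's shortest path over the fixed routing graph."""
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        delay, node = heapq.heappop(heap)
        if node == dst:
            return delay
        if delay > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, edge_delay in graph.get(node, []):
            candidate = delay + edge_delay
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return float("inf")  # no route exists


GRAPH = build_graph(ROUTES)
PAIRS = [
    ("san_francisco", "san_jose"),
    ("oakland", "san_jose"),
    ("oakland", "san_mateo"),
]
for a, b in PAIRS:
    print(f"{a} -> {b}: distance {manhattan(a, b)}, routed delay {routed_delay(GRAPH, a, b)} ns")
```

With these invented numbers, the pair that looks farthest apart on the grid (San Francisco to San Jose) has the lowest routed delay, while the pair that looks closest (Oakland to San Mateo) has to detour around the obstruction. A placer that only sees distance would rank the three options in exactly the wrong order for the first and last of them.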
Synplicity’s Premier understands the routing resources available and pre-places and pre-routes critical connections to be sure that they’ll meet timing. With full knowledge of both the logical and physical aspects of the design, it is able to create a pre-placement that will closely mirror the final results from layout, both in the physical and timing sense. Using a detailed graph of available resources, it can create a much more accurate model, resulting in tighter timing results. Synplicity claims that their suite of over 200 designs shows between 5 and 20 percent improvement in timing. Possibly more impressive is the claim that 90 percent of timing predictions fall within 10 percent of actual timing, and 70 percent of predictions within 5 percent.
The goal of all this accuracy and optimization is to reduce the number of iterations between logic synthesis and place-and-route. On large FPGAs with tight timing, some teams not using physical synthesis experience multi-hour tool times for each iteration, so double-digit numbers of passes can have a serious impact on design schedules. A tool that could narrow that number down by a factor of five or ten could have a huge ROI. Additionally, as we’ve said before, a timing improvement of 15% or more puts you in the range of dropping a speed grade on your FPGA. That can reduce part cost in the neighborhood of 30%. If you’re doing much volume, the silicon savings alone can pay for the additional tool investment.
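To make that arithmetic concrete, here is a back-of-the-envelope sketch in Python. Every project number in it is hypothetical (4 hours per pass, 20 passes, a fivefold reduction, a $100 part, 10,000 units); only the 30 percent speed-grade saving comes from the rough figure quoted above, and the function names are invented for illustration.

```python
def closure_hours(hours_per_pass, passes):
    """Total tool time spent iterating toward timing closure."""
    return hours_per_pass * passes


def silicon_savings(unit_price, volume, discount_pct=30):
    """Savings from dropping a speed grade; 30 percent is the article's rough figure."""
    return unit_price * volume * discount_pct / 100


# Hypothetical project numbers, chosen only to illustrate the calculation.
HOURS_PER_PASS = 4      # "multi-hour tool times"
PASSES_BEFORE = 20      # "double-digit numbers of passes"
REDUCTION_FACTOR = 5    # low end of "a factor of five or ten"
UNIT_PRICE = 100        # dollars per FPGA, assumed
VOLUME = 10_000         # units shipped, assumed

passes_after = PASSES_BEFORE // REDUCTION_FACTOR
hours_saved = (closure_hours(HOURS_PER_PASS, PASSES_BEFORE)
               - closure_hours(HOURS_PER_PASS, passes_after))

print(f"tool hours saved: {hours_saved}")
print(f"silicon savings: ${silicon_savings(UNIT_PRICE, VOLUME):,.0f}")
```

Under these assumptions the iteration savings come to 64 tool hours and the silicon savings to $300,000. Real projects will differ widely, so the useful part is the shape of the calculation, not the totals.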
Clearly, Synplicity has thought beyond physical synthesis with the introduction of Premier. They’ve worked to develop a product that can take FPGA designers from debug through final implementation with a single, integrated flow. With today’s larger devices, however, physical synthesis threatens to become more of a necessity than a luxury, and it’s reasonable for the company to launch their new flagship product with that capability designed in at the ground level.
For now, the full-blown physical optimization is only available for Xilinx devices, with other vendors coming soon. Clearly, the engineering investment required to bring physical synthesis online is considerably greater than for logic synthesis alone. Also, the intimate knowledge required to accurately model the placement and routing resources can be difficult to coax from IP-savvy FPGA vendors. Nonetheless, the promise of performance offered by the industry’s newest physical optimization tool should be enough to gain critical mass.
Premier may be the product that brings the average FPGA design team into the physical synthesis world. Certainly the respectable capabilities of FPGA vendor-provided software have kept much of the market in “wait and see” mode. However, with its bundling and integration of capabilities, Synplicity’s traditional pushbutton approach to automation, and the technical edge of well-designed physical optimization, Premier should have what is needed to pull the programmable-logic proletariat across the threshold.