If you’ve worked with large designs that need to be partitioned into multiple FPGAs, you’ve probably often thought how awesome automatic partitioning would be. You just throw your big’ol design at a fancy EDA tool, push the big green “GO” button, and BAM! Your whole design is sliced up into pieces – just like in one of those martial arts movies where the ninja slices the bad guy into about seven pieces so cleanly that he doesn’t even start to fall apart right away.
Your design would be cleanly ninja-sliced into perfect partitions that fit easily into your target FPGAs with the minimal number of inter-FPGA connections. You’d have no timing problems whatsoever, and you’d barely notice that your design wasn’t running on one big super-FPGA. And absolutely no manual intervention would be required.
Then, of course, you woke up from that dream.
You may have tried some of the more infamous automatic partitioning tools, which, by all accounts, were pretty useless for any real work for many years. They’d cut your design up all right, but getting a partitioning job that you could actually work with was somewhere between a big chore and impossible. The partitions had to fit in all of the target devices, synthesize and place-and-route correctly on their own, and map correctly to the pinout of the prototyping board you were using. They had to meet your timing constraints for whatever performance you were trying to get your prototype to achieve, fit within the clock domain limitations of each target device, and still provide some facsimile of your original design so you’d have a clue where to begin when you found a bug. After all, that was the reason you were building the prototype in the first place.
It usually ended up being a lot easier to simply partition your design manually. You’d start with your big IP blocks and put the ones that didn’t communicate much with each other into different partitions. You’d clump together the subsystems that seemed to have a lot of natural affinity. You’d end up fudging a little bit to fill leftover space in some of the FPGAs and to avoid overloading others. There was no rigorous method for manual partitioning, but once you got the hang of it, things just seemed to intuitively fall into place.
But, as your designs got bigger and bigger, that intuition wasn’t quite so strong anymore. The number of clocks went up, the number of IP blocks that didn’t like to participate in your VHDL vivisection increased, and the number of brain cells available to hold the whole thing didn’t rise a single bit. With each passing node of Moore’s Law and each step up the integration tree, the task of manually partitioning a huge (and getting huger) design into multiple FPGAs in order to build a prototype got more complex.
Still, there wasn’t much attention given to automatic design partitioning. That dog already had his day, right? Like many technologies that come out of the gate with a lot of hype and fanfare, automatic FPGA partitioning had gotten a bit of a black eye. Years of folklore haven’t done a lot to heal that wound, so you just don’t hear too much about new automatic FPGA partitioners these days.
Flexras (if you haven’t already Googled them) is a French company. As you might expect, those crazy French don’t know any better than to go off solving problems that the rest of the industry has basically given up on. As a result, at the Design Automation Conference last week, they were telling everyone about their new Wasga Compiler – the “first-and-only timing-driven partitioning tool for SoC rapid prototyping.” The company says Wasga is fast, has huge capacity, and generates high-performance partitioned designs.
Making the partitioning process timing-driven tackles one of the biggest issues with legacy partitioners. Managing timing across partitions manually is a nightmare, and using other partitioning metrics in the hope that timing will come out OK is purely betting on luck. Usually, our luck is not so good. Flexras has a number of algorithmic innovations that they claim make their tool far faster than previous-generation partitioners. Faster generally leads to more capacity (and Flexras has some impressively large design examples), and having the whole thing timing-driven means you are much less likely to end up with an incomprehensible rat’s nest of wires with negative slack that nobody can debug.
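To see why cut-size-only metrics can mislead, here’s a toy sketch (purely illustrative – not the Wasga algorithm, and the netlist and weights are made up). Two partitions can cut the same number of nets yet differ wildly in timing impact, because each inter-FPGA crossing adds I/O and board delay to the path it sits on:

```python
# Toy illustration (not the Wasga algorithm): scoring a two-way
# partition of a tiny netlist. A plain cut-size metric counts every
# crossing net equally; a timing-weighted metric penalizes crossing
# nets in proportion to their criticality.

def cut_cost(nets, partition, weights=None):
    """nets: list of (src, dst) cell pairs; partition: dict cell -> 0/1;
    weights: optional dict net_index -> criticality multiplier."""
    cost = 0.0
    for i, (src, dst) in enumerate(nets):
        if partition[src] != partition[dst]:          # net is cut
            cost += weights.get(i, 1.0) if weights else 1.0
    return cost

# A simple chain of cells: a -> b -> c -> d.
nets = [("a", "b"), ("b", "c"), ("c", "d")]
part_a = {"a": 0, "b": 0, "c": 1, "d": 1}   # cuts only net (b, c)
part_b = {"a": 0, "b": 1, "c": 1, "d": 1}   # cuts only net (a, b)
critical = {0: 10.0}                        # net (a, b) is on the critical path

# By raw cut size the two partitions are indistinguishable...
assert cut_cost(nets, part_a) == cut_cost(nets, part_b) == 1.0
# ...but the timing-weighted cost flags part_b as far worse,
# because it cuts the critical net.
assert cut_cost(nets, part_a, critical) < cut_cost(nets, part_b, critical)
```

A real timing-driven partitioner works on slack budgets across millions of cells, but the core idea is the same: the cost of a cut depends on which paths it lands on, not just how many nets it severs.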
Flexras says that their partitioner works with both Xilinx and Altera FPGAs and with any commercially available or custom-designed prototyping board. The company says that the Wasga Compiler can handle designs of over a billion equivalent gates. It comes with a GUI to help you set up your project (prototyping board configuration, etc.). It allows automatic or manual placement and routing, and it takes advantage of proprietary high-speed multiplexing IP to speed up inter-FPGA communication.
Timing constraints are provided via SDC, and the tool also provides a system-level static timing analysis. You can also set up the tool to automatically run and control your back-end flow, and it can automatically handle iterative runs and verification of the results. The source design can be RTL or gate level, or a combination of both. The partitioning process itself can be run automatically or step by step with manual intervention. The overall flow is incremental, so you can record scripts that reproduce working sequences of steps, ensuring consistent results while you make small changes. This addresses one of the biggest issues with older partitioners, where a small change could ripple into massive changes in the design – resulting in severe and widespread timing problems. With the Flexras approach, only the changed modules are re-synthesized during each iteration – preserving the timing behavior of finished blocks.
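The “only re-synthesize what changed” idea can be sketched in a few lines (an illustration of the general technique, not the Wasga implementation): hash each module’s source, and rerun the expensive step only for modules whose hash has changed, so finished blocks keep their previous results – and their timing – untouched.

```python
# Toy sketch of an incremental flow: a module is "re-synthesized" only
# when its source hash changes. Names and sources are hypothetical.

import hashlib

def synthesize(name, source):
    # Stand-in for an expensive synthesis run.
    return f"netlist({name})"

def incremental_run(modules, cache):
    """modules: dict name -> RTL source; cache: dict name -> (hash, result).
    Returns (results, list of modules actually re-synthesized)."""
    resynthesized = []
    results = {}
    for name, source in modules.items():
        digest = hashlib.sha256(source.encode()).hexdigest()
        if name not in cache or cache[name][0] != digest:
            cache[name] = (digest, synthesize(name, source))
            resynthesized.append(name)
        results[name] = cache[name][1]
    return results, resynthesized

cache = {}
design = {"cpu": "module cpu ...", "dma": "module dma ..."}
_, ran = incremental_run(design, cache)   # first run: everything is built
assert sorted(ran) == ["cpu", "dma"]

design["dma"] = "module dma ... // small fix"
_, ran = incremental_run(design, cache)   # second run: only the change
assert ran == ["dma"]
```

In a real flow the cached artifact would also carry placement and timing data, which is exactly what makes iteration on a near-finished prototype bearable.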
Wasga does a global placement, and it tries to preserve the original design hierarchy as much as possible. This means that you’ll probably see some semblance of your original design in your multi-FPGA prototype – a convenience that was not always a given with other automatic partitioners. A global router then analyzes inter-FPGA connections, working to avoid partitioning-induced timing issues before they occur. The inter-FPGA timing is achieved by constraining the FPGA tools that are operating on each individual FPGA to conform to a global timing budget established by the partitioner.
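The global-budget idea boils down to simple arithmetic: subtract the fixed inter-FPGA crossing delay from the clock period, split what remains among the path segments, and hand each per-FPGA segment budget to that device’s own tools as a constraint. Here’s a minimal sketch with made-up numbers (the 2:1 split and the delays are hypothetical, for illustration only):

```python
# Illustrative arithmetic only (all numbers are made up): splitting a
# global clock budget across a path that crosses between two FPGAs.

def segment_budgets(clock_period_ns, crossing_delay_ns, segment_weights):
    """Subtract the fixed inter-FPGA crossing delay from the period,
    then split the remainder among path segments by weight. Each
    result becomes a max-delay constraint for one FPGA's own tools."""
    usable = clock_period_ns - crossing_delay_ns
    if usable <= 0:
        raise ValueError("crossing delay alone exceeds the clock period")
    total = sum(segment_weights)
    return [usable * w / total for w in segment_weights]

# 100 MHz clock (10 ns period), 4 ns lost to the inter-FPGA hop,
# remaining 6 ns split 2:1 between the source and destination FPGAs.
budgets = segment_budgets(10.0, 4.0, [2, 1])
assert budgets == [4.0, 2.0]
```

If every per-device run closes timing against its local budget, the cross-partition path meets the global constraint by construction – which is the whole point of budgeting before, rather than after, the individual FPGA tools run.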
Wasga technology is likely to end up being OEMed by selected prototyping board vendors, as well as being sold stand-alone. We also wouldn’t be surprised to see Wasga algorithms hiding inside some of your favorite emulators (for those of you with the budget to actually have a favorite emulator).
It’s exciting to see this long-quiet segment of multi-FPGA design getting some new life, and with the incredible capability of some of today’s FPGAs, we may soon see some truly massive systems prototyped using Wasga. It will be fun to watch!