
Flexras Makes a Finer Cut

FPGA Partitioning for the Modern Era

If you’ve worked with large designs that need to be partitioned across multiple FPGAs, you’ve probably thought more than once about how awesome automatic partitioning would be. You just throw your big ol’ design at a fancy EDA tool, push the big green “GO” button, and BAM! Your whole design is sliced into pieces – just like in one of those martial arts movies where the ninja slices the bad guy into about seven pieces so cleanly that he doesn’t even start to fall apart right away.

Your design would be cleanly ninja-sliced into perfect partitions that fit easily into your target FPGAs with the minimum number of inter-FPGA connections. You’d have no timing problems whatsoever, and you’d barely notice that your design wasn’t running on one big super-FPGA. No manual intervention would be required at all.

Then, of course, you woke up from that dream. 

You may have tried some of the more infamous automatic partitioning tools which, by all accounts, were pretty useless for real work for many years. They’d cut your design up all right, but getting a partitioning job that you could actually work with was somewhere between a big chore and impossible. The partitions had to fit in all of the target devices, synthesize and place-and-route correctly on their own, map correctly to the pinout of the prototyping board you were using, meet your timing constraints for whatever performance you were trying to get your prototype to achieve, fit within the clock domain limitations of each target device, and still preserve some facsimile of your original design so you had a clue where to begin when you found a bug. After all, finding bugs was the reason you were building the prototype in the first place.

It usually ended up being a lot easier to simply partition your design manually. You’d start with your big IP blocks and put the ones that didn’t communicate much with each other into different partitions. You’d clump together the subsystems that seemed to have a lot of natural affinity. You’d end up fudging a little bit to fill leftover space in some of the FPGAs and to avoid overloading others. There was no rigorous method for manual partitioning, but once you got the hang of it, things just seemed to intuitively fall into place. 
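For the curious, that “clump by affinity” instinct can be written down as a simple greedy heuristic. The sketch below is our own illustration, not anything Flexras ships, and real partitioners use far more sophisticated algorithms. The block names, gate counts, wire counts, and per-FPGA capacity are all made-up assumptions. It places the biggest blocks first and drops each one onto whichever FPGA already holds the blocks it talks to most, as long as it still fits.

```python
from typing import Dict, List, Tuple

def greedy_partition(sizes: Dict[str, int], links: Dict[Tuple[str, str], int],
                     capacity: int, num_fpgas: int) -> Dict[str, int]:
    """Assign each block to an FPGA index, largest blocks first (illustrative heuristic)."""
    def affinity(block: str, members: List[str]) -> int:
        # Total wires between `block` and the blocks already placed on one FPGA.
        return sum(links.get((block, m), 0) + links.get((m, block), 0) for m in members)

    members: List[List[str]] = [[] for _ in range(num_fpgas)]
    used = [0] * num_fpgas
    assignment: Dict[str, int] = {}
    for block in sorted(sizes, key=sizes.get, reverse=True):
        candidates = [f for f in range(num_fpgas) if used[f] + sizes[block] <= capacity]
        if not candidates:
            raise ValueError(f"block {block} does not fit on any FPGA")
        # Prefer the strongest affinity; break ties toward the emptiest device.
        best = max(candidates, key=lambda f: (affinity(block, members[f]), capacity - used[f]))
        members[best].append(block)
        used[best] += sizes[block]
        assignment[block] = best
    return assignment

def cut_size(assignment: Dict[str, int], links: Dict[Tuple[str, str], int]) -> int:
    """Number of wires that end up crossing between FPGAs."""
    return sum(w for (a, b), w in links.items() if assignment[a] != assignment[b])

# Hypothetical design: sizes in arbitrary gate units, links as wire counts.
sizes = {"cpu": 50, "dma": 20, "gpu": 60, "ddr": 30}
links = {("cpu", "dma"): 100, ("gpu", "ddr"): 120, ("cpu", "gpu"): 10}
parts = greedy_partition(sizes, links, capacity=100, num_fpgas=2)
print(parts, cut_size(parts, links))
```

Here the chatty pairs (cpu/dma and gpu/ddr) land together and only the 10 wires between cpu and gpu cross the board. On a toy design that works fine; on a billion-gate SoC with dozens of clocks, this kind of intuition is exactly what stops scaling.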

But, as your designs got bigger and bigger, that intuition wasn’t quite so strong anymore. The number of clocks went up, the number of IP blocks that didn’t like to participate in your VHDL vivisection increased, and the number of brain cells available to hold the whole thing didn’t rise a single bit. With each passing node of Moore’s Law and each step up the integration tree, the task of manually partitioning a huge (and getting huger) design into multiple FPGAs in order to build a prototype got more complex.

Still, automatic design partitioning didn’t get much attention. That dog already had his day, right? Like many technologies that come out of the gate with a lot of hype and fanfare, automatic FPGA partitioning had gotten a bit of a black eye. Years of folklore haven’t done much to heal that wound, so you just don’t hear much about new automatic FPGA partitioners these days.

Until now.

Flexras (if you haven’t already Googled them) is a French company. As you might expect, those crazy French don’t know any better than to go off solving problems that the rest of the industry has basically given up on. As a result, at the Design Automation Conference last week, they were telling everyone about their new Wasga Compiler, the “only-and-first timing-driven partitioning tool for SoC rapid prototyping.” The company says Wasga is fast, has huge capacity, and generates high-performance partitioned designs.

Making the partitioning process timing-driven tackles one of the biggest issues with legacy partitioners. Managing timing across partitions manually is a nightmare, and optimizing some other partitioning metric in the hope that timing will come out OK is a pure bet on luck. Usually, our luck is not so good. Flexras claims a number of algorithmic innovations that make its tool far faster than previous-generation partitioners. Faster generally leads to more capacity (and Flexras has some impressively large design examples), and having the whole thing timing-driven means you are much less likely to end up with an incomprehensible rat’s nest of wires with negative slack that nobody can debug.
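Why does timing-driven matter so much? A partitioner that only minimizes the number of cut wires treats every wire the same, but a single cut on a critical path can hurt more than a dozen cuts elsewhere. The Python sketch below is our own illustration, not Flexras’s algorithm: the block names, the 20 ns clock period, and the flat 7 ns penalty per board crossing are assumptions. It scores two partitions that each cut exactly one wire.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class TimingPath:
    cells: List[str]       # blocks the path passes through, in order
    logic_delay_ns: float  # path delay if everything sat on one FPGA

def path_slack(path: TimingPath, assignment: Dict[str, int],
               period_ns: float, hop_delay_ns: float) -> float:
    # Count how many times consecutive blocks land on different FPGAs.
    hops = sum(1 for a, b in zip(path.cells, path.cells[1:])
               if assignment[a] != assignment[b])
    return period_ns - (path.logic_delay_ns + hops * hop_delay_ns)

def worst_slack(paths: List[TimingPath], assignment: Dict[str, int],
                period_ns: float, hop_delay_ns: float) -> float:
    return min(path_slack(p, assignment, period_ns, hop_delay_ns) for p in paths)

# Hypothetical design: one critical path (a -> b -> c) and one easy path (d -> e).
paths = [TimingPath(["a", "b", "c"], 15.0), TimingPath(["d", "e"], 3.0)]
cuts_critical = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1}  # the one cut wire is b -> c
cuts_easy = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1}      # the one cut wire is d -> e

print(worst_slack(paths, cuts_critical, period_ns=20.0, hop_delay_ns=7.0))  # -2.0
print(worst_slack(paths, cuts_easy, period_ns=20.0, hop_delay_ns=7.0))      # 5.0
```

A pure cut-count metric rates both partitions as equally good; only the timing-aware score reveals that the first one can never run at the target clock.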

Flexras says that its partitioner works with both Xilinx and Altera FPGAs and with any commercially available or custom-designed prototyping board. The company says that the Wasga Compiler can handle designs of over a billion equivalent gates. It comes with a GUI to help you set up your project (configuration of the prototyping board, etc.). It allows automatic or manual placement and routing, and it takes advantage of proprietary high-speed multiplexing IP to speed up inter-FPGA communication.
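Flexras doesn’t say how its multiplexing IP works, so what follows is only a generic time-division multiplexing (TDM) back-of-the-envelope calculation with assumed numbers. When more signals need to cross between two FPGAs than there are physical pins, several signals take turns on each pin, and the design clock has to slow down by roughly that ratio. The per-pin data rate and the optional sync-overhead term are assumptions, not Wasga parameters.

```python
import math

def tdm_ratio(signals: int, pins: int) -> int:
    """How many design signals must share each physical pin (at least 1)."""
    if pins <= 0:
        raise ValueError("need at least one pin")
    return max(1, math.ceil(signals / pins))

def max_design_clock_mhz(signals: int, pins: int, pin_rate_mbps: float,
                         overhead_cycles: int = 0) -> float:
    """Upper bound on the design clock for one multiplexed FPGA-to-FPGA link.

    Every design-clock cycle must shift `ratio` bits (plus any assumed sync
    overhead) through each pin, so the pin rate is divided by that many slots.
    """
    ratio = tdm_ratio(signals, pins)
    return pin_rate_mbps / (ratio + overhead_cycles)

# Hypothetical link: 1200 signals over 150 pins at 800 Mb/s per pin.
print(tdm_ratio(1200, 150))                               # 8
print(max_design_clock_mhz(1200, 150, 800.0))             # 100.0
print(max_design_clock_mhz(1200, 150, 800.0, overhead_cycles=2))  # 80.0
```

This is why faster multiplexing matters: for a fixed pinout, every increase in per-pin rate translates directly into a faster prototype clock.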

Timing constraints are provided via SDC, and the tool also provides a system-level static timing analysis. You can also set up the tool to automatically run and control your back-end flow, and it can automatically handle iterative runs and verification of the results. The source design can be RTL, gate level, or a combination of both. The partitioning process itself can be run automatically or step by step with manual intervention. The overall flow is incremental, so you can record scripts that reproduce working sequences of steps, ensuring consistent results while you make small changes. This addresses one of the biggest issues with older partitioners, where a small change could trigger massive changes in the design – resulting in severe and widespread timing problems. With the Flexras approach, only the changed modules are re-synthesized during each iteration, preserving the timing behavior of finished blocks.
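The “only re-synthesize what changed” idea is easy to sketch, even though the article doesn’t describe how Wasga tracks changes. The hypothetical Python below fingerprints each module’s source text with SHA-256 and compares it against the manifest saved from the previous run. The module names and source strings are made up.

```python
import hashlib
from typing import Dict, Set

def fingerprint(sources: Dict[str, str]) -> Dict[str, str]:
    """Map each module name to a SHA-256 digest of its source text."""
    return {name: hashlib.sha256(text.encode("utf-8")).hexdigest()
            for name, text in sources.items()}

def modules_to_resynthesize(previous: Dict[str, str],
                            current_sources: Dict[str, str]) -> Set[str]:
    """New or edited modules; untouched modules keep their previous netlists."""
    current = fingerprint(current_sources)
    return {name for name, digest in current.items() if previous.get(name) != digest}

# Hypothetical two-run history.
run1 = {"uart": "module uart(); endmodule", "cpu": "module cpu(); endmodule"}
manifest = fingerprint(run1)

run2 = {"uart": "module uart(); endmodule",
        "cpu": "module cpu(); /* bug fix */ endmodule",
        "spi": "module spi(); endmodule"}
print(sorted(modules_to_resynthesize(manifest, run2)))  # ['cpu', 'spi']
```

A production flow would also have to propagate changes through the hierarchy (an edited port list affects every instantiating module and possibly the partition boundary), which this sketch deliberately ignores.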

Wasga does a global placement, and it tries to preserve the original design hierarchy as much as possible. This means that you’ll probably see some semblance of your original design in your multi-FPGA prototype, a convenience that was not always a given with other automatic partitioners. A global router then analyzes inter-FPGA connections, working to head off partitioning-induced timing issues before they occur. Inter-FPGA timing is met by constraining the tools that operate on each individual FPGA to a global timing budget established by the partitioner.
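Here’s one way to picture a global timing budget for a single register-to-register path that hops between two FPGAs. This is our own simplified sketch, not Flexras’s method: it assumes the partitioner already has delay estimates for each side and for the board trace, splits the leftover slack in proportion to each side’s logic delay, and emits standard SDC `set_output_delay` and `set_input_delay` lines for the two per-FPGA runs. The clock name, port names, and delay figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class CrossingPath:
    src_logic_ns: float  # estimated delay inside the source FPGA (clock-to-out plus logic)
    board_ns: float      # pin-to-pin delay across the board, including any muxing
    dst_logic_ns: float  # estimated delay inside the destination FPGA

def split_budget(path: CrossingPath, period_ns: float) -> Tuple[float, float]:
    """Share leftover slack between the two FPGAs in proportion to their logic delay."""
    slack = period_ns - (path.src_logic_ns + path.board_ns + path.dst_logic_ns)
    if slack < 0:
        raise ValueError(f"path misses timing by {-slack:.2f} ns before any FPGA is built")
    logic_total = path.src_logic_ns + path.dst_logic_ns
    src_share = path.src_logic_ns / logic_total if logic_total > 0 else 0.5
    src_budget = path.src_logic_ns + slack * src_share
    dst_budget = path.dst_logic_ns + slack * (1 - src_share)
    return src_budget, dst_budget

def sdc_constraints(path: CrossingPath, period_ns: float, clock: str,
                    out_port: str, in_port: str) -> Tuple[str, str]:
    src_budget, dst_budget = split_budget(path, period_ns)
    # Each FPGA treats everything on the far side of its pin as external delay.
    out_delay = period_ns - src_budget  # = board delay + destination budget
    in_delay = period_ns - dst_budget   # = source budget + board delay
    return (f"set_output_delay -clock {clock} {out_delay:.2f} [get_ports {out_port}]",
            f"set_input_delay -clock {clock} {in_delay:.2f} [get_ports {in_port}]")

# Hypothetical 50 MHz path: 6 ns in the source FPGA, 4 ns on the board, 2 ns in the destination.
for line in sdc_constraints(CrossingPath(6.0, 4.0, 2.0), 20.0, "clk", "xfer_out", "xfer_in"):
    print(line)
```

Because the two budgets plus the board delay always add up to the clock period, each FPGA’s place-and-route run can close timing on its own half without knowing anything about the other device.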

Wasga technology is likely to end up being OEMed by selected prototyping board vendors, as well as being sold stand-alone. We also wouldn’t be surprised to see Wasga algorithms hiding inside some of your favorite emulators (for those of you with the budget to actually have a favorite emulator).

It’s exciting to see this long-quiet segment of multi-FPGA design getting some new life, and with the incredible capability of some of today’s FPGAs, we may soon see some truly massive systems prototyped using Wasga. It will be fun to watch!

