feature article

Duct Tape, FPGAs, and the Art of Making Great Multi-Purpose Tools

Most engineers will agree that duct tape is an excellent multi-purpose tool. This wonder product has been used for everything from giving tennis balls the feel of a cricket ball to saving the Apollo 13 mission from certain disaster. Engineers love good multi-purpose tools because of the sheer versatility they offer; a good multi-purpose tool can get a creative engineer out of a real bind.

To hardware designers, FPGAs are also excellent multi-purpose tools.  No other “off-the-shelf” semiconductor can become so many different things to different people.  The super-versatile logic cell architecture of the typical FPGA allows it to be used for everything from image enhancement on the Mars rover, to a life-saving patient heart monitor.  But FPGAs have been changing.

When FPGA technology first emerged, the concept was simple: build an array of general-purpose logic cells that can be programmed into any possible logic configuration. This approach worked well for simple designs but struggled with more complicated ones. Many designs require large amounts of memory, for example, and building memory arrays from general-purpose cells alone is very inefficient. Designs with large memory requirements forced the designer to use off-chip memories, increasing BOM cost and PCB footprint. Programmable logic device vendors responded by introducing the special resource stripes that appear in most modern FPGA architectures: putting a column of purpose-built RAMs among the logic columns made these programmable devices practical for a much larger set of designs, allowing memory needs to be met on-chip.

Likewise, customers complained about poor Fmax when attempting to synthesize their multiplication-heavy DSP designs to an FPGA. Many architectures now include purpose-built multipliers or DSP blocks that, besides being area efficient, can operate at a much higher frequency than a corresponding circuit built from fabric logic. Given the rapid adoption of dedicated resource stripes in current device offerings, future devices are likely to contain even more dedicated-resource IP.
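A back-of-envelope calculation shows why fabric-only memories are so costly. The numbers below are illustrative assumptions, not figures for any specific device: a 4-input LUT acting as a 16x1 distributed RAM, and an 18 Kbit dedicated block RAM.

```python
def fabric_luts_for_ram(depth_words: int, width_bits: int, lut_bits: int = 16) -> int:
    """LUTs needed just for storage if the RAM is built from fabric LUTs
    (ignores the additional address-decode and output-mux logic)."""
    total_bits = depth_words * width_bits
    return -(-total_bits // lut_bits)  # ceiling division

def block_rams_for_ram(depth_words: int, width_bits: int, block_bits: int = 18 * 1024) -> int:
    """Dedicated block RAMs needed for the same storage."""
    total_bits = depth_words * width_bits
    return -(-total_bits // block_bits)

# A modest 1K x 8 memory:
print(fabric_luts_for_ram(1024, 8))   # 512 LUTs of storage alone
print(block_rams_for_ram(1024, 8))    # fits in a single block RAM
```

Even this small memory consumes hundreds of logic cells in fabric, while a single dedicated RAM column entry absorbs it entirely.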

One of the essential tools in any FPGA designer’s belt is the FPGA synthesis tool. On the whole, the heuristics used by synthesis tools do a great job of managing the trade-offs inherent in resource allocation. The tool can look for the mapping that achieves the best area savings: for example, mapping larger RAM constructs in a design to the available block RAMs and smaller RAM structures into fabric. Alternatively, timing can be the priority. In some cases, logic on the critical path of a design is best implemented in the programmable logic cells; in others, only the dedicated resources will let the design meet its performance goals.
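The area heuristic described above can be sketched as a simple greedy assignment. This is a toy model under stated assumptions (real tools weigh timing, porting, and packing constraints as well): give the scarce block RAMs to the largest inferred memories first, and push the rest into fabric.

```python
def assign_rams(inferred_rams, available_blocks):
    """Greedy area heuristic: the limited block RAMs go to the largest
    inferred memories; everything else falls back to fabric logic.
    inferred_rams: list of (name, size_in_bits) tuples."""
    ranked = sorted(inferred_rams, key=lambda r: r[1], reverse=True)
    assignment = {}
    for name, bits in ranked:
        if available_blocks > 0:
            assignment[name] = "block_ram"
            available_blocks -= 1
        else:
            assignment[name] = "fabric"
    return assignment

# Hypothetical inferred memories from a design:
rams = [("fifo_buf", 8192), ("coef_tbl", 512), ("frame_mem", 65536)]
print(assign_rams(rams, available_blocks=2))
```

With only two block RAMs available, the two large memories claim them and the small coefficient table lands in fabric, which is exactly the trade-off the heuristic is after.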

But while synthesis tools can use heuristics to determine the best implementation for most designs, they will never have all of the knowledge the design engineer possesses. In some cases the designer can obtain superior results by guiding the synthesis tool with a specific resource assignment, using these dedicated resources as multi-purpose tools. The problem has been that, until recently, these dedicated resource IP blocks have not been terribly good multi-purpose tools, though they have had great potential to become so, given a bit of creativity. For instance, a RAM block can sometimes implement a shift register, and a DSP block can implement a counter or even a multiplexer. These might not be the most ideal uses for a dedicated resource, at least according to traditional optimization heuristics, but they can be advantageous when you’re in a pinch. The analysis and mapping control required to elicit good multi-purpose use of dedicated resources, however, was lacking. Until now.
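The RAM-as-shift-register trick mentioned above boils down to a circular buffer: instead of shifting data through N flip-flops every cycle, only a single write pointer advances. A minimal behavioral model (a sketch of the concept, not any tool's implementation):

```python
class RamShiftRegister:
    """Behavioral model of an N-deep shift register built on a RAM:
    rather than moving data through N flip-flops each cycle, data sits
    still in a circular buffer while one write pointer advances."""
    def __init__(self, depth):
        self.mem = [0] * depth   # RAM contents, initialized to zero
        self.depth = depth
        self.ptr = 0             # single read/write pointer

    def clock(self, din):
        dout = self.mem[self.ptr]   # oldest entry: delayed by `depth` cycles
        self.mem[self.ptr] = din    # overwrite it with the new sample
        self.ptr = (self.ptr + 1) % self.depth
        return dout

sr = RamShiftRegister(depth=4)
outs = [sr.clock(x) for x in range(1, 9)]   # feed samples 1..8
print(outs)  # four initial zeros, then 1, 2, 3, 4: a 4-cycle delay line
```

The output is the input delayed by exactly `depth` cycles, which is the defining behavior of a shift register, yet the storage maps naturally onto a RAM block.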

Recent advances in programmable logic synthesis technology make it possible for designers to actually use these dedicated resources as effective multi-purpose tools. Here’s how it works: when you bring your design into the synthesis environment, part of the compile process is a step in which the synthesis tool examines your design for arithmetic and datapath “operators” such as multipliers, counters, multiplexers, shift registers, and memories.

Before performing actual synthesis to the target technology, the designer can use a type of resource manager to examine what dedicated resources are available on the target device, and which operators are recognized by the synthesis tool as mappable to those dedicated resources.  For instance, when targeting a device with on-chip RAMs and DSP blocks, you would be given two views—one of all available RAM blocks on the chip along with all operators that could map to those RAM blocks, and another of all available DSP blocks on the chip along with all operators that could map to those DSP blocks.
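A sketch of what those two resource-manager views might contain, using a made-up operator list and mappability table (the real tool derives both from your HDL and the target device; the names and rules here are assumptions for illustration):

```python
# Which operator kinds can map to which dedicated resource. This table is
# an illustrative assumption; actual mappability is device-specific.
MAPPABLE = {
    "dsp_block": {"multiplier", "counter", "multiplexer"},
    "block_ram": {"memory", "shift_register"},
}

def build_views(operators):
    """Group extracted operators under each dedicated-resource type they
    could map to. operators: list of (name, kind) tuples."""
    views = {res: [] for res in MAPPABLE}
    for name, kind in operators:
        for res, kinds in MAPPABLE.items():
            if kind in kinds:
                views[res].append(name)
    return views

# Hypothetical operators extracted from a design:
ops = [("mult_a", "multiplier"), ("pix_cnt", "counter"),
       ("line_buf", "memory"), ("taps", "shift_register")]
print(build_views(ops))
```

Each view pairs a resource type with every operator that is a legal candidate for it, which is precisely the information the designer browses before committing to an assignment.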

Next, the designer can browse the various operators listed for each resource type, cross-probing to either the HDL source code or an RTL schematic, to see where each operator sits within the overall design, the size of the operation (e.g., an 8-bit or 18-bit multiplier), and the clock period constraining it. The designer can also see a predictive summary count of all the dedicated resources forecast to be used by the current resource assignment. If, for example, the synthesis tool’s auto-assignment would consume 8 DSP blocks and 2 RAM blocks, the designer knows this before actually performing a full synthesis of the design. Advance knowledge of any potential resource scarcity can prompt the designer to review all assignments to that scarce resource and ensure that the most appropriate operators are allocated to it.

Once the designer has reviewed the operator assignments, they can explicitly assign some or all operators to a particular dedicated resource and let the synthesis tool perform its heuristic-based auto-assignment on the remaining operators during synthesis to the target technology. Voila! At long last, you have easy control over turning once “use-it-or-lose-it” IP into great multi-purpose tools.
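Putting the pieces together, the flow above amounts to: honor the designer's explicit assignments first, auto-assign everything else, and report the predictive resource count before a full synthesis runs. Again a toy model with hypothetical operator names, not the tool's actual algorithm:

```python
from collections import Counter

def final_assignment(operators, manual, auto_rule):
    """operators: list of operator names; manual: dict of explicit designer
    assignments; auto_rule: fallback heuristic applied to the rest."""
    result = {}
    for op in operators:
        # A manual assignment always wins over the heuristic.
        result[op] = manual.get(op) or auto_rule(op)
    forecast = Counter(result.values())   # the predictive summary count
    return result, forecast

ops = ["mult_a", "mult_b", "pix_cnt", "line_buf"]
manual = {"pix_cnt": "fabric"}   # designer pulls the counter out of a DSP block
auto = lambda op: "block_ram" if "buf" in op else "dsp_block"

assignment, forecast = final_assignment(ops, manual, auto)
print(assignment)
print(dict(forecast))   # resource totals known before full synthesis
```

The forecast shows the counter now costs zero DSP blocks, freeing one for logic that genuinely needs the speed, which is the kind of multi-purpose trade this capability enables.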

Most synthesis tools do a great job of finding the optimal use of the available device resources when mapping a design. While you might not need this sort of resource management every day, it is certainly a valuable tool to have in your arsenal. The creative freedom this sort of capability gives hardware designers is quite impressive. Now if only someone could find a way to make an actual FPGA out of duct tape…

[Figure: 20071030_mentor_fig1.jpg] Patent-applied-for resource management technology in Precision RTL Plus identifies available architectural blocks and assists in re-mapping implementations for the best performance and device utilization.
 
