feature article
Subscribe Now

Duct Tape, FPGAs, and the Art of Making Great Multi-Purpose Tools

Most engineers will agree that duct tape is an excellent multi-purpose tool.  This wonder product has been used for everything from giving tennis balls the feel of a cricket ball, to saving the Apollo 13 mission from certain disaster.  Engineers love good multi-purpose tools because of the sheer versatility that they offer; a good multi-purpose tool can help a creative engineer get themselves out of a real bind.

To hardware designers, FPGAs are also excellent multi-purpose tools.  No other “off-the-shelf” semiconductor can become so many different things to different people.  The super-versatile logic cell architecture of the typical FPGA allows it to be used for everything from image enhancement on the Mars rover, to a life-saving patient heart monitor.  But FPGAs have been changing.

When FPGA technology first emerged, the concept was pretty simple: build an array of general-purpose logic cells that can be programmed to produce any possible logic configuration. This approach worked well for simple designs but was limited in its ability to handle more complicated designs. For example, many designs require large amounts of memory, and using only these general-purpose cells to create memory arrays is very inefficient. The designer was forced to use off-chip memories when using FPGAs in a design with large memory requirements, increasing BOM cost and PCB footprint.  Programmable logic device vendors responded to these changing customer requirements by introducing the special resource stripes that appear in most modern FPGA architectures.  Putting a column of purpose-built RAMs among the logic columns made these programmable devices practical for a much larger set of designs, allowing memory needs to be met on-chip.  Likewise, customers complained about poor Fmax performance when attempting to synthesize their multiplication-heavy DSP designs to an FPGA.  Many architectures now include purpose-built multipliers or DSP blocks that, besides being area efficient, can operate at a much faster frequency than a corresponding circuit built with fabric logic.  Given the rapid adoption of the dedicated resource stripes in current device offerings, future devices are sure to contain even more dedicated resource IP.

One of the essential tools in any FPGA designer’s belt is their FPGA synthesis tool.  On the whole, the heuristics used by synthesis tools do a great job of managing the trade-offs inherent in resource allocation. The tool can look for a mapping to achieve the best area savings.  For example, mapping larger RAM constructs within a design to the available block RAMs, and smaller RAM structures into fabric. Alternatively, timing can also be given consideration. In certain cases, logic on the critical path of a design is best implemented with the programmable logic cells; in other situations, only the use of the dedicated resources will allow the design to meet performance goals.

But while synthesis tools can use heuristics to determine the best implementation for most designs, they will never have all of the knowledge the design engineer possesses. In certain cases it could be possible for the designer to obtain superior results by guiding the synthesis tool with a specific resource assignment, using these dedicated resources as multi-purpose tools.  The problem has been that until recently, these dedicated resource IP blocks have not been terribly good multi-purpose tools.  They have had great potential to become so, given a bit of creativity.  For instance, a RAM block can sometimes be used to implement a shift-register, a DSP block can implement a counter or even a multiplexer.  While these might not be the most ideal uses for a dedicated resource, at least, according to traditional optimization heuristics, they could be advantageous when you’re in a pinch.  But the analysis and mapping control required to elicit good multi-purpose use of dedicated resources was lacking—until now.

Recent advances in programmable logic synthesis technology make it possible for designers to actually use these dedicated resources as effective multi-purpose tools.  Here’s how it works: when you bring your design into the synthesis environment, part of the compile process is a step whereby the synthesis tool examines your design looking for arithmetic and datapath “operators”, such as multipliers, counters, multiplexers, shift registers, memories, etc.

Before performing actual synthesis to the target technology, the designer can use a type of resource manager to examine what dedicated resources are available on the target device, and which operators are recognized by the synthesis tool as mappable to those dedicated resources.  For instance, when targeting a device with on-chip RAMs and DSP blocks, you would be given two views—one of all available RAM blocks on the chip along with all operators that could map to those RAM blocks, and another of all available DSP blocks on the chip along with all operators that could map to those DSP blocks.

Next, the designer can browse the various operators listed for each resource type, cross-probing to either the HDL source code or an RTL schematic, to familiarize themselves with where each operator is situated within the overall design, what is the size of operation (e.g. 8-bit or 18-bit multiplier) and what is the clock period constraining that operator.  The designer can also see a predictive summary count of all the dedicated resources forecast to be used by the current resource assignment.  So for example if the synthesis tool’s auto-assignment would result in 8 DSP blocks and 2 RAM blocks being used, the designer would have up-front knowledge of this before actually performing a full synthesis of the design.  Having advanced knowledge of any potential resource scarcity can alert the designer that they may find advantage in reviewing all assignments to that scarce resource, to ensure that the most appropriate operators are allocated to the dedicated resources.

Once the designer has reviewed operator assignments, they can make specific assignments of some or all of their operators to a particular dedicated resource, proceeding to have the synthesis tool perform a heuristic-based auto-assignment on the remaining operators during synthesis of the design to the target technology.  Voila!  At long last you have easy control over making once “use-it-or-lose-it” IP into great multi-purpose tools.

Most synthesis tools do a great job of finding the optimal use of the available device resources in mapping a design.  While you might not need to use this sort of resource management every day, it is certainly a valuable tool to have in your arsenal.  The creative freedom given to hardware designers by this sort of capability is quite impressive.  Now if only someone could find a way to make an actual FPGA out of duct tape…

20071030_mentor_fig1.jpg
Patent-applied-for resource management technology in Precision RTL Plus identifies available architectural blocks and assists in re-mapping implementations for the best performance and device utilization.
 

Leave a Reply

featured blogs
Nov 25, 2020
It constantly amazes me how there are always multiple ways of doing things. The problem is that sometimes it'€™s hard to decide which option is best....
Nov 25, 2020
[From the last episode: We looked at what it takes to generate data that can be used to train machine-learning .] We take a break from learning how IoT technology works for one of our occasional posts on how IoT technology is used. In this case, we look at trucking fleet mana...
Nov 25, 2020
It might seem simple, but database units and accuracy directly relate to the artwork generated, and it is possible to misunderstand the artwork format as it relates to the board setup. Thirty years... [[ Click on the title to access the full blog on the Cadence Community sit...
Nov 23, 2020
Readers of the Samtec blog know we are always talking about next-gen speed. Current channels rates are running at 56 Gbps PAM4. However, system designers are starting to look at 112 Gbps PAM4 data rates. Intuition would say that bleeding edge data rates like 112 Gbps PAM4 onl...

featured video

AI SoC Chats: Protecting Data with Security IP

Sponsored by Synopsys

Understand the threat profiles and security trends for AI SoC applications, including how laws and regulations are changing to protect the private information and data of users. Secure boot, secure debug, and secure communication for neural network engines is critical. Learn how DesignWare Security IP and Hardware Root of Trust can help designers create a secure enclave on the SoC and update software remotely.

Click here for more information about Security IP

featured paper

Overcoming PPA and Productivity Challenges of New Age ICs with Mixed Placement Innovation

Sponsored by Cadence Design Systems

With the increase in the number of on-chip storage elements, it has become extremely time consuming to come up with an optimized floorplan using manual methods, directly impacting tapeout schedules and power, performance, and area (PPA). In this white paper, learn how a breakthrough technology addresses design productivity along with design quality improvements for macro-dominated designs. Download white paper.

Click here to download the whitepaper

Featured Chalk Talk

Bulk Acoustic Wave (BAW) Technology

Sponsored by Mouser Electronics and Texas Instruments

In industrial applications, crystals are not ideal for generating clock signal timing. They take up valuable PCB real-estate, and aren’t stable in harsh thermal and vibration environments. In this episode of Chalk Talk, Amelia Dalton chats with Nick Smith from Texas Instruments about bulk acoustic wave (BAW) technology that offers an attractive alternative to crystals.

More information about Texas Instruments Bulk Acoustic Wave (BAW) Technology