The Declining Penalty of Programmability

Should We Bring Back “System Gates”?

In the early days of FPGAs, marketing was all bluster. In an effort to make their devices seem useful and, more importantly, bigger than the other guys’, device densities were given in terms of spectacularly optimistic “System Gates.” Just about the whole industry was complicit in this facade. Once Xilinx and Altera had gone after each other with the System Gate ruse, the other challengers really had no alternative but to fall in line.

What is a System Gate? (We hear you ask…)

Nobody really knows. Or, at least, nobody who remembers is admitting it. As best we could determine at the time, System Gates were calculated by taking the competitor’s similar device, looking at its datasheet, multiplying its System Gate total by 1.15, and issuing a press release claiming that your own device was 15% larger.

OK, that might be a tad cynical. More realistically, we think they might have taken the total number of transistors on the chip, divided it by the number of transistors thought to be needed to construct a typical logic gate, and used the result as the System Gate total. The problem with this approach, of course, is that only a very small percentage of those transistors were actually used to create something that could reasonably be called a “gate.” The majority of the transistors on an FPGA were used for the interconnect fabric which, according to ASIC designers at the time, had a “gate” value of zero. Furthermore, the logic on an FPGA was far less efficient than that on an ASIC, and most FPGAs could not handle anything near 100% utilization. Therefore, the number of useful, equivalent gates on an FPGA, compared with a typical ASIC, was dramatically lower than the System Gate count.
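
To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. Every figure in it is an assumption chosen purely for illustration, not a number pulled from any real datasheet:

```python
# Back-of-the-envelope "System Gate" arithmetic.
# All figures below are illustrative assumptions, not datasheet values.

TRANSISTORS_ON_CHIP = 25_000_000   # assumed total transistor count
TRANSISTORS_PER_GATE = 4           # assumed transistors per "typical" gate

# The marketing number: every transistor on the die counts.
system_gates = TRANSISTORS_ON_CHIP // TRANSISTORS_PER_GATE

# A more honest number: discount the transistors spent on interconnect
# fabric ("gate" value of zero) and the logic the tools can't utilize.
LOGIC_FRACTION = 0.20   # assumed share of transistors in actual logic
UTILIZATION = 0.70      # assumed achievable utilization

usable_gates = int(system_gates * LOGIC_FRACTION * UTILIZATION)

print(f"System Gates (marketing): {system_gates:,}")
print(f"Usable gate equivalents:  {usable_gates:,}")
print(f"Penalty ratio:            {system_gates / usable_gates:.1f}:1")
```

With these made-up numbers, the penalty ratio lands around 7:1, in the same neighborhood as the hallway estimates discussed next.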

The best estimates in the hallway at the time were somewhere around 10:1. Many were even more pessimistic. So, an FPGA with a million System Gates might be equivalent to an ASIC with 100K logic gates, or less. Luckily (meaning much to the chagrin of those of us who make some of our living poking fun at FPGA marketing), the industry quietly moved to a more realistic estimate system based loosely around the concept of 4-input LUT equivalents. Note that this metric is not by any means free from the influence of marketing’s crafty hand, either. For Xilinx and Altera, at least, the “Logic Cell” and “Logic Element” counts, respectively, do not correspond to any structures one would find on the chip. Both companies long ago abandoned the 4-input LUT for more efficient, wider, LUT-like structures. However, their Logic Cell and Logic Element counts are based on their estimate of the equivalent number of 4-input LUTs that would be required to implement the same logic as the actual, physical logic elements on their FPGAs. The marketing specsmanship here is based on reasoning such as: “We have a carry line that our competitor does not, and we estimate that in 1.9% of circumstances it will allow us to implement a function in 4 logic cells rather than 5, so we are going to claim a 0.04% higher ratio of actual cells to datasheet LUT4s than our competitor…” (percentages not to scale; for concept only; your mileage may vary; offer void where prohibited)
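
For the curious, here is how that datasheet conversion might look as a quick sketch, with every name and ratio invented purely for illustration:

```python
# Converting physical logic elements into datasheet "LUT-4 equivalents."
# All names and ratios here are invented for illustration.

physical_wide_luts = 300_000   # assumed count of physical 6-input LUTs

# Engineering estimate: how many 4-input LUTs one wide LUT replaces
# for "typical" logic.
lut4_per_wide_lut = 1.6        # assumed technical equivalence ratio

# Marketing adjustment, per the hypothetical carry-line argument above:
# claim a slightly higher ratio than the competitor does.
marketing_bump = 1.0004        # the tongue-in-cheek 0.04% from the quote

datasheet_logic_cells = int(physical_wide_luts * lut4_per_wide_lut
                            * marketing_bump)

print(f"Physical wide LUTs:       {physical_wide_luts:,}")
print(f"Datasheet 'Logic Cells':  {datasheet_logic_cells:,}")
```

With these numbers, 300,000 physical LUTs become roughly 480,000 datasheet “Logic Cells,” which is exactly the kind of quiet ratio inflation the quote above parodies.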

All of this marketing reform notwithstanding, the Penalty of Programmability has lived on in the minds of FPGA designers as an homage to “The Marketers Who Cried System Gates.” Ask your average FPGA user what the gate penalty is for using an FPGA instead of an ASIC, and you’re likely to hear a 10:1 number thrown around in there somewhere.

Today, however, that number is rooted more in legend and myth than in reality. In truth, FPGAs have grown much more efficient in their delivery of system- versus LUT-logic to the end application. During the golden years of the plague of System Gates, just about the whole FPGA was LUT fabric. There were a few memory blocks, a few DSP blocks (which were really just glorified hard-wired multipliers), a couple of clock generators, and a bunch of IO. In system design terms, the 10:1 penalty was real. 

Luckily, in those days, savvy designers were hip to the ways of FPGA marketing. After getting some really good humor mileage out of the FPGA datasheets and press releases, designers proceeded to use FPGAs for glue logic and integration functions that were comparatively small parts of what one would think of as a “system.” System Gates had no function in the real world. 

Now the tides have turned. Many applications today have an FPGA sitting right at the heart of the system. Whereas the FPGAs of the past were relegated to the role of glue logic, today’s FPGAs have the glue AND the substance. Year after year, more of the system’s functions have been pulled into the FPGA. Today, processors, peripherals, memory, accelerators, DSP, system management, and, yes, the glue that holds all those things together can be found inside one bad-ass FPGA parked right in the middle of the board. Even analog functions are starting to slide from their well-defended fortifications into the clutches of the empire-building FPGA.

Today’s FPGAs can truly be called “Systems-on-Chip.” And the 10x penalty no longer applies. Consider the enormous amount of hardened logic on today’s devices: high-performance processing subsystems (including things like multiple 1GHz ARM processors), common peripherals, memory, thousands of complex DSP blocks (not the old 18×18 multipliers of the past), and highly sophisticated IO, including stunningly fast multi-gigabit SerDes. The list goes on and on. For all of these functions, FPGAs have a penalty of zero. They are exactly as efficient, dense, and fast as the same functions implemented on a typical high-end ASIC. In most cases, they end up being even MORE efficient, because only a very small fraction of ASIC and custom-chip designs are done on the same process node as FPGAs. FPGAs are shipping today at 28nm and will soon be seen at the 22nm and 20nm nodes. Most ASIC and custom design is at least one, and often two, process nodes behind that.

The LUT fabric, too, has grown more efficient, due in no small part to a major evolution in the capabilities of FPGA design tools. Today’s synthesis and place-and-route engines can pack considerably more functionality into the same LUT fabric than those of a decade ago could. The result is that, even for the LUT fabric, the 10x penalty no longer applies.

With an FPGA at the heart of the system, it turns out that a relatively small percentage of the FPGA’s system capability even comes from the LUTs. There is so much packed into the hardened blocks that the LUT fabric is used only for the things that the hard-wired blocks can’t do, and those things become fewer and fewer with every generation. If one compared the capability of an FPGA-based SoC with a typical SoC in terms of functional gates delivered, the race would be very close indeed. The primary thing the FPGA brings to the party is flexibility: flexibility to do the things that the base SoC can’t quite do, to adapt to changes in the field, or to handle emerging standards that haven’t yet settled enough for hard-wired blocks to proliferate.

Today, we’ll risk saying that FPGA companies are actually doing themselves a disservice by publishing their density in LUT-4 equivalents. With the massive amount of hard-wired logic, the LUTs are an increasingly small portion of the picture. If one looked at the actual system capability of the FPGA, it would be much more impressive than the LUT counts that are being published.

We thought we’d never say this, but perhaps, ironically, it is time:

Bring Back the System Gate!

2 thoughts on “The Declining Penalty of Programmability”

  1. Bring back System Gates!!

    Wait, did I just say that?

    FPGAs have made huge gains in relative density compared with other IC technologies. What do you think?

  2. System gates were a poor metric in the old days, and a worse one now. LUT counts are at least a good indication of the size of design that fits into the “fabric” of the FPGA. Counting the gates in the “features” just covers up the more useful metrics of programmability. The bottom line is that there is no single metric to describe a modern FPGA, so the only useful metrics are a complete feature matrix. Anything less is like comparing apples to oranges by counting the leaves on their respective trees.
