The Declining Penalty of Programmability

Should We Bring Back “System Gates”?

In the early days of FPGAs, marketing was all bluster. In an effort to make their devices seem useful and, more importantly, bigger than the other guys’, device densities were given in terms of spectacularly optimistic “System Gates.” Just about the whole industry was complicit in this facade. Once Xilinx and Altera had gone after each other with the System Gate ruse, the other challengers really had no alternative but to fall in line.

What is a System Gate? (We hear you ask…)

Nobody really knows. Or, at least, nobody who remembers is admitting it. As best we could determine at the time, System Gates were calculated by taking the competitor’s similar device, looking at its datasheet, multiplying its System Gate total by 1.15, and issuing a press release claiming that your own device was 15% larger.

OK, that might be a tad cynical. More realistically, we think they might have taken the total number of transistors on the chip, divided it by the number of transistors thought to be needed to construct a typical logic gate, and used the result as the System Gate total. The problem with this approach, of course, is that only a very small percentage of those transistors were actually used to create something that could reasonably be called a “gate.” The majority of the transistors on an FPGA were used for the interconnect fabric which, according to ASIC designers at the time, had a “gate” value of zero. Furthermore, the logic on an FPGA was far less efficient than that on an ASIC, and most FPGAs could not handle anything near 100% utilization. Therefore, the number of useful, equivalent gates on an FPGA, compared with a typical ASIC, was dramatically lower than the System Gate count.
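
To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. Every figure in it is an assumption chosen purely for illustration, not a number pulled from any real datasheet:

```python
# Back-of-the-envelope "System Gate" arithmetic.
# All figures below are illustrative assumptions, not datasheet values.

TRANSISTORS_ON_CHIP = 25_000_000   # assumed total transistor count
TRANSISTORS_PER_GATE = 4           # assumed transistors per "typical" gate

# The marketing number: every transistor on the die counts.
system_gates = TRANSISTORS_ON_CHIP // TRANSISTORS_PER_GATE

# A more honest number: discount the transistors spent on interconnect
# fabric ("gate" value of zero) and the logic the tools can't utilize.
LOGIC_FRACTION = 0.20   # assumed share of transistors in actual logic
UTILIZATION = 0.70      # assumed achievable utilization

usable_gates = int(system_gates * LOGIC_FRACTION * UTILIZATION)

print(f"System Gates (marketing): {system_gates:,}")
print(f"Usable gate equivalents:  {usable_gates:,}")
print(f"Penalty ratio:            {system_gates / usable_gates:.1f}:1")
```

With these made-up numbers, the penalty ratio lands around 7:1, in the same neighborhood as the hallway estimates discussed next.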

The best estimates in the hallway at the time were somewhere around 10:1. Many were even more pessimistic. So, an FPGA with a million System Gates might be equivalent to an ASIC with 100K logic gates, or less. Luckily (meaning much to the chagrin of those of us who make some of our living poking fun at FPGA marketing), the industry quietly moved to a more realistic estimate system based loosely around the concept of 4-input LUT equivalents. Note that this metric is not by any means free from the influence of marketing’s crafty hand, either. For Xilinx and Altera, at least, the “Logic Cell” and “Logic Element” counts, respectively, do not correspond to any structures one would find on the chip. Both companies long ago abandoned the 4-input LUT for more efficient, wider, LUT-like structures. However, their Logic Cell and Logic Element counts are based on their estimate of the equivalent number of 4-input LUTs that would be required to implement the same logic as the actual, physical logic elements on their FPGAs. The marketing specsmanship here is based on reasoning such as: “We have a carry line that our competitor does not, and we estimate that in 1.9% of circumstances it will allow us to implement a function in 4 logic cells rather than 5, so we are going to claim a 0.04% higher ratio of actual cells to datasheet LUT4s than our competitor…” (percentages not to scale; for concept only; your mileage may vary; offer void where prohibited)
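
For the curious, here is how that datasheet conversion might look as a quick sketch, with every name and ratio invented purely for illustration:

```python
# Converting physical logic elements into datasheet "LUT-4 equivalents."
# All names and ratios here are invented for illustration.

physical_wide_luts = 300_000   # assumed count of physical 6-input LUTs

# Engineering estimate: how many 4-input LUTs one wide LUT replaces
# for "typical" logic.
lut4_per_wide_lut = 1.6        # assumed technical equivalence ratio

# Marketing adjustment, per the hypothetical carry-line argument above:
# claim a slightly higher ratio than the competitor does.
marketing_bump = 1.0004        # the tongue-in-cheek 0.04% from the quote

datasheet_logic_cells = int(physical_wide_luts * lut4_per_wide_lut
                            * marketing_bump)

print(f"Physical wide LUTs:       {physical_wide_luts:,}")
print(f"Datasheet 'Logic Cells':  {datasheet_logic_cells:,}")
```

With these numbers, 300,000 physical LUTs become roughly 480,000 datasheet “Logic Cells,” which is exactly the kind of quiet ratio inflation the quote above parodies.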

All of this marketing reform notwithstanding, the Penalty of Programmability has lived on in the minds of FPGA designers as an homage to “The Marketers Who Cried System Gates.” Ask your average FPGA user what the gate penalty is for using an FPGA instead of an ASIC, and you’re likely to hear a 10:1 number thrown around in there somewhere.

Today, however, that number is rooted more in legend and myth than in reality. In truth, FPGAs have grown much more efficient in their delivery of system- versus LUT-logic to the end application. During the golden years of the plague of System Gates, just about the whole FPGA was LUT fabric. There were a few memory blocks, a few DSP blocks (which were really just glorified hard-wired multipliers), a couple of clock generators, and a bunch of IO. In system design terms, the 10:1 penalty was real. 

Luckily, in those days, savvy designers were hip to the ways of FPGA marketing. After getting some really good humor mileage out of the FPGA datasheets and press releases, designers proceeded to use FPGAs for glue logic and integration functions that were comparatively small parts of what one would think of as a “system.” System Gates had no function in the real world. 

Now the tides have turned. Many applications today have an FPGA sitting right at the heart of the system. Whereas the FPGAs of the past were relegated to the role of glue logic, today’s FPGAs have the glue AND the substance. Year after year, more of the system’s functions have been pulled into the FPGA. Today, processors, peripherals, memory, accelerators, DSP, system management, and, yes, the glue that holds all those things together can be found inside one bad-ass FPGA parked right in the middle of the board. Even analog functions are starting to slide from their well-defended fortifications into the clutches of the empire-building FPGA.

Today’s FPGAs can truly be called “Systems-on-Chip.” And the 10x penalty no longer applies. Consider the enormous amount of hardened logic on today’s devices: high-performance processing subsystems (including things like multiple 1GHz ARM processors), common peripherals, memory, thousands of complex DSP blocks (not the old 18×18 multipliers of the past), and highly sophisticated IO, including stunningly fast multi-gigabit SerDes. The list goes on and on. For all of these functions, FPGAs have a penalty of zero. They are exactly as efficient, dense, and fast as the same functions implemented on a typical high-end ASIC. In most cases, they end up being even MORE efficient, because only a very small fraction of ASIC and custom-chip designs are done on the same process node as FPGAs. FPGAs are shipping today at 28nm and will soon be seen at the 22nm and 20nm nodes. Most ASIC and custom design is at least one, and often two, process nodes behind that.

The LUT fabric, too, has grown more efficient, due in no small part to a major evolution in the capabilities of FPGA design tools. Today’s synthesis and place-and-route engines can pack considerably more functionality into the same LUT fabric than those of a decade ago could. The result is that, even for the LUT fabric, the 10x penalty no longer applies.

With an FPGA at the heart of the system, it turns out that a relatively small percentage of the FPGA’s system capability even comes from the LUTs. There is so much packed into the hardened blocks that the LUT fabric is used only for the things that the hard-wired blocks can’t do, and those things become fewer and fewer with every generation. If one compared the capability of an FPGA-based SoC with a typical SoC in terms of functional gates delivered, the race would be very close indeed. The primary thing the FPGA brings to the party is flexibility: flexibility to do the things that the base SoC can’t quite do, to adapt to changes in the field, or to handle emerging standards that haven’t yet settled enough for hard-wired blocks to proliferate.

Today, we’ll risk saying that FPGA companies are actually doing themselves a disservice by publishing their density in LUT-4 equivalents. With the massive amount of hard-wired logic, the LUTs are an increasingly small portion of the picture. If one looked at the actual system capability of the FPGA, it would be much more impressive than the LUT counts that are being published.

We thought we’d never say this, but perhaps, ironically, it is time:

Bring Back the System Gate!

2 thoughts on “The Declining Penalty of Programmability”

  1. Bring back System Gates!!

    Wait, did I just say that?

    FPGAs have made huge gains in relative density compared with other IC technologies. What do you think?

  2. System gates were a poor metric in the old days, and a worse one now. LUT counts are at least a good indication of the size of design that fits into the “fabric” of the FPGA. Counting the gates in the “features” just covers up the more useful metrics of programmability. The bottom line is that there is no single metric to describe a modern FPGA, so the only useful metrics are a complete feature matrix. Anything less is like comparing apples to oranges by counting the leaves on their respective trees.
