
Apples to Apples

Why Comparisons Can't Be Simple

We’ve all had a fun time complaining.

It’s not like Marketing (upper-case “M”) was without blame, either.

When the new millennium dawned, we'd had enough of "System Gates" – the metric that FPGA companies used to describe (and inflate) the capacity of their devices.  In those days, vendors gave us a huge number – something like the number of transistors divided by three or four – as the "system gate" count of their devices.  Most of the transistors on an FPGA, however, are devoted to configuration logic, routing, and other structures that don't exist in a typical ASIC design.  With their customers being engineers, the common practice quickly became "divide by 10 to get a realistic ASIC gate equivalent."  The unfortunate side effect was the equally common notion that "FPGA vendors are a bunch of liars."

FPGA companies didn't like that reputation, and yet they didn't want to confuse their customers either.  It didn't help anybody if an engineer read a claim of 3 million system gates for an FPGA only to discover that their 2-million gate ASIC design would require six of those FPGAs.  In a move to regain their credibility, FPGA companies began stating their capacities in terms of the actual number of look-up tables (LUTs) in the fabric.  For a (very) brief time we were all happy.  If we looked under a hypothetical microscope at an 80,000 LUT FPGA, we'd stand a good chance of counting 80,000 little LUT-like structures (if we didn't mind the eye strain and had a lot of patience and no real engineering work to do).

Then Marketing stepped in.  “Hey, those OTHER guys have a few more LUTs than we do.  What can we put in the data sheet?” 

With the promise of beers on Marketing's expense accounts, engineers racked their brains for a new algebra and, as engineers tend to do, came up with a clever solution.  "Well, we think we have a more efficient carry-chain than those other guys; that should be worth 15% or so – let's just increase our LUT count by 15% and call them 'effective LUTs' (or a proprietary variation thereof)."

This, of course, was the hype heard round the world (OK, not really.  Just a few of the more FPGA-obsessed nerds actually noticed at the time), and the battle of the exaggerated marketing claims resumed with renewed vigor and creativity.  Each company came up with a way to make their devices seem larger when compared to their rivals. 

Disputing the effective LUT count was only the beginning.  Don't forget, you had to credit yourself for the amount of block RAM included on your device.  After all, that's part of the capability too.  And while we were on the subject of memory – why not double-count the registers on the LUTs, both as part of the "maximum memory" available on the device and as part of the LUT fabric?  Then, we could say we had 100,000 LUTs and also "up to" 5Mb of RAM.  (Extra marketing tip – always count your memory in bits, not bytes; the number is bigger.)  Other non-fabric IP needed to be counted in the total as well – if your FPGA had a few hundred hard-core 18x18 multipliers (or DSP blocks), you needed to count something for those, and what about those hard-core processors?  They certainly deserved a few extra "equivalent" LUTs.
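To see how the double-counting works, here's a toy sketch of the arithmetic.  Every number in it is invented for illustration – no vendor's actual figures are used:

```python
# Toy illustration of the memory double-counting described above.
# All figures are invented for illustration only.

luts = 100_000
bits_per_lut_register = 1       # count each LUT's register once as "memory"...
block_ram_bits = 4_900_000      # ...on top of the real block RAM

# The LUT registers get counted BOTH as fabric (100,000 LUTs) AND as
# part of the "maximum memory" figure:
max_memory_bits = block_ram_bits + luts * bits_per_lut_register  # 5,000,000

# Marketing tip in action: quote bits, not bytes -- the number is 8x bigger.
print(f"up to {max_memory_bits / 1e6:.0f} Mb of RAM")  # "up to 5 Mb"
print(f"(a mere {max_memory_bits / 8 / 1e6:.3f} MB)")  # vs. "0.625 MB"
```

The same 100,000 registers appear in two headline numbers at once, and the bits-versus-bytes choice inflates the figure by another factor of eight.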

Then something even more unthinkable happened.  We changed the LUT.

Altera probably started it.

Long ago, university students got their PhDs doing studies on things like the optimal width for look-up tables in programmable logic designs.  They ran numerous experiments – taking hypothetical FPGA architectures populated with LUTs of varying widths (2-input, 3-input, 4-input, 5-input, etc.) and synthesizing all kinds of designs on them, measuring which gave the most efficient use of logic resources.  The 4-input LUT won the battle and became the industry standard.  It was accepted as fact that the 4-input look-up table was the optimum structure for programmable devices, and the industry infrastructure settled in on that idea for the long-haul.

Unfortunately, the long-haul in high-tech is never more than single-digit years.

As we got to smaller process geometries, gates got faster and routing got proportionally slower.  That meant the share of total delay contributed by logic began to be overshadowed by the delay from routing.  When we had earlier looked for the optimal width of LUTs, it was with the assumption that logic was expensive and routing was either free or very, very cheap.  In this new world, however, routing was expensive (in terms of delay) and logic was cheap.  Connecting two LUTs together to realize a combinatorial function with more than four inputs was now less attractive because it involved more routing.  This re-wrote the book on the optimal LUT width.

Altera came out with what they called an “adaptive logic module” which was really a wider (6-7 input) LUT-like structure.  Xilinx soon followed with their own announcement. (Although there is lingering debate about who was really first.  Anybody surprised?)

This clobbered our math.

Vendors didn't want to put out datasheets with smaller numbers – trying to explain that these were actually smaller numbers of bigger objects.  Instead, they opted to come up with an "equivalent" number (since we were already into marketing-equivalence anyway).  A constant coefficient was applied to the number of 6-7 input LUTs to approximate the number of 4-input LUTs it would take to do about the same thing.  The formula for density became something like: "Take the number of LUTs and multiply by some constant, add in some number for the amount of RAM, throw in a few for things like processors, DSP blocks, and I/O, and see how that number compares with our competitor's.  If it's smaller, bump the coefficients by 10% and try again."
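The recipe above can be sketched in a few lines.  Every coefficient and resource count here is made up for illustration – no vendor publishes a formula like this:

```python
# Hypothetical sketch of the marketing-equivalence arithmetic described above.
# All coefficients and counts are invented -- no vendor publishes these.

def equivalent_luts(native_luts, lut_coeff, ram_kbits, ram_coeff,
                    dsp_blocks, dsp_coeff, hard_cpus, cpu_coeff):
    """Fold a grab-bag of resources into one 'equivalent logic element' number."""
    return (native_luts * lut_coeff        # wide LUTs scaled to 4-input equivalents
            + ram_kbits * ram_coeff        # credit for block RAM
            + dsp_blocks * dsp_coeff       # credit for hard multipliers / DSP
            + hard_cpus * cpu_coeff)       # credit for hard processor cores

theirs = 250_000  # the competitor's datasheet claim
coeff = 1.6       # starting LUT coefficient

# "If it's smaller, bump the coefficients by 10% and try again."
while equivalent_luts(80_000, coeff, 5_000, 2.0, 300, 50, 2, 10_000) < theirs:
    coeff *= 1.10

ours = equivalent_luts(80_000, coeff, 5_000, 2.0, 300, 50, 2, 10_000)
```

By construction, the loop always ends with `ours >= theirs` – which is exactly the point of the exercise.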

Then came the benchmarks.

FPGA companies began to benchmark their devices on "customer" designs in order to see whether a given design achieved higher or lower utilization on their devices than on their competitors'.  Based on those results, press releases were sometimes issued stating that one company's devices were larger than another company's.  The battle got even uglier.

For engineers using FPGAs, however, the question is much simpler:  What FPGA is the best fit for MY design?

Usually, we build our design from a collection of pre-done IP, with a little bit of our own custom logic mixed in.  It would be nice if we knew how many UARTs, soft-core processors, RAM blocks, barrel shifters, FFTs, or whatever we could build with a given number of a vendor's "equivalent logic elements."  Then, we could see what the capacity of these devices means for us – in real-world terms, for our project.

With FPGAs now available with dozens of mixtures of features – both hard-wired and programmable – it is impossible to come up with one density or capacity number that gives an accurate picture.  The best thing to do is to start from the IP blocks you need to use and work your way up.  Most IP datasheets list the resources required for that block, and you can total those up with a fair level of confidence to find out what device will hold your design (with some room to spare, if you're smart).  Based on that, you can pick the device that is best for your project rather than the vendor with the best marketing department.
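That bottom-up tally is simple enough to sketch.  The block requirements, device capacities, and 70% headroom target below are all hypothetical placeholders – substitute the numbers from your own IP and device datasheets:

```python
# Minimal sketch of bottom-up capacity planning: total the resources each IP
# block needs (from its datasheet) and check them against a candidate device.
# All block and device numbers below are hypothetical.

ip_blocks = {
    "uart":     {"luts": 400,    "ram_kb": 2,  "dsp": 0},
    "soft_cpu": {"luts": 12_000, "ram_kb": 64, "dsp": 4},
    "fft_1k":   {"luts": 8_000,  "ram_kb": 48, "dsp": 24},
}

design = [("uart", 2), ("soft_cpu", 1), ("fft_1k", 1)]  # (block, instance count)

totals = {"luts": 0, "ram_kb": 0, "dsp": 0}
for name, count in design:
    for resource, amount in ip_blocks[name].items():
        totals[resource] += amount * count

def fits(device, totals, headroom=0.70):
    """Leave room to spare: require <= 70% utilization of every resource."""
    return all(totals[r] <= device[r] * headroom for r in totals)

candidate = {"luts": 36_000, "ram_kb": 200, "dsp": 40}
print(totals, fits(candidate, totals))
```

Note that the check is per-resource: a device with plenty of LUTs but too few DSP blocks fails, which is exactly the kind of mismatch a single "equivalent LUT" number hides.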

Don’t expect the datasheets to change any time soon.
