
Apples to Apples

Why Comparisons Can't Be Simple

We’ve all had a fun time complaining.

It’s not like Marketing (upper-case “M”) was without blame, either.

When the new millennium dawned, we'd had enough of "System Gates" – the metric that FPGA companies used to describe (and inflate) the capacity of their devices.  In those days, vendors gave us a huge number – something like the number of transistors divided by three or four – as the "system gate" count of their devices.  Most of the transistors on an FPGA, however, are devoted to configuration logic, routing, and other structures that don't exist in a typical ASIC design.  With their customers being engineers, the common practice quickly became "divide by 10 to get a realistic ASIC gate equivalent."  The unfortunate side effect was the equally common notion that "FPGA vendors are a bunch of liars."

FPGA companies didn't like that reputation, and yet they didn't want to confuse their customers either.  It didn't help anybody if an engineer read a claim of 3 million system gates for an FPGA only to discover that their 2-million gate ASIC design would require six of those FPGAs.  In a move to regain their credibility, FPGA companies began stating their capacities in terms of the actual number of look-up tables (LUTs) in the fabric.  For a (very) brief time we were all happy.  If we looked under a hypothetical microscope at an 80,000 LUT FPGA, we'd stand a good chance of counting 80,000 little LUT-like structures (if we didn't mind the eye strain and had a lot of patience and no real engineering work to do).

Then Marketing stepped in.  “Hey, those OTHER guys have a few more LUTs than we do.  What can we put in the data sheet?” 

With the promise of beers on Marketing's expense accounts, engineers racked their brains for a new algebra and, as engineers tend to do, came up with a clever solution.  "Well, we think we have a more efficient carry-chain than those other guys; that should be worth 15% or so – let's just increase our LUT count by 15% and call them 'effective LUTs' (or a proprietary variation thereof)."

This, of course, was the hype heard round the world (OK, not really.  Just a few of the more FPGA-obsessed nerds actually noticed at the time), and the battle of the exaggerated marketing claims resumed with renewed vigor and creativity.  Each company came up with a way to make their devices seem larger when compared to their rivals. 

Disputing the effective LUT count was only the beginning.  Don't forget, you had to credit yourself for the amount of block RAM included on your device.  After all, that's part of the capability too.  And while we were on the subject of memory – why not double-count the registers on the LUTs, both as part of the "maximum memory" available on the device and as part of the LUT fabric?  Then, we could say we had 100,000 LUTs and also "up to" 5Mb of RAM.  (Extra marketing tip – always count your memory in bits, not bytes; the number is bigger.)  Other non-fabric IP needed to be counted in the total as well – if your FPGA had a few hundred hard-core 18x18 multipliers (or DSP blocks), you needed to count something for those, and what about those hard-core processors?  They certainly deserved a few extra "equivalent" LUTs.
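To see how the double-counting works, here's a toy sketch of the arithmetic.  Every number in it is invented for illustration – no vendor's actual figures are used:

```python
# Toy illustration of the memory double-counting described above.
# All figures are invented for illustration only.

luts = 100_000
bits_per_lut_register = 1       # count each LUT's register once as "memory"...
block_ram_bits = 4_900_000      # ...on top of the real block RAM

# The LUT registers get counted BOTH as fabric (100,000 LUTs) AND as
# part of the "maximum memory" figure:
max_memory_bits = block_ram_bits + luts * bits_per_lut_register  # 5,000,000

# Marketing tip in action: quote bits, not bytes -- the number is 8x bigger.
print(f"up to {max_memory_bits / 1e6:.0f} Mb of RAM")  # "up to 5 Mb"
print(f"(a mere {max_memory_bits / 8 / 1e6:.3f} MB)")  # vs. "0.625 MB"
```

The same 100,000 registers appear in two headline numbers at once, and the bits-versus-bytes choice inflates the figure by another factor of eight.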

Then something even more unthinkable happened.  We changed the LUT.

Altera probably started it.

Long ago, university students got their PhDs doing studies on things like the optimal width for look-up tables in programmable logic designs.  They ran numerous experiments – taking hypothetical FPGA architectures populated with LUTs of varying widths (2-input, 3-input, 4-input, 5-input, etc.) and synthesizing all kinds of designs on them, measuring which gave the most efficient use of logic resources.  The 4-input LUT won the battle and became the industry standard.  It was accepted as fact that the 4-input look-up table was the optimum structure for programmable devices, and the industry infrastructure settled in on that idea for the long-haul.

Unfortunately, the long-haul in high-tech is never more than single-digit years.

As we got to smaller process geometries, gates got faster and routing got proportionally slower.  That meant the share of total delay contributed by logic began to be overshadowed by the delay from routing.  When we had earlier looked for the optimal width of LUTs, it was with the assumption that logic was expensive and routing was either free or very, very cheap.  In this new world, however, routing was expensive (in terms of delay) and logic was cheap.  Connecting two LUTs together to realize a combinatorial function with more than four inputs was now less attractive because it involved more routing.  This re-wrote the book on the optimal LUT width.

Altera came out with what they called an “adaptive logic module” which was really a wider (6-7 input) LUT-like structure.  Xilinx soon followed with their own announcement. (Although there is lingering debate about who was really first.  Anybody surprised?)

This clobbered our math.

Vendors didn't want to put out datasheets with smaller numbers – trying to explain that these were actually smaller numbers of bigger objects.  Instead, they opted to come up with an "equivalent" number (since we were already into marketing-equivalence anyway).  A constant coefficient was applied to the number of 6-7 input LUTs to approximate the number of 4-input LUTs it would take to do about the same thing.  The formula for density became something like: "Take the number of LUTs and multiply by some constant, add in some number for the amount of RAM, throw in a few for things like processors, DSP blocks, and I/O, and see how that number compares with our competitor's.  If it's smaller, bump the coefficients by 10% and try again."
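The recipe above can be sketched in a few lines.  Every coefficient and resource count here is made up for illustration – no vendor publishes a formula like this:

```python
# Hypothetical sketch of the marketing-equivalence arithmetic described above.
# All coefficients and counts are invented -- no vendor publishes these.

def equivalent_luts(native_luts, lut_coeff, ram_kbits, ram_coeff,
                    dsp_blocks, dsp_coeff, hard_cpus, cpu_coeff):
    """Fold a grab-bag of resources into one 'equivalent logic element' number."""
    return (native_luts * lut_coeff        # wide LUTs scaled to 4-input equivalents
            + ram_kbits * ram_coeff        # credit for block RAM
            + dsp_blocks * dsp_coeff       # credit for hard multipliers / DSP
            + hard_cpus * cpu_coeff)       # credit for hard processor cores

theirs = 250_000  # the competitor's datasheet claim
coeff = 1.6       # starting LUT coefficient

# "If it's smaller, bump the coefficients by 10% and try again."
while equivalent_luts(80_000, coeff, 5_000, 2.0, 300, 50, 2, 10_000) < theirs:
    coeff *= 1.10

ours = equivalent_luts(80_000, coeff, 5_000, 2.0, 300, 50, 2, 10_000)
```

By construction, the loop always ends with `ours >= theirs` – which is exactly the point of the exercise.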

Then came the benchmarks.

FPGA companies began to benchmark their devices on "customer" designs in order to see whether a given design achieved higher or lower utilization on their devices than on their competitors'.  Based on those results, press releases were sometimes issued stating that one company's devices were larger than another company's.  The battle got even uglier.

For engineers using FPGAs, however, the question is much simpler:  What FPGA is the best fit for MY design?

Usually, we build our design from a collection of pre-done IP, with a little bit of our own custom logic mixed in.  It would be nice if we knew how many UARTs, soft-core processors, RAM blocks, barrel shifters, FFTs, or whatever we could build with a given number of a vendor's "equivalent logic elements."  Then, we could see what the capacity of these devices means for us – in real-world terms, for our project.

With FPGAs now available with dozens of mixtures of features – both hard-wired and programmable – it is impossible to come up with one density or capacity number that gives an accurate picture.  The best thing to do is to start from the IP blocks you need to use and work your way up.  Most IP datasheets list the resources required for that block, and you can total those up with a fair level of confidence to find out what device will hold your design (with some room to spare, if you're smart).  Based on that, you can pick the device that is best for your project rather than the vendor with the best marketing department.
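That bottom-up tally is simple enough to sketch.  The block requirements, device capacities, and 70% headroom target below are all hypothetical placeholders – substitute the numbers from your own IP and device datasheets:

```python
# Minimal sketch of bottom-up capacity planning: total the resources each IP
# block needs (from its datasheet) and check them against a candidate device.
# All block and device numbers below are hypothetical.

ip_blocks = {
    "uart":     {"luts": 400,    "ram_kb": 2,  "dsp": 0},
    "soft_cpu": {"luts": 12_000, "ram_kb": 64, "dsp": 4},
    "fft_1k":   {"luts": 8_000,  "ram_kb": 48, "dsp": 24},
}

design = [("uart", 2), ("soft_cpu", 1), ("fft_1k", 1)]  # (block, instance count)

totals = {"luts": 0, "ram_kb": 0, "dsp": 0}
for name, count in design:
    for resource, amount in ip_blocks[name].items():
        totals[resource] += amount * count

def fits(device, totals, headroom=0.70):
    """Leave room to spare: require <= 70% utilization of every resource."""
    return all(totals[r] <= device[r] * headroom for r in totals)

candidate = {"luts": 36_000, "ram_kb": 200, "dsp": 40}
print(totals, fits(candidate, totals))
```

Note that the check is per-resource: a device with plenty of LUTs but too few DSP blocks fails, which is exactly the kind of mismatch a single "equivalent LUT" number hides.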

Don’t expect the datasheets to change any time soon.
