feature article

The FPGA is Half Full

Unwinding the Marketing Spin

Let’s say you are looking for a new house for your family. You’ve got a couple of contenders. One has four bedrooms, three baths, a two-car garage, and 3,000 square feet of living area. The other has three bedrooms, three baths, a three-car garage, and 3,200 square feet of living area.

Lining the two data sheets up, the houses are comparable. One shows a bit more living area, the other has an additional bedroom (which you would just use for a guest room anyway), and the additional garage isn’t much of a factor, since your family owns only two cars.

Weighing the two choices based on the data sheet makes sense – until you start reading the fine print. House #1, it turns out, doesn’t actually have 3,000 square feet. To get that number, they included a section of the yard that is covered by a roof, and the square footage number is “effective square feet.” Another footnote says that they have estimated the effective square feet based on a “livability factor,” since they deem the living space to be extra-efficient.

Reading further on house #2, there is a footnote saying that the heating system will support only two of the bedrooms being occupied at any one time. And – one of the bathrooms actually contains a bed, so it is counted as both a bedroom and a bathroom in the info sheet.

Welcome to the wonderful world of FPGA product tables.

When you shop for an FPGA for your project, you’ll see that the FPGA companies generously provide product selectors that tell you what resources are available on their chips. The problem is the details hidden in the fine print – and the ones that are not in print at all. Let’s start with the capacity of the FPGA itself. One family boasts “up to 480,000 logic cells.” OK, cool. Drill down to the fine print and the answer changes to “up to 478,000 logic cells.” Drill down yet another level, and we are told the number of logic cells is actually 477,760. Well, that’s just rounding up, right? And it’s less than 1% difference, so why be picky?

But those 478,000 cells absolutely do not exist. Looking over one column, we see that the device physically contains 74,650 “slices.” Dropping to the footnotes, we see that a slice is made up of four LUTs and eight flip-flops. Multiplying 74,650 slices by four LUTs, we get – 298,600 actual LUTs. Whoa! OK, that’s not just rounding. How do we turn 298,600 LUTs into 480,000? Well, back in the (very) old days, FPGAs used four-input LUTs. Newer ones use something like six-input LUTs. So – if we (generously) scale the number of six-input LUTs to an equivalent number of legacy four-input ones, we’d still get only about 450,000 – and that’s assuming perfect utilization of the extra inputs. The plot thickens…
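The arithmetic above is easy to check with a quick back-of-the-envelope script. This is just a sketch: the 1.5× six-input-to-four-input equivalence factor is the “generous” assumption discussed above, not a published vendor figure.

```python
# Back-of-the-envelope check of the logic-cell arithmetic in the text.
slices = 74_650
luts_per_slice = 4       # four 6-input LUTs per slice, per the footnotes
scale_6_to_4 = 1.5       # assumed "generous" 6-input -> 4-input equivalence

physical_luts = slices * luts_per_slice
print(f"physical LUTs:        {physical_luts:,}")         # 298,600

equivalent_cells = physical_luts * scale_6_to_4
print(f"4-input equivalents:  {equivalent_cells:,.0f}")   # 447,900

marketed_cells = 477_760
shortfall = marketed_cells - equivalent_cells
print(f"marketed logic cells: {marketed_cells:,} ({shortfall:,.0f} more than exist)")
```

Even with that generous scaling, the marketed “logic cell” count comes out tens of thousands higher than anything physically on the die.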

Now let’s say you want to try to use those LUTs. This may come as a shock, but you can never use 100% of the LUTs on your FPGA. Typically, the routing resources won’t support completely routing anywhere near that number. If you’re clocking them very fast, you’ll also bump into power limitations. In fact, many designers tell us that they don’t get more than 60%-70% utilization in practice. So if we take the favorable 70% number, we’re looking at around 210K actual usable physical LUTs – on a device marketed as 480K.
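Applying that utilization ceiling to the physical LUT count makes the gap concrete (a sketch using the 60%-70% range designers report):

```python
# Derate the physical LUT count by realistic utilization ceilings.
physical_luts = 74_650 * 4          # 298,600 physical LUTs, from the table
for utilization in (0.60, 0.70):
    usable = int(physical_luts * utilization)
    print(f"{utilization:.0%} utilization -> {usable:,} usable LUTs")
# ~179K at 60%, ~209K at 70% -- on a device marketed as 480K logic cells.
```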

That’s 210K unless, of course, you want to use some for “distributed RAM.” You see, when they’re trying to pump up the memory stats, they allow that you might want to use some of the LUT fabric to make memory instead of LUTs. You can have the LUTs or the RAM, but not both at the same time. 

Life is more than LUTs and RAM, though. Today’s FPGAs include a wealth of other resources. Take DSP blocks, for example. You’ll see some pretty impressive GMAC numbers given for FPGAs used in digital signal processing. Unfortunately, most of those numbers are idealized figures that you’d never see in real life. For example, if an FPGA boasts 1,000 DSP blocks (where each DSP block contains one or more hard-wired multipliers plus some accumulator, arithmetic, and carry circuitry), the published GMAC number is typically calculated by multiplying the number of multipliers by the maximum operating frequency of those multipliers. If you manage to craft a real, useful design that comes even close to that situation, a lot of people would love to talk to you about engineering employment opportunities.
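The idealized GMAC calculation looks like this in miniature. All the specific figures here (one multiplier per block, a 500 MHz maximum DSP clock, the “realistic” clock and occupancy) are illustrative assumptions, not datasheet values:

```python
# Idealized peak GMACs: every multiplier doing a MAC on every cycle, forever.
dsp_blocks = 1_000        # the article's example count
mults_per_block = 1       # assumption: one multiplier per DSP block
fmax_hz = 500e6           # assumed maximum DSP operating frequency

peak_gmacs = dsp_blocks * mults_per_block * fmax_hz / 1e9
print(f"datasheet-style peak: {peak_gmacs:.0f} GMAC/s")

# A real design runs slower after routing/timing closure, and keeps the
# multipliers busy only a fraction of the cycles.
achieved_clock_hz = 350e6   # assumed achievable clock
occupancy = 0.6             # assumed fraction of cycles doing useful MACs
real_gmacs = dsp_blocks * mults_per_block * achieved_clock_hz * occupancy / 1e9
print(f"plausible real-world: {real_gmacs:.0f} GMAC/s")
```

Under these (made-up but not unreasonable) assumptions, the achieved throughput is well under half the headline figure.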

How about IO, though? The vendors are always bragging about their huge SerDes bandwidth. You’ll see large numbers of transceivers capable of blistering-fast speeds (up to 28Gbps each on the current 28nm generation of devices). The thing is, with all that data coming into the chip, you need to be able to do something useful with it. That means you need lots of fast internal resources like LUTs, memory, and DSP blocks. In many of today’s devices, the SerDes bandwidth exceeds what the rest of the FPGA is capable of handling, for anything but the most well-behaved, straightforward designs. All those SerDes lanes look good on paper, but if you can’t use them all, they’re just taking up expensive silicon area, increasing your cost, and leaking power.
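A quick sketch shows why the fabric struggles to keep up. The transceiver count (16) and fabric clock (250 MHz) are hypothetical numbers chosen for illustration; only the 28 Gbps per-lane rate comes from the text:

```python
# Aggregate SerDes ingress vs. what the fabric must absorb (illustrative).
transceivers = 16
gbps_each = 28
serdes_gbps = transceivers * gbps_each
print(f"SerDes ingress: {serdes_gbps} Gbps")            # 448 Gbps

# To keep up at an assumed 250 MHz fabric clock, the internal datapath
# would need to process a bus this wide on every single cycle:
fabric_hz = 250e6
bus_bits = serdes_gbps * 1e9 / fabric_hz
print(f"required datapath width at 250 MHz: {bus_bits:,.0f} bits")  # 1,792
```

A 1,792-bit datapath running flat out every cycle is a tall order for any real design, which is the point: the pins can outrun the fabric behind them.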

Chatting with a number of designers, we hear that under-utilizing FPGAs is pretty much an industry norm. If you’ve been using FPGAs for a while, you tend to mostly ignore the datasheet numbers and plan your design based on experience and preliminary output from the tools. If the tools say you can route your design, take advantage of the resources you need, and hit your power budget, then you can feel pretty comfortable with your selection of devices.

But, why have all those resources there in the first place if you can’t use them?

Well, first there are bragging rights and the reality of competition between the vendors. If one vendor has a million-cell FPGA, the other one needs to have 1.1 million. Specsmanship is an important part of marketing. Also, because of the wide variation in designs, each design may leave a different set of resources on the table. One design may max out the DSP blocks but not need all the LUT fabric. Another may be limited by the amount of RAM. Many are at the mercy of total IO pins or bandwidth. FPGA companies spend an amazing amount of engineering effort just trying to find the right balance of resources that will best serve the widest possible audience.

One area that has long been an architectural Achilles’ heel, however, is that of routing resources. Putting more routing on the chip is expensive. If you design an FPGA with so much routing resource that you can always route 100%, you’ve wasted a tremendous amount of space. Balancing the available routing with the other resources requires exhaustive trial-and-error with a large number and variety of designs. FPGA companies typically iterate with their proposed architecture through a huge test suite, adjusting the balance of resources each time until they hit a point where they get acceptable utilization on a diverse set of realistic designs. Xilinx has announced that their upcoming family includes a major rework of routing resources – aimed at letting us hit much higher utilization numbers than with previous families.

Certainly, the language and norms of FPGA specifications have become distorted over the years. Simply having capacity defined in terms of an anachronistic architecture as a pseudo industry standard is confusing enough. Add to that the reality that almost no design will come close to a perfect, balanced utilization of the resources on any given FPGA, and the situation becomes downright murky. There seems to be hope, however, in the direction the FPGA companies are taking, both with their designs and with their marketing messages. It would be wonderful to move out of the era of marketing-driven specsmanship and into a new age of useful metrics for choosing the best part for our design work.

Until that day, the only strategy is to use the tools to get realistic fit estimates. Your designs, with your constraints, are the best (and only) sure-fire models that will tell you whether you can succeed with a particular device.

8 thoughts on “The FPGA is Half Full”

  1. The use of bloated marketing numbers to define FPGA size is nothing new. And in fact, the “logic cell” number is a better yardstick for most designs than the old “system gates” number. At least I can find a multiplier that gets me from logic cells to LUTs. Still, you’re right about needing to run the design through the tools to get the final picture. Often it’s gotchas like clock routing (yeah, you get 32 global clock buffers, but only 16 can reach any section of the chip) or other shared routing resources (oh, you wanted to attach two adjacent clock pins to two PLLs, or have two adjacent I/Os running DDR on different clocks?). It’s been a long time since I was able to rely on a data sheet to tell me everything I needed to know about programmable logic.


