Terminology Tango 101

Pay attention now, there will be a quiz.

In today’s lesson, we’re going to pick the best FPGA. Well, more accurately, we’re going to learn how to pick the best FPGA. The actual proof will be left as an exercise for the student. Since we’re engineers, we can’t rely on any touchy-feely stuff. It doesn’t matter who has the coolest name. We don’t even care who’s got the slickest icon printed on top of their BGA packages. We need a formula. Once we’ve got the right formula, we can read the data sheets, plug in all the numbers and… voilà! The prince of programmability will leap from the page into our collective consciousness.

To build our formula we’re going to need terms. (Any good formula has them…) Since we always talk about gates, we should definitely have them as one of our terms. Everyone knows that gates are good, and, being good, go on the top of our equation. When the number of gates gets bigger, our “goodness” metric will get bigger.

Next comes something else that always comes up, and that’s cost. Cost is bad. We don’t want too much of it, and less is always better. We’ll slide cost in on the bottom of our equation. Now we’re getting somewhere! We’ve got both a numerator and a denominator and our magic metric is starting to take shape.

Gates and money aren’t everything, though. If they were, we’d just go out and buy ourselves a bunch of DRAM chips. There’s another thing we need in our FPGAs, and we want lots of it. That thing is speed. Since speed is another good thing, we’ll merge those megahertz right in next to the gates on top of our equation. We’re really getting multi-dimensional now! Our metric has made it as far as “gate-megahertz per dollar.”

That’s far too simplistic for the likes of us, though. We didn’t just roll out of the fab this morning, did we? We know another thing or two – like power, for example. In FPGAs it’s not a good thing. We may like it in our cars, microwave ovens, and stereo amplifiers, but we don’t appreciate it too much in our FPGAs. Power burns our batteries, heats our heatsinks, and preys on our power supplies. Power is another contender for the denominator.

There are a few more things we probably should think about, but it’s not too clear how to handle them in our formula yet. Pins, for example, are a problem. Mostly, pins are good. We like ‘em. We don’t, however, want to go piling them on indefinitely. They’re like sprinkles on ice cream. You need just the right number, and then extras don’t help you out too much. There are also things like security, which don’t have good metrics associated with them. The best security metric I’ve seen is some measure of how many reverse-engineers, dollars, and time it might take to extract your brilliant design from an FPGA in the field. (Do you go to reverse-engineering school to learn how to do that?) Those aren’t the sort of numbers one usually finds on a datasheet. There’s also radiation tolerance, which is vitally important if you plan to be operating where there’s a lot of radiation, but not all that interesting otherwise.

Maybe for now, we should just be content with our nice, simple, symmetric four-poster equation with gates and megahertz on the top, dollars and watts on the bottom. Let’s roll up our sleeves, take this baby out on the road and see how it performs. Nothing left to do but plug in a few numbers, right?
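For anyone who wants to take the four-poster equation for that test drive, here is a minimal Python sketch of it. The function name `goodness` and every number plugged into it are invented for illustration; none of them come from a datasheet.

```python
def goodness(gates, mhz, dollars, watts):
    """Gate-megahertz per dollar-watt: good things on top, bad things on the bottom."""
    if dollars <= 0 or watts <= 0:
        raise ValueError("cost and power must be positive")
    return (gates * mhz) / (dollars * watts)


# Two hypothetical parts with made-up numbers: identical except for price.
part_a = goodness(gates=1_000_000, mhz=200, dollars=50, watts=2)
part_b = goodness(gates=1_000_000, mhz=200, dollars=100, watts=2)

print(part_a, part_b)  # doubling the price halves the metric
```

The metric is only as trustworthy as the four numbers fed into it.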

We’ll start with the gates. Zipping over to our friends at Altera, we see that they’ve got a paper out explaining how they’ve done away with gate counts. Trying to foil our plan, eh? We do, however, have a convenient metric provided for us. Buried right in the part number are some digits that tell us something about density (which hopefully correlates to gates). Altera says that, in a part number like “EP2A15,” the “15” represents “16,640.” OK, wait. What kind of math is this? When the Miami Arena (seating capacity, coincidentally, 16,640) has a sell-out crowd and someone asks how many people were at the game, they’re not likely to reply “15,” are they? Nonetheless, we’ll just mentally log our 1109.33 multiplication factor, and now we’re all set to convert from Altera part numbers to… what – “Logic Elements?” It looks like we’re not ready for gates yet. Altera wasn’t kidding.

Let’s go look at Xilinx; maybe they understand gates. In their part number they embed something called “System Gates,” but what, exactly, are those? Evidently a device that has 8 million of them has 104,832 “Logic Cells,” so does that mean each “Logic Cell” has about 75 system gates? Let’s write down that number and check back later.
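The two vendor conversions so far are plain division, and a short Python sketch makes them easy to check. The variable names are made up for illustration; the inputs are the figures quoted above.

```python
# Altera: the "15" in EP2A15 stands for 16,640 Logic Elements.
le_per_part_number_unit = 16_640 / 15

# Xilinx: a device with 8 million System Gates has 104,832 Logic Cells.
system_gates_per_lc = 8_000_000 / 104_832

print(f"{le_per_part_number_unit:.2f}")  # 1109.33
print(f"{system_gates_per_lc:.1f}")      # 76.3
```

The Xilinx quotient lands a little above the round figure of 75, which is close enough for this kind of bookkeeping.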

According to Altera’s APEX 20KE data (I know, we’re back to picking on Altera, but we’ll get to Xilinx in a minute. Be patient.), a “Logic Element” contains one 4-input LUT plus a flip-flop plus some carry/cascade logic. Now that’s progress. We’ve heard of flip-flops and have some idea how they might map to gates, right? Even if we didn’t, the vendor is here to help. They tell us that a plain, vanilla DFF is made up of 7 gates, whereas a super-deluxe DFF with clr, preset, and clk enable is on sale for 8 gates. The problem is, the folks over at Xilinx claim it takes 12 gates to make a DFF with reset and clk enable. That’s a 50% discrepancy right off the top, and we’re still just setting the stage.

Maybe the problem is the definition of a “gate,” which Altera claims is a 2-input AND, while Xilinx prefers the 2-input NAND. I always liked the NAND too, myself. It’s much more versatile owing to the free inverter. Altera is not always the lowball, though. If you’re talking about 4-input XORs (both vendors agree this is a “complex” gate), Altera counts ‘em as 13 gates but Xilinx figures them at only 9. We’re darn close to that 50% discrepancy again, only this time in the other direction. Maybe we should pull ourselves up out of the trees here and look at the forest for a while before we get too confused.
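To see how far apart the two vendors' gate counts really are, the discrepancies can be worked out in a few lines of Python. The helper `percent_higher` is a hypothetical name; the gate counts are the ones quoted above.

```python
def percent_higher(higher, lower):
    """How much bigger `higher` is than `lower`, as a percentage of `lower`."""
    return (higher - lower) / lower * 100


# DFF with reset and clock enable: Xilinx says 12 gates, Altera says 8.
dff_gap = percent_higher(12, 8)

# 4-input XOR: Altera says 13 gates, Xilinx says 9.
xor4_gap = percent_higher(13, 9)

print(f"DFF:  Xilinx counts {dff_gap:.1f}% more gates")   # 50.0%
print(f"XOR4: Altera counts {xor4_gap:.1f}% more gates")  # 44.4%
```

Keep in mind that the two vendors define a "gate" differently, so even these percentages compare unlike units.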

For one thing, we still haven’t addressed the question of the LUT. Luckily, both of these vendors pack one of them with a flip-flop and some extra logic into their basic cell. Altera’s is called a “Logic Element” (LE) while Xilinx calls it a “Logic Cell” (LC). Are LEs and LCs equal? Yes. No. According to Altera’s “Comparing Altera APEX 20KE & Xilinx Virtex-E Logic Densities” document, exactly the same number of each are used to implement a variety of 5-input and 6-input functions, which might lead one to believe that LCs and LEs are equal. They even go on to show that, after some adjusting of the number of LCs on a Xilinx device, the number of LEs and LCs being equal indicates that the “logic density resources…are in fact equal…” Case closed, right? LEs equal LCs? Well, hold on there, skippy, we have a dissenting opinion.

According to Altera’s subsequent “Logic Structure Comparison Between Stratix II and Virtex-Based Architectures,” the NEW Stratix II logic element, Altera’s ALUT (My, we are opening such a can of worms here, aren’t we?) is bigger than both an LE and an LC. This would be all fine-and-dandy if it were bigger than both by the same amount, but no such luck. According to this document, an (Altera) LE is worth 0.8 ALUTs, but a (Xilinx) LC is worth only 0.65. That makes an LE 23.08% bigger than an LC in my book. In all fairness (and this article is really all about being fair, isn’t it?) the document also says that these numbers are based on a variety of factors including the effectiveness of tools, the structure of the design, and other things we really don’t want to go into here. Since we’re already ignoring 50% differences, we can probably let this little 23% slide as well.
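The 23.08% figure falls out of the two ALUT conversion factors. A minimal Python check, with variable names invented for illustration:

```python
# ALUT equivalents quoted from the Stratix II comparison document.
aluts_per_le = 0.80  # Altera Logic Element
aluts_per_lc = 0.65  # Xilinx Logic Cell

# How much bigger an LE is than an LC, measured in ALUTs.
le_over_lc_percent = (aluts_per_le / aluts_per_lc - 1) * 100

print(f"{le_over_lc_percent:.2f}%")  # 23.08%
```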

So, how many gates ARE in a Logic Element/Cell? Well, it depends. You see, we shouldn’t just think about these little guys one at a time. They come in packs. According to Altera, a ten-pack of LEs is called a “Logic Array Block” or LAB (we’re back on APEX 20KE again). Over in Xilinx land, LCs are very tricky. There seem to be two of them per slice, and there are four slices in a CLB. Two times four is eight, right? Watch carefully now, my fingers never leave my hands… Presto! A CLB contains 9 LCs. Wow! That’s cool. Where did the extra LC come from? Hold onto that thought for a minute…

In the old days (XC4000 series), Xilinx claimed that each “CLB” (then containing only 2 Logic Cells) was worth 15 to 48 logic gates. That would make each of the 2 LCs worth 7.5 to 24 gates, right? Looking back at the Altera APEX 20K LE data, an LE is worth from 8 to 21 gates, but that’s based on the dramatically cheaper 8-gate DFF with all the goodies. Adding 4 back in for Xilinx’s different definition of a flip-flop gets us back to almost parity.

If an LE and an LC are both worth 8 to 24 gates, and we pick a nice, middle-of-the-road 16 for math, we could go back to our 75 system gates per LC and deduce that a “system gate” is worth about 4.7 regular, garden-variety gates. This would be wrong, however. “System Gates” presumably also take into account the goodies like memories and multipliers that we’ll discuss and ignore later.
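The chain of arithmetic from CLBs down to "system gates" is easier to follow written out. This Python sketch uses only the numbers quoted above; the variable names are made up for illustration.

```python
# XC4000: one CLB (2 Logic Cells) is claimed to be worth 15 to 48 gates.
clb_gates = (15, 48)
lcs_per_clb = 2
gates_per_lc = tuple(g / lcs_per_clb for g in clb_gates)

# APEX 20K: one LE is worth 8 to 21 gates, assuming an 8-gate DFF.
le_gates = (8, 21)
# Xilinx counts 12 gates for a comparable DFF, so add the 4-gate difference.
le_gates_adjusted = tuple(g + (12 - 8) for g in le_gates)

# The tempting (and wrong) conclusion: 75 system gates spread over a 16-gate cell.
system_gates_per_gate = 75 / 16

print(gates_per_lc)           # (7.5, 24.0)
print(le_gates_adjusted)      # (12, 25)
print(system_gates_per_gate)  # 4.6875
```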

What happens when you step out of the Xilinx/Altera arena? Well, Lattice Semiconductor plays by pretty much the same rules, but Actel’s ProASIC and ProASIC Plus use a “fine-grained” architecture that really doesn’t have anything that looks like a LUT. Their gates are evidently a third different flavor. Caveat Countor!

Unfortunately, all is not random logic. There’s also memory, I/O, and a host of special functions that are sometimes hiding on the chip. These don’t use the regular logic blocks. When you start to factor these elements in, it becomes clear we need a new approach. Maybe Altera’s right. Gate counting just doesn’t work anymore.

REAL ADVICE: There’s no good way to tell which device your design will fit on other than running synthesis and place-and-route. Any other method will give you results that are far from trustworthy.

With the gate question all wrapped up with a bow, we can move confidently on to the question of price. This one’s pretty easy. Pricing is clearly specified by the vendors based on things like “production volume of 250K” at a time in the future, say, 18 months from now. Wait, I see a question in the back of the room. “How many FPGA designs have a production volume of 250K or more?” Well, according to our 2003 market study, a little less than 0.5%. Will the other 99.5% of you take a break while we work this one out? Now, how many of the remaining ones don’t need your parts for 18 months…?

We seem to be playing to an empty house here, so maybe we should skip the price discussion and move on to power. Bring it on! How many watts? We’re ready to read the datasheet and write it down. Well, we really should be getting smarter by now, shouldn’t we? QuickLogic was very helpful a few weeks ago in pointing out that power, too, comes in many shapes and sizes. Quiescent power is what your device uses when it’s not doing anything useful. It’s just sitting there in its recliner, watching a little FPGA TV and waiting for something important to come along while it sips (or gulps) down the coulombs along with a little avocado dip. This kind of power is actually pretty important in FPGAs, because they use a lot of it. All the transistors that make the programmable logic programmable have to be fed all the time, whether the device is on the job or not. At 90nm with its leakage current issues, quiescent power is even more of a contributor to the overall power budget.

Next comes dynamic power. Lucky for our formula (this is sarcasm), dynamic power is pretty much impossible to estimate accurately. It varies from design to design, changes based on stimulus, and varies significantly with switching frequency. There is really no good metric that measures how much dynamic power an FPGA will use in a particular situation, or even relatively compare different FPGAs and families. Yee Haw!
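To see why one datasheet number can't capture this, consider the textbook first-order CMOS switching model, in which power is activity times capacitance times voltage squared times frequency. That model is an assumption brought in here for illustration, not something taken from any vendor's datasheet, and every value below is made up.

```python
def dynamic_power_watts(activity, capacitance_farads, vdd_volts, freq_hz):
    """Textbook first-order estimate: P = activity * C * Vdd^2 * f."""
    return activity * capacitance_farads * vdd_volts ** 2 * freq_hz


# Same hypothetical device and clock, two different stimulus patterns.
quiet = dynamic_power_watts(activity=0.05, capacitance_farads=1e-9,
                            vdd_volts=1.2, freq_hz=100e6)
busy = dynamic_power_watts(activity=0.25, capacitance_farads=1e-9,
                           vdd_volts=1.2, freq_hz=100e6)

print(f"quiet: {quiet * 1000:.1f} mW, busy: {busy * 1000:.1f} mW")
```

Even in this toy model, the activity and capacitance terms depend on the design and its stimulus, which is exactly the information a datasheet cannot know.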

Lest we think that we’ve covered the power problem thoroughly, there’s also a significant issue with power spikes at configuration time. Since SRAM FPGAs require a configuration bit stream to be blasted in at startup, there’s a significant power/current spike right out of the starting blocks. If you’re designing power distribution systems for your FPGA, you’d better take that into account.

Well, we’re almost out of time, and we still haven’t covered performance. From register-to-register delays and levels of logic, to I/O data rates, it’s a complicated subject. We’ll just have to leave that one as part of your homework. For tomorrow, please choose five popular FPGA families and analyze them with our new formula, proving which one is the “goodest.” Send us an e-mail with your answer. Class dismissed!