
Tools and Transceivers

Dual Xilinx Announcements

In the old days, we had only two kinds of FPGA – small and smaller, also known as slow and slower, or hot and hotter…  The technology was useful for a few high-value applications, but limits on all three fronts – density (cost), speed, and power – kept it from attacking a much broader market.  As we waltzed along to the three-count meter of Moore’s Law, all three critical parameters improved.  Density went up, clocks got faster, power cooled down, and people got happier.

After a few rounds, however, FPGAs had pretty well saturated the bigger, faster, cooler crowd.  In order to reach a broader audience now, we needed to teach our favorite semiconductors some new tricks.  For the DSP people, we designed hardened high-speed multipliers.  For connectivity hogs, we grafted on multi-gigabit serial transceivers.  For embedded designers, we dropped in a processor core or two – and all of those people needed big blocks of memory to make effective use of the new features. 

The addition of all those features, however, gave us an FPGA that was something akin to a semiconductor Swiss Army Knife.  It had the features required to do just about any job you could name.  Unfortunately, almost no project required all the features, so you ended up buying (and powering) a bunch of extras that you didn’t need.  Ever use the toothpick feature on your pocketknife?  Nope.  Me neither.

Luckily, FPGA vendors didn’t like throwing away money any more than we did, so they came back with a variety of different FPGA families with different mixtures of the features we might require for different applications.  Beginning with their Virtex-4 generation, Xilinx divided their line into several segments.  Need a bunch of plain-old LUT fabric?  LX was for you – lots of LUTs and not much else.  Doing DSP?  The SX family had more multipliers and memory in the mix.  Piping big amounts of data around?  FX included gigabit serial transceivers and PowerPC processors so you could blast bits anywhere you wanted them at a furious pace. 

Now, with 65nm Virtex-5, we’ve got even more options to choose from.  LX is still there, but even grade-school kids need SerDes these days, so Xilinx has added LXT and SXT – the Lotsa LUTs and DSP-dialed versions with a few SerDes transceivers thrown in.  All of these families contain the whole range of features, but each has a different mixture, making it better suited to particular classes of applications. 

Xilinx has just rolled out their newest, biggest, baddest member of the Virtex-5 generation, the FXT.  FXT has all the goodies and the kitchen sink.  Its distinguishing features are faster transceivers than the other “T” families and built-in hard-wired PowerPC cores.  New are the PowerPC 440, more robust RocketIO transceivers, and a whole buncha built-in memory. 

One of the most common complaints about FPGA-based processors is a lack of performance, so Xilinx worked to build in a significant performance boost with FXT.  The platform has up to two PPC 440 processor cores, each with 32KB instruction and 32KB data caches.  If you’re down with Dhrystones, Xilinx claims that each core can deliver up to 1100 DMIPS at 550MHz – considerably more oomph than previous-generation FPGA-based processors. 
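
For a rough sense of what that claim means per clock tick, the arithmetic can be sketched in a few lines of Python.  The 1100 DMIPS and 550MHz figures are Xilinx’s claims as quoted above; the function and variable names are just for illustration, and the dual-core total assumes perfect scaling across both cores, which is an optimistic assumption rather than anything Xilinx has stated.

```python
def dmips_per_mhz(dmips: float, clock_mhz: float) -> float:
    """Dhrystone MIPS delivered per MHz of clock."""
    return dmips / clock_mhz

CLAIMED_DMIPS = 1100.0  # per core, as claimed by Xilinx
CLOCK_MHZ = 550.0       # clock rate quoted with that claim
CORES = 2               # "up to two" PPC 440 cores

per_mhz = dmips_per_mhz(CLAIMED_DMIPS, CLOCK_MHZ)
# Assumption: perfect scaling across cores, which real workloads rarely achieve.
ideal_total = CLAIMED_DMIPS * CORES

print(f"{per_mhz:.1f} DMIPS/MHz per core")
print(f"{ideal_total:.0f} DMIPS across {CORES} cores (ideal upper bound)")
```

In other words, the claim works out to 2.0 DMIPS per MHz for each core.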

One of the big advantages of building an embedded system-on-chip in an FPGA is the availability of incredible amounts of connectivity for stitching processor to peripherals and on-chip memory.  In that spirit, Xilinx has included a crossbar interconnect structure that gives simultaneous access to I/O and memory.  It includes dedicated master and slave processor local bus interfaces, four DMA ports, and a dedicated memory bus interface.  As before, Xilinx has included their Auxiliary Processor Unit (APU) controller for interfacing your processor environment with any of the wide variety of co-processors and accelerators you can construct using the FPGA fabric. 

Another featured attraction in FXT is the set of RocketIO GTX transceivers.  These are high-performance, relatively low-power SerDes blocks that bring significantly more performance to the table than those in the LXT and SXT lines.  The additional performance brings us the ability to support standards like XAUI, Fibre Channel, SONET, Serial RapidIO, and of course the ubiquitous PCI Express.  Xilinx says the GTX transceivers consume less than 200mW typical at 6.5Gbps and contain features such as 4-tap decision feedback equalization (DFE) on the receive end and pre-emphasis on the transmit side.  A multi-code physical coding sublayer also allows support of both 64B/66B and 64B/67B encoding/decoding schemes without resorting to the LUT fabric for implementation.
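
To put those transceiver numbers in perspective, here is a back-of-the-envelope sketch in Python.  It assumes the 200mW figure applies to a single transceiver running at the full 6.5Gbps line rate and treats it as an upper bound; it also relies on the standard meaning of the coding names (64 payload bits carried in every 66 or 67 line bits).  The function names are purely illustrative.

```python
LINE_RATE_GBPS = 6.5  # line rate quoted above
POWER_MW = 200.0      # "less than 200mW typical", treated here as an upper bound

def payload_rate_gbps(line_rate_gbps: float, payload_bits: int, coded_bits: int) -> float:
    """Usable data rate once line-coding overhead is removed."""
    return line_rate_gbps * payload_bits / coded_bits

def energy_per_bit_pj(power_mw: float, line_rate_gbps: float) -> float:
    """Energy per line bit: mW / Gbps = 1e-3 W / 1e9 bit/s = 1e-12 J/bit = pJ/bit."""
    return power_mw / line_rate_gbps

rate_66 = payload_rate_gbps(LINE_RATE_GBPS, 64, 66)
rate_67 = payload_rate_gbps(LINE_RATE_GBPS, 64, 67)
pj_per_bit = energy_per_bit_pj(POWER_MW, LINE_RATE_GBPS)

print(f"64B/66B payload: {rate_66:.2f} Gbps")
print(f"64B/67B payload: {rate_67:.2f} Gbps")
print(f"energy per bit:  under {pj_per_bit:.1f} pJ")
```

Under those assumptions, either coding scheme gives up only about 3 to 4.5 percent of the raw line rate to overhead, and the energy cost lands at roughly 31 picojoules per bit.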

For you DSP dudes, FXT contains up to 384 DSP slices along with up to 16.5Mb of on-chip memory.  The DSP48E slice includes MAC, multiply-add/subtract, tree-input add, and barrel shift, as well as wide counters and comparators.  Combine the FPGA’s massively parallel datapath operations with the high-speed conventional processing power of the dual PowerPCs, connect the two with a high-performance, low-latency fabric, and you get a daunting performer.  Mate this power with the ability to get data on and off chip via very fast multi-gigabit SerDes channels and you have the kind of device that… well, there’s the next question.  Exactly who needs this much FPGA?

One of the interesting trends with the segmentation of FPGA families is that the target domain for these full-boat devices actually narrows.  Since many applications are now well served by the LX, LXT, and SXT families, FXT is left as the purview of only those who need almost the entire gamut of features.  Today, this means the “old school” FPGA customers – people bolting big bit pipes to backplanes – as well as very-high-throughput signal processing applications like radar or surveillance video processing.

A new family would be no fun at all without new tools, and Xilinx is rolling out their new ISE 10.1 almost in lock-step with Virtex-5 FXT.  Don’t be confused into thinking that 10.1 is all about FXT support, however.  10.1 is a full-fledged release of Xilinx’s entire tool suite supporting their entire range of products.  In fact, Xilinx has brought together their previously somewhat asynchronous tool releases into a single, unified release with 10.1 – presumably making life much easier for customers trying to keep their tool chain consistent while using a variety of Xilinx products. 

10.1 includes the traditional ISE Foundation and PlanAhead tools for basic hardware design, the EDK and Xilinx Platform Studio tools for stitching together embedded designs with FPGAs, and a set of system-level design tools including System Generator for DSP and AccelDSP Synthesis. 

Two major priorities with 10.1 were boosting runtime performance to decrease the cycle time for design turns and improving quality of results to boost the performance of the resulting designs.  Both goals were reached, in part, by taking advantage of multi-machine and multi-core processing to run parallel attempts at compilation.  SmartXplorer allows multiple tuning strategies to be explored on multiple processors, improving both overall turnaround time and resulting design performance. 
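
The general idea of running several tuning strategies side by side and keeping the winner can be sketched in Python as follows.  This is only an illustration of the concept, not SmartXplorer’s actual interface: the strategy names, the slack numbers, and the `run_strategy` stand-in are all hypothetical placeholders, and a real flow would launch the implementation tools and parse their timing reports instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical strategy names with made-up worst-case timing slack (ns).
# These are placeholders, not results from any real tool run.
PLACEHOLDER_SLACK_NS = {
    "balanced": -0.12,
    "timing_focused": 0.05,
    "area_focused": -0.30,
}

def run_strategy(name: str) -> tuple[str, float]:
    """Stand-in for one full implementation run; returns (strategy, worst slack)."""
    return name, PLACEHOLDER_SLACK_NS[name]

def explore(strategies, max_workers=3):
    """Run every strategy concurrently and return the one with the best slack."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_strategy, strategies))
    return max(results, key=lambda result: result[1])

best_name, best_slack = explore(list(PLACEHOLDER_SLACK_NS))
print(f"best strategy: {best_name} (slack {best_slack:+.2f} ns)")
```

With enough cores or machines to go around, the wall-clock time for the whole exploration approaches that of the slowest single run rather than the sum of all of them, which is how one parallel sweep can help both turnaround time and quality of results.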

Xilinx also created a new “lite” version of their PlanAhead floorplanning tool that brings the pin planning features of “PinAhead” to a broader audience.  PinAhead facilitates pin/package planning and performs design rule checks early in the design cycle.  Apparently, the pin planning functionality was well enough received that the company wanted to make it available to more than the usual “floorplanners” who spring for the optional PlanAhead – thus, we now have PlanAhead Lite.

As design has become more complex, finding the appropriate balance of constraints including area, delay, power, and tool runtime has become far more difficult.  Over the years, tools have sprouted a bevy of features, settings, and options that can be daunting for even the most experienced designers.  10.1 has a new “strategy-based implementation” that helps sort out the settings to give you your desired results.

Finally, the new 10.1 suite includes enhanced power analysis and optimization capabilities.  A “2nd generation” XPower Analyzer gives improved accuracy and sports a new user interface with multiple views of power consumption and thermal analysis and a view of power by voltage rail.  Xilinx claims that new power optimization delivers a 10% reduction in total power for Virtex-5 devices and a 12% reduction for Spartan-3A.

The 10.1 ISE tools are available immediately, and the Virtex-5 FXT series is now at the “shipping samples” stage for FX30T and FX70T devices, with FX100T, FX130T and FX200T scheduled for the next six months.  Production devices are slated for Q3 2008.  FX30T will list for US$159 in 1,000-unit volumes by the second half of 2009.  Xilinx will also make Virtex-5 FXT devices available through their EasyPath cost-reduction program for higher-volume applications.
