Next-Generation 65nm FPGAs

System Design Challenge: Bigger, Faster, Better

We are on the cusp of a major technology revolution. The buzzwords of yesteryear, such as digital convergence and triple play, are changing from fantasy to reality.

Today’s consumers demand the ability to connect to the world through any medium they choose (voice, web or video) from wherever they are: home, work, train, car, plane, mountain or jungle. People want to take pictures, shoot videos and send them to friends by e-mail or share them with the world over web sites like YouTube. They want to send instant messages, run office applications, watch videos or TV, access knowledge portals like Google or Yahoo and play games. Service providers have to meet this ever-increasing demand, and the underlying network infrastructure must support the peak data rates that users require. As a result, system designers have to bring new systems to market quickly that offer higher performance, lower cost and lower power, without increasing risk.

65-nm FPGAs – The Ultimate Platform for System Integration
The system designer must look at the capabilities needed by the user and the constraints they have put on how the problem is solved. Designers have to perform cost/performance trade-offs to come up with the best possible architecture and ultimately decide the composition of the system. Obviously, the designer must choose between FPGAs, ASICs and standard products (ASSPs). While the advantages of FPGAs in flexibility, programmability, risk reduction and faster time to market are well known, 65-nanometer (nm) FPGAs such as the Virtex™-5 LXT from Xilinx are changing the game when it comes to integration, I/O, memory schemes and ultimately performance.

System Integration

The current trend in system design is toward higher levels of integration. Designers are reducing the number of packages on the board to cut design complexity, board area and layer count, all of which translate directly into cost savings. From 2001 to today, the average number of packages on a board has gone down from 300 to 50*. Rather than having hundreds of discrete components perform various functions, designers instead rely on one or two large parts along with as few as tens of additional components. Functions such as SerDes (high-speed serial/LVDS), coupling capacitors, termination resistors, I/O and memory controllers, clock managers and system monitors are now being integrated into the larger part. For example, a system board might have just a single large FPGA instead of separate digital signal processors (DSPs), digital logic and connectivity devices. Some new-generation FPGAs are meeting this challenge head-on with embedded intellectual property (IP) such as processors, DSP blocks, SerDes, termination resistors, clock managers and PLLs.

The largest FPGA available today has about 330,000 logic cells (LC). While this number is already larger in absolute terms than in older FPGAs, architectural changes make the effective capacity larger still. Most advanced FPGAs today use a six-input look-up table (LUT) instead of the traditional four-input LUT (see Figure 1: Implementing logic in a LUT6 vs. a LUT4). This makes the LC in 65-nm Virtex-5 FPGAs, for example, about 60 percent larger than in the previous 90-nm generation.

Figure 1: Implementing logic in LUT6 vs. LUT4 (image courtesy of Xilinx)
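
To make the LUT6 vs. LUT4 comparison concrete, the following is a minimal Python sketch that models a LUT as a truth table. The function f, the helper names and the naive mapping are assumptions made for illustration; they are not taken from Xilinx documentation. The sketch shows that any six-input function fits in one LUT6, while a straightforward decomposition into LUT4s needs four cofactor LUTs plus three 2:1 multiplexer LUTs, spread over three logic levels.

```python
from itertools import product

def make_lut(k, func):
    """Return the truth table of a k-input LUT as a list of 2**k bits."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=k)]

def read_lut(table, *inputs):
    """Read one bit; the first input is the most significant address bit."""
    addr = 0
    for bit in inputs:
        addr = (addr << 1) | bit
    return table[addr]

def f(a, b, c, d, e, g):
    """Arbitrary six-input function, chosen only for illustration."""
    return (a & b) ^ (c | d) ^ (e & g)

# One LUT6 holds the whole function in a single 64-entry table.
lut6 = make_lut(6, f)

# Naive LUT4 mapping: one LUT4 per (e, g) cofactor, then a tree of 2:1 muxes.
cofactors = {
    (e, g): make_lut(4, lambda a, b, c, d, e=e, g=g: f(a, b, c, d, e, g))
    for e, g in product((0, 1), repeat=2)
}
mux2 = make_lut(3, lambda sel, d0, d1: d1 if sel else d0)  # fits in a LUT4

def eval_lut4_network(a, b, c, d, e, g):
    """Evaluate f using only the LUT4-sized tables built above."""
    outs = {key: read_lut(tab, a, b, c, d) for key, tab in cofactors.items()}
    low = read_lut(mux2, g, outs[(0, 0)], outs[(0, 1)])   # e == 0 branch
    high = read_lut(mux2, g, outs[(1, 0)], outs[(1, 1)])  # e == 1 branch
    return read_lut(mux2, e, low, high)

LUT4_COUNT = len(cofactors) + 3   # 4 cofactor LUTs + 3 mux LUTs
LUT4_LEVELS = 3                   # cofactor, first mux, second mux

# Check both implementations against each other for all 64 input patterns.
matches = all(
    read_lut(lut6, *bits) == eval_lut4_network(*bits)
    for bits in product((0, 1), repeat=6)
)
print(matches, LUT4_COUNT, LUT4_LEVELS)
```

A synthesis tool would often find a smaller LUT4 mapping for a specific function such as this one; the count of seven applies to the generic decomposition that works for any six-input function. The extra logic levels are also one reason wider LUTs help fabric speed.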

While the LC count is important because it determines how much digital logic can be implemented, FPGA vendors have also started embedding hard IP blocks that save designers a significant amount of fabric logic. These IP blocks include commonly used I/O structures, protocol blocks, processing blocks, SerDes and clock managers.

By way of example, Figure 2 illustrates a 65-nm FPGA with a PCI Express Endpoint block and four tri-mode (10/100/1000 Mbps) Ethernet MACs. Implementing an 8-lane PCI Express Endpoint IP core in logic would consume approximately 15K LCs, and each 10/100/1000 Ethernet MAC approximately 1600 LCs. Taking into account that, on average, about 21 percent of the logic gates in FPGA designs are consumed by IP*, this 330K LC FPGA now looks larger by approximately 60K LCs.

Figure 2: 65-nm FPGA System Integration Capabilities (image courtesy of Xilinx)
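
The logic-cell figures quoted above can be tallied in a few lines of Python. This is only a back-of-the-envelope sketch using the numbers given in the text; the variable names are made up for illustration. The article does not show how its figure of roughly 60K LCs is derived, so the survey-based estimate below should be read as the same order of magnitude rather than an exact match.

```python
# Figures quoted in the article (approximate).
PCIE_X8_ENDPOINT_LCS = 15_000   # soft 8-lane PCI Express Endpoint core
TRI_MODE_EMAC_LCS = 1_600       # one soft 10/100/1000 Ethernet MAC
NUM_EMACS = 4                   # hard Ethernet MACs shown in Figure 2
DEVICE_LCS = 330_000            # largest device cited
IP_FRACTION = 0.21              # share of design logic consumed by IP (survey)

# Fabric logic freed by the hard blocks named in the example.
hard_block_savings = PCIE_X8_ENDPOINT_LCS + NUM_EMACS * TRI_MODE_EMAC_LCS

# Logic a typical design would otherwise spend on IP, per the survey figure.
survey_based_ip_lcs = round(DEVICE_LCS * IP_FRACTION)

print(hard_block_savings)    # 21400
print(survey_based_ip_lcs)   # 69300
```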

Design Performance

Performance is almost always the most important consideration. You may have the largest chip, but if it runs at only 20 MHz, it is very likely not going to make the cut for a majority of applications. Clock rates on most designs are running upwards of 200 MHz and are forecast to get even faster. In general, the faster clock rate is needed to gain competitive advantage or to achieve greater margin.

FPGA vendors use a number of techniques to improve design performance. Smaller process geometry at the 65-nm node improves overall device performance compared to 90-nm processes. The aforementioned six-input LUT, when complemented with denser routing, allows faster fabric performance. For example, a design implemented using six-input LUTs will use fewer LUTs and less routing, effectively increasing performance by 50 percent as compared to four-input LUTs. Faster carry chains with more levels of look-ahead also boost clock rates, and design tools for synthesis, mapping, and place and route can increase performance even more. Figure 3 illustrates a 30-percent increase in performance over previous-generation FPGAs due to innovations in process technology, architecture and design tools.

Note: Based on comparison of 65-nm Virtex-5 vs. 90-nm Virtex-4 FPGAs

Figure 3: Performance Improvement (image courtesy of Xilinx)

System Interconnect Schemes

System interconnect has to match the rising internal clock speeds. Fortunately, designers now have several options for getting data in and out of the chip very quickly: high-speed serial I/O transceivers, fast and wide busses, or a combination of both. Chip-to-chip interconnects such as memory interfaces still use very wide busses, but processors and DSPs are starting to use a combination of parallel and serial interconnects.

Board-to-board, backplane and system-to-system interconnects are fast migrating to serial technologies. Protocols for carrying the data are also changing. Interfaces such as SPI-4.2, QDRII, DDR2 and RapidIO are based on parallel I/O, while PCI Express®, Serial RapidIO, Fibre Channel, SONET, HD-SDI and others use serial physical layers. The trend is very clear: there is a definite move to serial I/O.

The latest 65-nm FPGAs can act as the grand central station for connectivity within the system due to their remarkable I/O structures, both parallel (single-ended and differential) and serial.

Parallel I/O structures (See Table 1: Virtex-5 Parallel I/O Standards support)

Each I/O pin on the device supports more than 40 electrical and protocol standards
Each pin can be an input and a (3-state-capable) output
Each pin can be individually configured for:
- ChipSync, XCITE termination, drive strength, input threshold, weak pull-up or pull-down
Each input can be 3.3-V tolerant, limited by its Vcco
Each I/O can have the same performance (up to 700 Mbps single-ended and 1.25 Gbps differential LVDS)

LVCMOS (3.3V, 2.5V, 1.8V, 1.5V, and 1.2V)
LVDS, Bus LVDS, Extended LVDS
LVPECL
PCI, PCI-X
HyperTransport (LDT)
HSTL (1.8V, 1.5V, Classes I, II, III, IV)
HSTL_I_12 (unidirectional only)
DIFF_HSTL_I_18, DIFF_HSTL_I_18_DCI
DIFF_HSTL_I, DIFF_HSTL_I_DCI
RSDS_25 (point-to-point)
SSTL (2.5V, 1.8V, Classes I, II)
DIFF_SSTL_I
DIFF_SSTL2_I_DCI
DIFF_SSTL18_I, DIFF_SSTL18_I_DCI
GTL, GTL+

Table 1. Virtex-5 Parallel I/O Standards supported (courtesy of Xilinx)

Serial I/O structure (See Table 2: Virtex-5 Serial I/O Standards support)

Each SerDes supports approximately 30 electrical and protocol standards
Serial I/O in all devices
Supports all major standards from 100 Mbps to 3.2 Gbps
Very low power with advanced features & capabilities
Easy to design using new tools, evaluation boards and debug resources

Table 2: Virtex-5 Serial I/O Standards supported (courtesy of Xilinx)

Comprehensive support for parallel and serial I/O standards allows the FPGA to bridge nearly any protocol to any other, serial or parallel, with few restrictions (see Figure 4: FPGA bridges between many different protocols). For example, data arrives at the optics over OC-48 and feeds the FPGA through an optoelectronic PHY device. The FPGA can then pre-process the data packets, send them to a DDR2 SDRAM, bring them back inside the FPGA for more processing and send them to a network processor using either SPI-4.2 or Serial RapidIO. Alternatively, data can be brought in over a 4x PCI Express connector, processed and sent out over four HD-SDI links. There are a myriad of use models showcasing FPGAs as simply the best COTS platform for getting data in and out of systems at very high rates.

Figure 4: FPGA bridges between many different protocols (image courtesy of Xilinx)
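
When planning a bridge such as the PCI Express to HD-SDI path described above, a quick bandwidth budget is a useful sanity check. The sketch below assumes a first-generation PCI Express link (2.5 Gbps per lane with 8b/10b encoding) and the nominal 1.485 Gbps HD-SDI line rate; neither rate is stated in the article, and packet overhead and video blanking are ignored, so this is a rough estimate only.

```python
# Assumed line rates in Mbps (not stated in the article).
PCIE_GEN1_LINE_RATE_MBPS = 2500   # per lane, first-generation PCI Express
ENCODING_EFFICIENCY = (8, 10)     # 8b/10b: 8 payload bits per 10 line bits
HD_SDI_LINE_RATE_MBPS = 1485      # nominal HD-SDI serial rate per link

def pcie_payload_mbps(lanes):
    """Usable bit rate of a PCIe link after 8b/10b, ignoring packet overhead."""
    num, den = ENCODING_EFFICIENCY
    return lanes * PCIE_GEN1_LINE_RATE_MBPS * num // den

ingress_mbps = pcie_payload_mbps(4)       # 4x PCI Express in
egress_mbps = 4 * HD_SDI_LINE_RATE_MBPS   # four HD-SDI links out
headroom_mbps = ingress_mbps - egress_mbps

print(ingress_mbps, egress_mbps, headroom_mbps)   # 8000 5940 2060
```

Under these assumptions the 4x link has roughly 2 Gbps of headroom over the four video outputs before protocol overhead is taken into account.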

Memory Interfaces

External memories are now a low-cost commodity product, with memory manufacturers focused on speed, cost and smaller packaging. Within the system design context, there are several challenges at play for any chip that must communicate with an external DRAM.

Consider that DRAM speed doubles every four years, which shrinks the clock period while the edge uncertainties remain the same (see Figure 5: Shrinking Data Valid Window with Rising Speeds). This, combined with other generic PC board signal integrity problems, makes timing a difficult proposition. Compounding this even further is the fact that many DRAM interface standards use different voltage levels and protocols.

Figure 5: Shrinking Data Valid Window with Rising Speeds (image courtesy of Xilinx)
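
The effect in Figure 5 can be illustrated with simple arithmetic: the data valid window is roughly the bit time minus the total edge uncertainty. In the sketch below, the 600 ps uncertainty and the per-pin data rates are hypothetical values chosen only to show the trend; they do not come from the article or from any memory datasheet.

```python
# Hypothetical total edge uncertainty (jitter, skew, setup/hold), held constant.
EDGE_UNCERTAINTY_PS = 600

def valid_window_ps(data_rate_mbps):
    """Bit time minus the fixed uncertainty, in picoseconds."""
    bit_time_ps = 1_000_000 / data_rate_mbps
    return bit_time_ps - EDGE_UNCERTAINTY_PS

for rate in (400, 800):   # per-pin data rates in Mbps, doubling once
    bit_time = 1_000_000 / rate
    window = valid_window_ps(rate)
    print(f"{rate} Mbps: bit time {bit_time:.0f} ps, "
          f"valid window {window:.0f} ps ({window / bit_time:.0%})")
```

With these assumed numbers, doubling the data rate halves the bit time but cuts the usable window by about two thirds, which is why fixed uncertainties dominate at higher speeds.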

To accommodate the rising throughput, DRAMs use very wide bussing schemes; DDR and DDR2 SDRAMs use 576-bit-wide busses while RLDRAM II uses a 648-bit-wide bus. These wide busses add a large amount of skew between the data lines and the clock, and this skew can differ from cycle to cycle. Add the fact that each memory can use different types of training patterns, and it soon becomes clear that memory controller design is no longer easy.

Newer-generation FPGAs go a long way toward simplifying controller design, with some offering memory interface features that cannot be found in any other chips. These include dedicated frequency division/multiplication and adaptive delay (64 taps at 75 ps each) circuitry in each I/O pin. The data coming from and going to the off-chip DRAM is very fast, so frequency conversion is needed on both the input and output sides. Adaptive delays in the input and output pins are used to center the read and write clocks within the valid data window. These circuits can also be used to align the data and clock dynamically, so the designer does not have to worry about signal skew. Better package design and native PLLs compensate for PC-board-induced signal integrity problems. Built-in ECC circuitry can be used with external memories to facilitate error correction without any design effort.
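
One common way to use a tapped delay line for centering is to sweep every tap, record which settings capture a known training pattern correctly, and pick the middle of the longest passing run. The Python sketch below illustrates that idea with the 64-tap, 75 ps figures quoted above. The function sample_ok is a hypothetical stand-in for the hardware comparison, and the window edges are invented for the example; this is not the calibration procedure of any particular device or memory controller.

```python
TAP_COUNT = 64   # taps per I/O delay line, as quoted in the article
TAP_PS = 75      # delay per tap in picoseconds, as quoted in the article

def find_center_tap(sample_ok):
    """Return the tap in the middle of the longest run of passing taps.

    sample_ok(tap) is a caller-supplied check (hypothetical here) that
    returns True when data is captured correctly at that tap setting.
    """
    best_start, best_len = None, 0
    run_start = None
    # Iterate one step past the last tap so a run ending at tap 63 is closed.
    for tap in range(TAP_COUNT + 1):
        passing = tap < TAP_COUNT and sample_ok(tap)
        if passing and run_start is None:
            run_start = tap
        elif not passing and run_start is not None:
            run_len = tap - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = None
    if best_start is None:
        raise RuntimeError("no passing tap found")
    return best_start + (best_len - 1) // 2

# Simulated link: data is valid for delays between 1000 ps and 2900 ps.
def simulated_sample_ok(tap):
    return 1000 <= tap * TAP_PS <= 2900

center = find_center_tap(simulated_sample_ok)
print(center, center * TAP_PS)   # 26 1950
```

In this simulated case taps 14 through 38 pass, so the chosen setting of 26 taps (1950 ps) sits at the middle of the valid window. A real controller would typically repeat the sweep periodically to track voltage and temperature drift.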

Summary

Triple-play convergence is here to stay. Service providers are rolling out new services to entice more users, and that in turn creates new challenges for system designers. While traditional alternatives such as ASICs and ASSPs are still viable, FPGAs are fast becoming the go-to chips for system integration. The latest-generation devices offer a plethora of high-end connectivity features and are the best of a new breed of 65-nm devices aimed at making life easier for system designers.

Notes:
* denotes data from EETimes EDA and FPGA Surveys from 2004-2006

About the Author
Navneet Rao is a technical marketing manager at Xilinx specializing in high speed connectivity solutions. Previously Rao led teams in architecting and designing transceivers and switch fabric ASICs at Mindspeed Technologies (Hotrail, Inc). Rao also worked in product development teams at Philips Semiconductors and LSI Logic. He is an active member in trade associations such as FSA, RapidIO, PCI Express, and HyperTransport. Rao has been an invited speaker at a number of seminars with industry experts including the RapidIO Trade Association and The Linley Group. Rao earned his degree from the Indian Institute of Technology, Kharagpur, India.
