Feature Article

Bit-Based Dynamic Alignment for Multi-Gigabit Parallel I/O

The Role of High-Speed Parallel I/O

As I/O standards continue to evolve toward serialization, high-speed parallel I/O still plays an important role in chip-to-chip applications in which current serial technologies are either cost prohibitive or ruled out by legacy requirements.

FPGAs are increasingly being used as programmable SoCs, designed in as an integral part of the system data path to support NPU, framer and module-based source synchronous I/O standards such as SPI 4.2, SFI 4.1, XGMII, HyperTransport and RapidIO. These applications, however, require devices capable of performing high-speed I/O translation and processing. How can this level of performance be achieved in an FPGA array?

Although electrical compliance and high-speed signal integrity are required features, they alone do not address the bandwidth issue. The FPGA I/O must also have circuitry to manage and maintain the clock and data relationships of these high-speed signals, as well as provide the gearbox functionality necessary to transfer the high-speed I/O data to the FPGA fabric.

This article examines how emerging bit-based dynamic alignment logic has become a critical part of overall system level I/O architecture. For example, this logic is integrated and embedded into every I/O block of LatticeSC FPGAs. As a result, the devices are capable of speeds up to 2Gbps per pin.

Dynamic Alignment and Data Transfer I/O Logic

In addition to the need for I/O buffers to achieve increasing levels of electrical performance, today’s high-speed source synchronous interfaces also present three other challenges for the designer:

1) Managing the data-to-data skew (word alignment)
2) Managing and maintaining the clock-to-data relationship
3) Clock domain transfer of these high-speed signals to the FPGA fabric

The data-to-data relationship (word alignment and deskew) is fairly straightforward and can be handled by FPGA logic. However, the delay-sensitive clock-to-data relationship and the clock domain transfers are more challenging.

For bit and bus deskew, designers traditionally have relied on methods such as matching bus trace lengths, or on PLLs and DLLs to manipulate the clock signal, eliminating clock injection delay and/or phase shifting the clock by some pre-determined percentage of the clock cycle in order to maximize the clock-to-data relationship. While helpful, these approaches are not sufficient at higher speeds, because their compensation is clock-based and applied globally to all bus signals. They are also static, and so fail to account for the delay variations that occur over process, voltage and temperature.

Today's high-speed interfaces require bit-based compensation because of the increased difficulty of meeting and maintaining adequate setup and hold time margins as clock cycle times shrink. The issue is exacerbated for high-speed parallel protocols, such as SPI 4.2, in which dynamic bit-based alignment and word alignment are key elements of the total system solution.
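Some quick arithmetic (with illustrative, not specified, numbers) shows why a fixed amount of skew that is harmless at low rates becomes fatal as the bit time shrinks:

```python
# Illustrative margin arithmetic: as the bit time shrinks, a fixed
# amount of skew consumes the whole timing budget. The numbers below
# are hypothetical examples, not figures from any standard.

def timing_margin_ps(data_rate_mbps: float, skew_ps: float,
                     setup_ps: float, hold_ps: float) -> float:
    """Margin left after skew and setup/hold are subtracted from one
    bit period (one bit per clock edge)."""
    bit_time_ps = 1e6 / data_rate_mbps  # picoseconds per bit
    return bit_time_ps - skew_ps - setup_ps - hold_ps

# At 200 Mbps, 500 ps of skew plus 300/300 ps setup/hold still leaves
# a comfortable margin; at 800 Mbps the same skew leaves almost none.
low_rate_margin  = timing_margin_ps(200, 500, 300, 300)  # 3900 ps
high_rate_margin = timing_margin_ps(800, 500, 300, 300)  # 150 ps
```

This is why clock-based, bus-wide compensation runs out of headroom: the budget must be recovered per bit.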

Figure 1 – Parallel Bus Skew and the Effects of Dynamic Alignment

While PLLs and DLLs can be used to align data and clock, the simplest way to address applications in which the clock-to-data relationship is known is by utilizing an input delay block. For this purpose, the I/O logic block provides the user with a 144-tap (40ps step size, typical) delay block that can be used independently in two dynamic alignment modes.
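As a rough illustration of what such a delay block offers, the 144-tap, 40ps-step figures quoted above imply the following tap selection arithmetic (the function and variable names here are hypothetical, not part of any vendor API):

```python
# Sketch: choosing a delay-line tap for a target delay, using the
# 144-tap, 40 ps/step (typical) figures from the article.
# Names are illustrative only.

TAP_COUNT = 144
TAP_STEP_PS = 40  # typical step size

def tap_for_delay(target_ps: float) -> int:
    """Return the nearest tap index for a requested delay in picoseconds."""
    tap = round(target_ps / TAP_STEP_PS)
    if not 0 <= tap < TAP_COUNT:
        raise ValueError("requested delay is outside the delay-line range")
    return tap

# Total adjustment range: 143 steps * 40 ps = 5720 ps (~5.7 ns),
# far more than one bit time at the rates discussed here.
max_delay_ps = (TAP_COUNT - 1) * TAP_STEP_PS
```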

Bus-based Dynamic Alignment

In this configuration, the input delay block is under DLL control to provide bus-based alignment capability for data rates up to ~600Mbps. This mode (Figure 2) preserves a fixed clock/data phase relationship by aligning the incoming clock and data bus under DLL control. Another advantage of this mode is that it automatically tracks/compensates for delay variations due to process, voltage and temperature.

Figure 2 – Bus-based Dynamic Alignment Mode

Although the bus-based DLL control mode is useful for some applications in which the clock-to-data relationship is known, it has inherent limitations for dynamic clock-to-data compensation on high-speed, source synchronous interfaces. Its delay compensation is applied globally to all bits of the data bus, without the bit-based accuracy needed in applications above 600Mbps. A different approach must be taken for source synchronous interfaces running above 600Mbps, such as SPI 4.2.

Bit-based Dynamic Alignment with Closed-Loop Control Circuitry

For higher speed interfaces, a closed-loop monitor and control circuit is needed that dynamically maintains proper setup and hold time margins on a bit-by-bit basis. For this reason, an alignment mode is required in which the input delay block is used in conjunction with a closed-loop monitor and control circuit embedded in each I/O block, as is the case with the LatticeSC FPGA. This is the only way to ensure proper data sampling in high-speed applications where the clock-to-data relationship is unknown. Another key advantage of this mode is that it compensates for process, voltage and temperature variations not only on the receiving FPGA device, but also on the driving device.

This is the most robust configuration available (Figure 3): the user can establish and dynamically maintain the clock-to-data relationship on a bit-by-bit basis, providing the resolution necessary to support speeds of up to 2Gbps on a single pin.

Figure 3 – Bit-based Dynamic Alignment Mode

Again, the key to this mode is the embedded closed-loop monitor and control circuitry that can be enabled/disabled or updated under FPGA control. This closed-loop design also allows for tracking and compensating for delay variations due to process, voltage and temperature conditions. Here is an example (Figure 4) that shows how the bit-based dynamic alignment circuitry actually works. The SPI 4.2 protocol is used as a reference because it is a popular, high-speed source synchronous interface that requires dynamic alignment.

Figure 4 – Specified Data Valid Window for Bit-based Dynamic Alignment (AIL)

As seen in Figure 4, the user specifies a data valid window around the clock edge, which establishes a setup and hold time margin in which no transitions should occur. Because this is a closed-loop system, once these settings are made and the window is established, the circuit continuously monitors and controls the clock-to-data relationship of each bit to ensure no data transitions occur within the window. All the designer then needs is a GUI-based tool in which to enter the user-defined data valid window. Figure 5 shows the benefit of the bit-based dynamic alignment circuit and the GUI used to configure it.
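To make the closed-loop behavior concrete, here is a minimal software sketch of the idea (purely illustrative; the real mechanism is a hardware circuit inside each I/O block): each bit's monitor checks whether a data transition falls inside the keep-out window around the sampling edge and, if it does, nudges that bit's delay tap.

```python
# Sketch of one iteration of per-bit closed-loop alignment.
# Illustrative model only; names and the single-step policy
# are assumptions, not the silicon implementation.

def adjust_tap(transition_ps: float, window_ps: float, tap: int) -> int:
    """Nudge one bit's delay tap if its data transition violates the
    user-defined keep-out window centered on the sampling edge (time 0).

    transition_ps: measured position of the data transition relative
                   to the sampling clock edge.
    window_ps:     half-width of the data valid window.
    """
    if abs(transition_ps) < window_ps:
        # Transition too close to the sampling edge: step the delay
        # to push it out of the window.
        return tap + 1
    return tap  # margins met; leave the tap alone

# Each bit of the bus runs its own loop, so skew and PVT drift are
# corrected per bit rather than globally as in the bus-based mode.
```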

Figure 5 – GUI Tool to Establish User-defined Data Valid Window

Gearbox Logic for High-speed Data and Clock Domain Transfer

Due to the high speed of these interfaces, gearbox logic must be utilized to slow these signals to manageable speeds for the FPGA fabric. As shown in Figure 6, the FPGA I/O block provides this gearbox logic for either SDR or DDR interfaces.

Figure 6 – FPGA I/O Block with Embedded Gearbox Logic

On-die clock dividers, in which the divided and non-divided outputs are phase matched, are also provided to support the clocking requirements of the gearbox logic, eliminating the need to use generic PLL/DLL resources.

Table 1 shows an example of the gearbox functionality. The gearing logic also provides the proper domain transfer of the high-speed edge clock to the lower-speed FPGA system clock, guaranteed across process, voltage and temperature conditions. Although an input example is shown, gearing logic is available for outputs as well.

Table 1 – Example of Gearing for an 8-bit Bus

Note: x1 gearing is used to ensure the guaranteed transfer of the high-speed edge clock to the FPGA system clock
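The gearing in Table 1 can be pictured behaviorally (a software model, not the hardware): with 1:4 input gearing, four consecutive 8-bit samples captured on the fast edge clock are handed to the fabric as one 32-bit word at a quarter of the rate. The byte ordering below is an assumption for illustration.

```python
# Behavioral sketch of 1:4 input gearing for an 8-bit bus.
# Four byte-wide samples at the edge-clock rate become one 32-bit
# word at a quarter of that rate (illustrative model only).

from typing import Iterable, List

def gearbox_1_to_4(samples: Iterable[int]) -> List[int]:
    """Pack consecutive 8-bit samples into 32-bit words, first
    sample in the least-significant byte (assumed ordering)."""
    words: List[int] = []
    word, shift = 0, 0
    for s in samples:
        word |= (s & 0xFF) << shift
        shift += 8
        if shift == 32:          # four samples collected
            words.append(word)   # hand one word to the fabric clock
            word, shift = 0, 0
    return words

# Four edge-clock samples -> one fabric-clock word:
# gearbox_1_to_4([0x11, 0x22, 0x33, 0x44]) returns [0x44332211]
```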

Clock Domain Transfers

Finally, the high-speed data must be handed off to the FPGA fabric for further processing. I/O block circuitry is needed to ensure the proper transfer of the I/O data from the high-speed edge clocks to the lower-speed FPGA fabric clocks. To accommodate this clock domain transfer, the SDR and DDR elements have two clock inputs: one for the edge clock, and one to clock data into the FPGA fabric clock domain. This approach guarantees error-free transfers over process, voltage and temperature variations.
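A simple way to picture this two-clock hand-off (again, purely an illustrative model, not the circuit itself): data is first captured in a register on the fast edge clock, then re-registered on the phase-matched, divided fabric clock, so the fabric only ever samples a stable value.

```python
# Behavioral sketch of the two-stage clock-domain hand-off.
# Because the divided fabric clock is phase-matched to the edge
# clock, the second stage samples stable data (model only; class
# and method names are hypothetical).

class TransferElement:
    def __init__(self) -> None:
        self.edge_reg = 0    # stage 1: edge-clock domain
        self.fabric_reg = 0  # stage 2: fabric-clock domain

    def edge_clock(self, data: int) -> None:
        """Capture incoming data on the high-speed edge clock."""
        self.edge_reg = data

    def fabric_clock(self) -> int:
        """Re-register on the slower, phase-matched fabric clock
        and present the data to the FPGA fabric."""
        self.fabric_reg = self.edge_reg
        return self.fabric_reg
```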


Not all FPGA I/O are created equal. Parallel I/O interfaces remain an important part of system-level design and data transfer, and they demand an FPGA architecture that delivers multi-gigabit I/O performance. Conventional alignment techniques are not robust enough to achieve and maintain this level of performance. What is needed is I/O logic that handles dynamic clock/data alignment on an individual-bit basis, together with clocking and gearing resources that manage the processing and transfer of the high-speed signals to the FPGA fabric.


About the author: Ron Warner is the European Strategic Marketing Manager for Lattice Semiconductor. Previously he was an Applications Engineering Manager at Lucent Technologies/Agere Systems and spent 10 years as an FPGA and software design engineer at Westinghouse Electric and Harris Corporation. He received his BSEE from Youngstown State University in Ohio in 1982.
