Considerations for High-Bandwidth TCP/IP PowerPC Applications

The TCP/IP protocol suite is the de facto worldwide standard for communications over the Internet and almost all intranets. Interconnecting embedded devices is becoming standard practice even in device classes that were previously stand-alone entities. By its very definition, an embedded architecture has constrained resources, which is often at odds with rising application requirements.

Achieving wire-speed TCP/IP performance continues to be a significant engineering challenge, even for high-powered Intel™ Pentium™-class PCs. In this article, we’ll discuss the per-byte and per-packet overheads that limit TCP/IP performance and present the techniques used to maximize TCP/IP performance over Gigabit Ethernet in embedded processor-based applications.

Overview

Gigabit Ethernet performance is achieved by leveraging a multi-port DDR SDRAM memory controller to allocate memory bandwidth between embedded PowerPC™ processor local bus (PLB) interfaces and two data ports. Each data port is attached to a direct memory access (DMA) controller, allowing hardware peripherals high-bandwidth access to memory.

System Architecture

Memory bandwidth is an important consideration for high-performance network attached applications. Typically, external DDR memory is shared between the processor and one or more high-bandwidth peripherals such as Ethernet. A multi-ported memory controller efficiently divides the available memory bandwidth between processor interfaces and streaming peripherals, including Ethernet. The streaming peripherals are linked to memory through a point-to-point streaming interface via Direct Memory Access (DMA) controllers. The DMA controller implements a scatter-gather scheme whereby multiple buffers are converted to/from a contiguous stream on the Ethernet or other streaming peripheral. The Ethernet peripheral implements checksum offload on both the Transmit and Receive paths for optimal TCP performance. A block diagram of the system described above is shown in Figure 1.

Figure 1

TCP/IP Per-Byte Overhead

Per-byte overhead occurs when the processor touches payload data [1]. The two most common operations of this type are buffer copies and TCP checksum calculation. Buffer copies represent a significant overhead for two reasons:

1. Most of the copies are unnecessary.

2. The processor is not an efficient data mover.

TCP checksum calculation is expensive because it is computed over every byte of payload data. Embedded TCP/IP-enabled applications such as medical imaging require near-wire-speed TCP bandwidth to reliably transfer image data over a Gigabit Ethernet network. The data is generated by a high-resolution image source, not the processor. In this case, introducing a zero-copy software API and offloading the checksum calculation into FPGA fabric removes the per-byte overheads.

“Zero-copy” is a term that describes a TCP software interface where no buffer copies occur. Linux and other operating systems have introduced software interfaces like sendfile() [2] that serve this purpose, and commercial standalone TCP/IP stack vendors like Treck™ offer similar zero-copy features. These software features allow the removal of buffer copies between the user application and the TCP/IP stack or operating system.

The scatter-gather and the checksum offload features of the system provide the hardware support necessary for zero-copy functionality. The scatter-gather feature is a flexibility of the DMA controller that allows software buffers to be located at any byte offset. This removes the need for the processor to copy unaligned or fragmented buffers.
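To make the scatter-gather idea concrete, here is a minimal software model. The descriptor layout, the flag names, and both functions are hypothetical and chosen for illustration only. A real DMA controller defines its own descriptor format and performs the gather step in hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SG_SOF 0x1u   /* first buffer of a frame (assumed flag) */
#define SG_EOF 0x2u   /* last buffer of a frame (assumed flag)  */

/* Hypothetical descriptor: one entry per buffer fragment. */
struct sg_desc {
    struct sg_desc *next;  /* next descriptor, NULL at end of chain */
    const uint8_t  *buf;   /* fragment start, any byte alignment    */
    uint32_t        len;   /* fragment length in bytes              */
    uint32_t        flags; /* SG_SOF and/or SG_EOF                  */
};

/* Link n fragments into one frame; returns the total frame length. */
size_t sg_build_frame(struct sg_desc *descs, const uint8_t *const bufs[],
                      const uint32_t lens[], size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        descs[i].buf = bufs[i];
        descs[i].len = lens[i];
        descs[i].flags = 0;
        if (i == 0)
            descs[i].flags |= SG_SOF;
        if (i + 1 == n)
            descs[i].flags |= SG_EOF;
        descs[i].next = (i + 1 < n) ? &descs[i + 1] : NULL;
        total += lens[i];
    }
    return total;
}

/*
 * Software stand-in for the DMA engine: walk the chain and emit one
 * contiguous stream. Returns bytes written, or 0 if `out` is too small.
 */
size_t sg_gather(const struct sg_desc *d, uint8_t *out, size_t cap)
{
    size_t off = 0;

    for (; d != NULL; d = d->next) {
        if (d->len > cap - off)
            return 0;
        memcpy(out + off, d->buf, d->len);
        off += d->len;
    }
    return off;
}
```

With this shape, a protocol header in one buffer and a payload starting at an odd address in another can be described by two descriptors, and the processor never has to merge them into an aligned staging buffer.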

Checksum offload is a feature of the Ethernet peripheral. It allows the TCP payload checksum to be calculated in FPGA fabric as Ethernet frames are transferred between main memory and the peripheral’s hardware FIFOs. These system features remove the need for costly buffer copies and processor checksum operations, leaving the processor to perform protocol operations and user functions.
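For a sense of the work that checksum offload takes away from the processor, the sketch below is a plain software version of the 16-bit one's-complement Internet checksum described in RFC 1071. The function name is illustrative. A real TCP checksum also covers a pseudo-header and the TCP header, which this sketch leaves to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's-complement Internet checksum over `len` bytes. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;

    /* Every byte is read once: this loop is the per-byte cost. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;  /* pad odd tail with a zero byte */

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

When the same sum is accumulated in hardware as the frame streams past, this loop disappears from the processor's workload entirely.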

TCP/IP Per-Packet Overhead

Per-packet overhead is associated with operations surrounding the transmission or reception of packets [1]. Packet interrupts, hardware interfacing, and header processing are examples of per-packet overheads. Interrupt overhead represents a considerable burden on the processor and memory subsystem, especially when small packets are transferred. Interrupt coalescing is a technique used in such a system to alleviate some of this pressure by amortizing the interrupt overhead across multiple packets. The DMA engine waits until there are n frames to process before interrupting the processor, where n is a software-tunable value.
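The counting behaviour behind interrupt coalescing can be modelled in a few lines. The structure and function names below are hypothetical, and the model covers only the frame-count threshold n described above. A hardware engine would usually need some way to flush a partial batch as well, which is not modelled here.

```c
#include <stdint.h>

/* Hypothetical model of a frame-count interrupt coalescer. */
struct coalescer {
    uint32_t threshold;  /* n: frames to collect per interrupt */
    uint32_t pending;    /* frames completed since last interrupt */
};

/*
 * Call once per completed frame. Returns 1 when the processor should
 * be interrupted, 0 when the frame is only counted.
 */
int coalescer_frame_done(struct coalescer *c)
{
    c->pending++;
    if (c->pending >= c->threshold) {
        c->pending = 0;
        return 1;
    }
    return 0;
}
```

With a threshold of 4, ten completed frames raise two interrupts instead of ten, and two frames remain pending until the next batch fills.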

Transferring larger sized packets (jumbo frames of 9,000 bytes) has a similar effect by reducing the number of frames transmitted, and therefore the number of interrupts generated. This amortizes the per-packet overhead over a larger data payload.
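A quick calculation shows the size of the effect. The sketch below estimates the maximum frame rate on a saturated link. It assumes that the 9,000-byte figure refers to the frame payload (MTU) and that each frame carries 38 bytes of fixed Ethernet overhead on the wire. The function name is illustrative.

```c
/* Preamble and SFD 8 + MAC header 14 + FCS 4 + inter-frame gap 12. */
#define ETH_WIRE_OVERHEAD 38u

/* Maximum frames per second on a fully loaded link. */
double frames_per_second(double link_bps, unsigned mtu_bytes)
{
    return link_bps / (8.0 * (double)(mtu_bytes + ETH_WIRE_OVERHEAD));
}
```

Under these assumptions, a 1 Gbps link carries roughly 81,000 frames per second at a 1,500-byte payload and roughly 14,000 at 9,000 bytes, so the per-packet work falls by almost a factor of six.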

Implementation

An example implementation of this architecture is the Gigabit System Reference Design (GSRD) from Xilinx. It is geared toward high-performance bridging between TCP/IP-based protocols and user data interfaces like high-resolution image capture or Fibre Channel. The components of the GSRD contain features to address the per-byte and per-packet overheads of a TCP/IP system. For applications requiring an embedded operating system, a MontaVista™ Linux™ port is available. For applications with the highest bandwidth requirements, a commercial standalone TCP/IP stack from Treck™ is available.

The GSRD can provide transmit TCP performance of up to 890 Mbps using jumbo frames, and is implemented in the latest FPGA technology available today from Xilinx. The GSRD can be downloaded today from http://www.xilinx.com/gsrd/.

References:

1. “End-System Optimizations for High-Speed TCP” (www.cs.duke.edu/ari/publications/end-system.pdf)

2. “Use sendfile to optimize data transfer” (http://builder.com.com/5100-6372-1044112.html)
