
Considerations for High-Bandwidth TCP/IP PowerPC Applications

The TCP/IP protocol suite is the de facto worldwide standard for communications over the Internet and almost all intranets. Interconnecting embedded devices is becoming standard practice even in device classes that were previously stand-alone entities. By its very definition, an embedded architecture has constrained resources, which is often at odds with rising application requirements.

Achieving wire-speed TCP/IP performance continues to be a significant engineering challenge, even for high-powered Intel™ Pentium™-class PCs. In this article, we’ll discuss the per-byte and per-packet overheads that limit TCP/IP performance and present the techniques used to maximize TCP/IP performance over Gigabit Ethernet in embedded processor-based applications.

Overview

Gigabit Ethernet performance is achieved by leveraging a multi-port DDR SDRAM memory controller to allocate memory bandwidth between embedded PowerPC™ processor local bus (PLB) interfaces and two data ports. Each data port is attached to a direct memory access (DMA) controller, allowing hardware peripherals high-bandwidth access to memory.

System Architecture

Memory bandwidth is an important consideration for high-performance network-attached applications. Typically, external DDR memory is shared between the processor and one or more high-bandwidth peripherals such as Ethernet. A multi-ported memory controller efficiently divides the available memory bandwidth between processor interfaces and streaming peripherals, including Ethernet. The streaming peripherals are linked to memory through a point-to-point streaming interface via direct memory access (DMA) controllers. The DMA controller implements a scatter-gather scheme in which multiple buffers are converted to or from a contiguous stream on the Ethernet or other streaming peripheral. The Ethernet peripheral implements checksum offload on both the transmit and receive paths for optimal TCP performance. A block diagram of the system described above is shown in Figure 1.

Figure 1

TCP/IP Per-Byte Overhead

Per-byte overhead occurs when the processor touches payload data [1]. The two most common operations of this type are buffer copies and TCP checksum calculation. Buffer copies represent a significant overhead for two reasons:

1. Most of the copies are unnecessary.

2. The processor is not an efficient data mover.

TCP checksum calculation is expensive because it is computed over every byte of payload data. Embedded TCP/IP-enabled applications such as medical imaging require near-wire-speed TCP bandwidth to reliably transfer image data over a Gigabit Ethernet network. The data is generated by a high-resolution image source, not the processor. In this case, introducing a zero-copy software API and offloading the checksum calculation into FPGA fabric completely removes the per-byte overheads. “Zero-copy” describes a TCP software interface in which no buffer copies occur. Linux and other operating systems have introduced software interfaces such as sendfile() [2] that serve this purpose, and commercial standalone TCP/IP stack vendors such as Treck™ offer similar zero-copy features. These software features remove the buffer copies between the user application and the TCP/IP stack or operating system.
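As a rough illustration of a zero-copy interface, the sketch below uses Python's `socket.sendfile()`, which relies on the operating system's `sendfile()` call where one is available and otherwise falls back to ordinary sends. The helper names `send_file_zero_copy` and `demo`, the local socket pair, and the 4 KiB stand-in payload are assumptions made for the example; they are not part of any stack discussed in this article.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock: socket.socket, path: str) -> int:
    """Send a whole file over a connected stream socket; return bytes sent.

    socket.sendfile() uses os.sendfile() where the platform supports it, so
    the payload is handed to the socket inside the kernel instead of being
    read into a user-space buffer first. Elsewhere it falls back to send().
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo() -> bytes:
    """Round-trip a small payload through a local socket pair (hypothetical)."""
    payload = bytes(range(256)) * 16  # 4 KiB standing in for image data
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    tx, rx = socket.socketpair()
    try:
        sent = send_file_zero_copy(tx, path)
        tx.close()  # signal end of stream to the receiver
        received = bytearray()
        while len(received) < sent:
            chunk = rx.recv(65536)
            if not chunk:
                break
            received += chunk
        return bytes(received)
    finally:
        tx.close()
        rx.close()
        os.unlink(path)
```

The application never loops over the payload itself, which is the property that matters here: the per-byte work is left to the kernel and, in a system like the one described, to the hardware.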

The scatter-gather and checksum offload features of the system provide the hardware support necessary for zero-copy functionality. Scatter-gather gives the DMA controller the flexibility to work with software buffers located at any byte offset. This removes the need for the processor to copy unaligned or fragmented buffers.
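The scatter-gather idea can be modeled in a few lines of software. This is only a sketch: the `Descriptor` layout, the `gather` and `scatter` helpers, and the byte-array "memory" are hypothetical and do not reflect the actual descriptor format of any particular DMA controller.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptor:
    """Hypothetical buffer descriptor: where a fragment lives and its size."""
    addr: int    # byte offset into memory; any alignment is allowed
    length: int  # fragment length in bytes


def gather(memory: bytearray, chain: List[Descriptor]) -> bytes:
    """Transmit side: read each fragment in chain order into one stream."""
    return b"".join(bytes(memory[d.addr:d.addr + d.length]) for d in chain)


def scatter(memory: bytearray, chain: List[Descriptor], frame: bytes) -> int:
    """Receive side: spread one frame across the buffers in the chain.

    Returns the number of bytes stored.
    """
    pos = 0
    for d in chain:
        if pos >= len(frame):
            break
        n = min(d.length, len(frame) - pos)
        memory[d.addr:d.addr + n] = frame[pos:pos + n]
        pos += n
    return pos


# A header at an odd offset and a payload elsewhere in memory become one
# contiguous frame without the processor copying either buffer.
memory = bytearray(64)
memory[3:8] = b"HDR--"
memory[41:48] = b"payload"
tx_chain = [Descriptor(addr=3, length=5), Descriptor(addr=41, length=7)]
frame = gather(memory, tx_chain)
```

In hardware the chain is walked by the DMA engine, so the processor only builds the descriptors and never touches the fragments themselves.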

Checksum offload is a feature of the Ethernet peripheral. It allows the TCP payload checksum to be calculated in FPGA fabric as Ethernet frames are transferred between main memory and the peripheral’s hardware FIFOs. These system features remove the need for costly buffer copies and processor checksum operations, leaving the processor to perform protocol operations and user functions.
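To see why a software checksum is a per-byte cost, here is a minimal sketch of the 16-bit ones' complement checksum that TCP uses (described in RFC 1071). It is an illustration only: a real TCP checksum also covers a pseudo-header and the TCP header, which are omitted here, and the function names are chosen for this example.

```python
def ones_complement_sum(data: bytes) -> int:
    """Add the data as big-endian 16-bit words with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length buffer with a zero byte
    total = 0
    for i in range(0, len(data), 2):  # every payload byte is touched once
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement of the ones' complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


# Example words taken from RFC 1071: 0001 f203 f4f5 f6f7
sample = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(sample)  # 0x220d
```

The loop visits every byte of the payload, so its cost grows with the amount of data sent. Computing the same sum in the peripheral while the frame streams past removes that loop from the processor entirely.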

TCP/IP Per-Packet Overhead

Per-packet overhead is associated with operations surrounding the transmission or reception of packets [1]. Packet interrupts, hardware interfacing, and header processing are examples of per-packet overheads. Interrupt overhead represents a considerable burden on the processor and memory subsystem, especially when small packets are transferred. Interrupt coalescing is a technique used in such a system to alleviate some of this pressure by amortizing the interrupt overhead across multiple packets. The DMA engine waits until there are n frames to process before interrupting the processor, where n is a software-tunable value.
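The effect of the threshold n can be shown with a small counting model. This is a simplified sketch, not a description of the actual DMA engine: the `CoalescingDma` class and its fields are hypothetical, and any timeout mechanism for flushing leftover frames is left out.

```python
class CoalescingDma:
    """Toy model of a DMA engine that interrupts once per n completed frames."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # n, the software-tunable value
        self.pending = 0            # frames completed but not yet signaled
        self.interrupts = 0         # interrupts raised so far

    def frame_done(self) -> None:
        """Record one completed frame; interrupt when n frames are pending."""
        self.pending += 1
        if self.pending >= self.threshold:
            self.interrupts += 1
            self.pending = 0


def interrupts_for(frames: int, threshold: int) -> int:
    """Count the interrupts raised while transferring the given frames."""
    dma = CoalescingDma(threshold)
    for _ in range(frames):
        dma.frame_done()
    return dma.interrupts
```

For 1,000 frames, `interrupts_for(1000, 1)` gives 1,000 interrupts while `interrupts_for(1000, 8)` gives 125, so the fixed cost of entering and leaving the interrupt handler is paid one eighth as often.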

Transferring larger packets (jumbo frames of 9,000 bytes) has a similar effect: it reduces the number of frames transmitted, and therefore the number of interrupts generated. This amortizes the per-packet overhead over a larger data payload.
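A back-of-the-envelope calculation shows the size of the effect. The sketch below assumes that the 9,000-byte figure is the IP MTU, that a standard frame carries a 1,500-byte MTU, that IP and TCP headers are 20 bytes each with no options, and that each frame costs 38 extra bytes on the wire (preamble, Ethernet header, frame check sequence, and inter-frame gap). The constant and function names are chosen for this example.

```python
LINE_RATE_BPS = 1_000_000_000     # Gigabit Ethernet line rate
WIRE_OVERHEAD = 8 + 14 + 4 + 12   # preamble/SFD, MAC header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20          # IPv4 and TCP headers without options


def frames_per_second(mtu: int) -> float:
    """Maximum full-size frames per second at line rate."""
    return LINE_RATE_BPS / (8 * (mtu + WIRE_OVERHEAD))


def tcp_goodput_bps(mtu: int) -> float:
    """Upper bound on TCP payload bits per second at line rate."""
    return frames_per_second(mtu) * 8 * (mtu - IP_TCP_HEADERS)


standard = frames_per_second(1500)  # about 81,274 frames/s
jumbo = frames_per_second(9000)     # about 13,830 frames/s
```

Under these assumptions, jumbo frames cut the frame rate at line speed by a factor of nearly six, and every per-packet cost (interrupts, header processing, descriptor handling) falls by the same factor.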

Implementation

An example implementation of this architecture is the Xilinx Gigabit System Reference Design (GSRD). It is geared toward high-performance bridging between TCP/IP-based protocols and user data interfaces such as high-resolution image capture or Fibre Channel. The components of GSRD contain features to address the per-byte and per-packet overheads of a TCP/IP system. For applications requiring an embedded operating system, a MontaVista™ Linux™ port is available. For applications with the highest bandwidth requirements, a commercial standalone TCP/IP stack from Treck™ is available.

The GSRD can provide transmit TCP performance of up to 890 Mbps using jumbo frames, and is implemented in the latest FPGA technology available today from Xilinx. The GSRD can be downloaded from http://www.xilinx.com/gsrd/.


References:

1. “End-System Optimizations for High-Speed TCP” (www.cs.duke.edu/ari/publications/end-system.pdf)

2. “Use sendfile to optimize data transfer” (http://builder.com.com/5100-6372-1044112.html)
