Multicore Plus Optimized Packet Processing Software

We used to talk about POTS, or the Plain Old Telephone Service, whose only service was basic dial tone, whether the original analog or newer digital variety. A lot has changed since then, and the fixed telephony infrastructure bears no resemblance to the POTS of yesteryear. The last few years have seen similar changes in the wireless infrastructure, and, with plenty more on the horizon, we now look back fondly on what some are calling POCS (Plain Old Cellular Service). Making a simple phone call, ‘sans wires,’ no matter where we are, is now an intrinsic part of everyday life. The myriad wireless technologies are no longer just a voice medium, and, since the advent of usable data capabilities, demand for the wireless data infrastructure, and its associated growth, have been unprecedented, with no end in sight.

Fueled by the introduction and accelerated adoption of Internet capable, wireless enabled devices such as smartphones, tablets, eBook readers etc., the demands for network capacity translate directly into a need for increased performance within the infrastructure and the elements that must carry and manage the traffic. In addition to the obvious capital expense, this is creating issues with real estate at both the rack and data center level, and, from an operating expense perspective (not discounting green issues), power consumption is another major area that must be kept under control.

Advantech, one of the world’s leading developers and manufacturers of embedded computing for Telecom and Networking markets, has recognized the need to solve these issues for its customers. It has worked with one of its ecosystem partners, 6WIND, to create a platform solution that truly optimizes the power/performance equation. By combining multicore processing blades with 6WIND’s packet-processing software solution, Advantech has reduced the overall power consumption of one of its Packetarium™ systems from 730W, when running a standard Linux network stack, to just over 400W at the same application performance.

The Multicore Advantage

The first key step in optimizing overall system performance while remaining conscious of platform power consumption is the choice of processing hardware. Traditional processors gained performance through continual increases in clock frequency. The laws of physics became a major challenge: dynamic power rises with clock frequency and with the square of the supply voltage, and higher frequencies generally demand higher voltages, so power consumption climbed far faster than performance. For use in high performance networking systems, the ever-increasing power draw and associated heatsink and cooling requirements became impractical. Developers of high-end processors have since migrated to multicore architectures in order to meet the need for increased performance while maintaining manageable levels of power consumption. Multicore processors generate performance gains through their ability to execute multiple tasks simultaneously and independently. The most significant gains, in both overall system throughput and power management, are made when an application can be logically divided into independently manageable instances. For networking applications, data plane and control plane processing represent elements that can (and should) be managed separately. High performance data plane processing requires that large numbers of concurrent instances be managed. A platform that implements multicore blades is therefore ideal for such applications. In order to optimize the overall performance-per-watt metric, one must maximize the use of each blade and core; to achieve this, a highly efficient packet-processing software solution is required.

Meeting The Power Performance Challenge

Advantech’s customers were looking for solutions that could help them manage the power/performance challenge from two seemingly opposing perspectives. First, how could they pack as much network-processing performance as possible into a single shelf? Second, how could they reduce the power consumption of existing applications installed in environments where cooling is both challenging and costly? Interestingly, both challenges were met with the same solution.

One of Advantech’s ecosystem partners, 6WIND, has a software solution called 6WINDGate™. 6WINDGate helps to answer the complex question of how to architect and distribute the software in a multicore system so as to optimize throughput for network applications that typically need to manage and process packets from multiple 10Gbps streams of network traffic simultaneously.

Whatever the operating system, whether a Linux implementation or a real-time OS, a standard networking stack introduces significant overheads, for example through preemption. These processing overheads are imposed on each packet passing through the system, resulting in a major penalty for overall throughput. In a networking stack designed to optimize throughput, the implementation is split into two layers. The lower layer, typically called the fast path, processes the majority of incoming packets outside the OS environment without incurring any of the OS overheads that degrade overall performance. Only a few packets require complex processing; these are forwarded to the OS networking stack, which performs the necessary management, signaling and control functions. Splitting the networking stack in this way maintains standard OS application interfaces, minimizing or eliminating any impact on the functionality of application software. An additional benefit is portability, as there is no need to rewrite or recertify existing applications; they simply run faster because the underlying packet processing is accelerated by the fast path environment.
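The two-layer split described above can be illustrated with a minimal sketch. This is not 6WINDGate code: the `Packet` type, the forwarding table, and the set of protocols treated as exceptions are all hypothetical names chosen for illustration. The sketch only shows the dispatch decision, in which common packets are forwarded directly and the few that need complex handling are queued for the OS networking stack.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical packet model; field names are illustrative only."""
    dst_ip: str
    protocol: str  # e.g. "udp", "tcp", "icmp", "ospf"
    payload: bytes = b""


# Hypothetical forwarding table: destination address -> egress port.
FORWARDING_TABLE = {"10.0.0.2": 1, "10.0.0.3": 2}

# Protocols assumed, for this sketch, to need management or control handling.
SLOW_PATH_PROTOCOLS = {"icmp", "ospf", "bgp"}


def fast_path(pkt, slow_path_queue):
    """Return the egress port if the packet is handled in the fast path.

    Packets that need complex processing, or have no forwarding entry,
    are appended to slow_path_queue for the OS stack and None is returned.
    """
    if pkt.protocol in SLOW_PATH_PROTOCOLS or pkt.dst_ip not in FORWARDING_TABLE:
        slow_path_queue.append(pkt)
        return None
    return FORWARDING_TABLE[pkt.dst_ip]


def process(packets):
    """Split a batch into (forwarded, slow_path) according to fast_path()."""
    slow_path_queue = []
    forwarded = []
    for pkt in packets:
        port = fast_path(pkt, slow_path_queue)
        if port is not None:
            forwarded.append((pkt, port))
    return forwarded, slow_path_queue
```

In a real system the fast path would run outside the OS on dedicated cores and the slow path queue would be an exception interface into the unmodified OS stack, which is why existing applications keep working unchanged.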

Using such a software implementation on a multicore processor, the OS networking stack need only run on a limited number of cores, freeing other cores to run the fast path packet processing, thus maximizing the overall throughput of the system. The lack of scalability in a standard OS stack no longer impacts system performance, as the majority of packets are no longer subjected to unnecessary latencies. The fast path cores are dedicated to performing the functions that actually determine the net performance of the product. A more scalable architecture can now be built, as designers can select a processor or system platform with the number of cores most appropriate to their product and application requirements.
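As a rough sketch of this core allocation, the following partitions the cores of one processor into a small group for the OS networking stack and a larger group for fast path processing. The function names are hypothetical, and the use of a CRC32 flow hash to pick a fast path core is an assumption made for illustration; the source does not describe how 6WINDGate distributes packets across cores.

```python
import zlib


def partition_cores(total_cores, os_stack_cores=1):
    """Reserve a few cores for the OS stack; dedicate the rest to the fast path."""
    if not 0 < os_stack_cores < total_cores:
        raise ValueError("need at least one OS core and one fast path core")
    cores = list(range(total_cores))
    return {
        "os_stack": cores[:os_stack_cores],
        "fast_path": cores[os_stack_cores:],
    }


def core_for_flow(flow_key, fast_path_cores):
    """Map a flow identifier (bytes) to a fast path core.

    Hashing the flow key keeps packets of one flow on the same core.
    This distribution policy is an illustrative assumption.
    """
    return fast_path_cores[zlib.crc32(flow_key) % len(fast_path_cores)]
```

For example, `partition_cores(12, 2)` reserves cores 0 and 1 for the OS stack and leaves cores 2 through 11 for the fast path; choosing a processor with more cores simply enlarges the fast path group.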

Performance Multipliers – Optimizing Performance/Watt

Through the use of the 6WINDGate software utilizing optimized packet processing based on the fast path concept, networking performance within the processor subsystems has typically seen a seven- to ten-times improvement, dependent on the specific application.

Power is consumed by multiple elements within any complex system. In a typical telecom infrastructure product, 60% of the power is consumed by the processor subsystems (including memory), while the remaining 40% is consumed by the I/O, system management, and power supply subsystems. Looking at the math: based on a potential 7x performance improvement (so that only one-seventh of the original processing capacity, and hence of the processor power, is required to maintain performance), we can derive how power savings in excess of 50% can be achieved.

Processor subsystems = 60% of system power consumption

1/7th of 60% = 8.57%

60% – 8.57% = 51.43% savings

With the power consumed by the processing load reduced dramatically, the system power supply can be reduced in capacity (and size); for simplicity we approximate this additional saving as ~4% of system power. In total this gives a saving in overall system power consumption of ~55% while the application performance and throughput remain the same.
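The arithmetic above can be reproduced with a short sketch. The function name and parameters are hypothetical; the default values (60% processor share, 7x speedup, ~4% power supply saving) are the figures quoted in the text, and the model assumes that processor power scales linearly with the processing capacity required.

```python
def estimated_power_savings(processor_share=0.60, speedup=7.0, psu_savings=0.04):
    """Return (processing_savings, total_savings) as fractions of system power.

    Assumes processor power scales linearly with required capacity, so a
    speedup of N leaves 1/N of the original processor power in use.
    """
    remaining = processor_share / speedup
    processing_savings = processor_share - remaining
    total_savings = processing_savings + psu_savings
    return processing_savings, total_savings


processing, total = estimated_power_savings()
print(f"processing savings: {processing:.2%}, total savings: {total:.2%}")
```

With the default figures this gives processing savings of about 51.4% and total savings of about 55.4%; passing `speedup=10.0`, the upper end of the quoted range, raises the total to 58%.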

Enhancing the Packetarium Power Envelope

This is the scenario that Advantech applied with one of their Packetarium™ systems running a standard Linux distribution. Advantech’s NCP-7560 Packetarium™ platform has the capacity to run up to ninety-six processor cores networked across eight boards. Each board incorporates its own multicore processor along with 8GB of associated memory and routes dual XAUI lanes to its edge connector. The system’s carrier board provides the 10GbE switching fabric between the processor boards and six external 10GbE ports.

The typical power budget of the Packetarium™ system is 250W + 480W (8 x 60W for each processing board) = 730W, including modules for remote system management and power supplies. 6WINDGate was installed in place of the standard Linux OS network stack, and an application using 6 ports was tested. The results showed that only three network-processing boards were required to match the performance level that previously required a full load of eight boards. This cuts the power consumed by the processing load by 62.5% (from 480W to 180W). At the system level, power has been reduced to approximately 55% of the original power budget [(250W + 180W) – (730W x 4%) ≈ 401W]. In summary, using the performance gains achieved with the 6WINDGate software, Advantech was able to reduce the overall power consumption of this network application from 730W to just over 400W. From the perspective of a potential CAPEX reduction for a similarly performing platform, this scenario would represent a hardware cost 40% lower than before.
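The Packetarium power budget can be checked with the same kind of sketch. The function name is hypothetical; the 250W base load, 60W per board, and the ~4% power supply saving applied to the original 730W budget are the figures given in the text.

```python
def system_power(boards, base_w=250.0, board_w=60.0):
    """System power in watts: fixed base load plus a per-board load."""
    return base_w + board_w * boards


original_w = system_power(boards=8)                 # full load of eight boards
psu_saving_w = 0.04 * original_w                    # ~4% power supply saving
reduced_w = system_power(boards=3) - psu_saving_w   # three boards with fast path

processing_reduction = 1 - (3 * 60.0) / (8 * 60.0)

print(f"original: {original_w:.0f}W, reduced: {reduced_w:.1f}W")
print(f"processing load reduction: {processing_reduction:.1%}")
print(f"reduced system power as share of original: {reduced_w / original_w:.1%}")
```

This yields 730W for the original configuration and 400.8W for the reduced one, which is about 55% of the original budget, with the processing load itself cut by 62.5%.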

Looking at the alternate scenario: the energy efficiency of this solution can instead be used to increase system performance at the platform level by reinstalling the original eight network-processing blades. Thanks to the higher performance-per-watt, the greatly enhanced system remains within the original shelf power budget.

Following the testing of these scenarios, Advantech was extremely impressed with how the 6WINDGate software was able to help them achieve exactly what their Networking and Telecom customers were looking for. Now using well-designed software optimized specifically for the network environment, they are able to exploit the advantages of multicore processors to deliver significant improvements in energy efficiency, ideally suited to the challenges faced by next-generation networking equipment.