SIMD, or Single Instruction Multiple Data, is an established technique for increasing performance when a processor runs identical workloads. On Intel® processors, multiple independent data buffers can be processed simultaneously using SIMD instructions and registers. An extension of this idea, the multi-buffer approach, can accelerate certain algorithms, such as AES-CBC-Encrypt and 3DES, for which SIMD instructions do not exist and in which data dependencies prevent optimal use of the processor’s execution resources.
This paper describes how a scheduler can be used to process multi-buffer routines efficiently in applications where the workloads are not identical and where buffer sizes vary. The challenge is to implement a scheduler with minimal performance overhead and to achieve performance gains even when many of the buffers are small.
As this paper explains, processing multiple independent data buffers in parallel can improve performance by 2X to 3X over the best-known single-buffer method, even without SIMD.
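
To make the scheduling idea concrete, the following is a minimal sketch in Python, not the implementation described in this paper. All names (`Job`, `MultiBufferScheduler`, `chain_step`) are hypothetical, and `chain_step` is a toy stand-in for one data-dependent step such as a single CBC block, not a real cipher. The inner loop over lanes stands in for what would be interleaved or SIMD instructions on real hardware. The sketch assumes a fixed number of lanes: jobs of different lengths are placed in free lanes, all busy lanes advance in lockstep until the shortest job finishes, and the freed lane is then available for the next submission.

```python
from dataclasses import dataclass

MASK = 0xFFFFFFFF


def chain_step(state, block):
    """Toy dependent step: each output depends on the previous one (not a cipher)."""
    return (state * 31 + block) & MASK


def process_single(blocks):
    """Single-buffer reference: one dependent step per block, strictly in order."""
    state = 0
    for block in blocks:
        state = chain_step(state, block)
    return state


@dataclass
class Job:
    name: str
    blocks: list
    pos: int = 0    # index of the next block to process
    state: int = 0  # running chained value


class MultiBufferScheduler:
    """Hypothetical lane scheduler for buffers of varying size."""

    def __init__(self, num_lanes=4):
        self.lanes = [None] * num_lanes
        self.lockstep_rounds = 0  # rounds in which all busy lanes advanced together

    def submit(self, job):
        """Place a job in a free lane; when every lane is busy, run and return finished jobs."""
        self.lanes[self.lanes.index(None)] = job
        if None in self.lanes:
            return []
        return self._run_until_one_finishes()

    def flush(self):
        """Drain the remaining jobs, even though some lanes stay idle."""
        done = []
        while any(job is not None for job in self.lanes):
            done.extend(self._run_until_one_finishes())
        return done

    def _run_until_one_finishes(self):
        active = [job for job in self.lanes if job is not None]
        # Advance only as far as the shortest remaining job, so no lane overruns.
        steps = min(len(job.blocks) - job.pos for job in active)
        for _ in range(steps):
            for job in active:  # one lockstep round across all busy lanes
                job.state = chain_step(job.state, job.blocks[job.pos])
                job.pos += 1
            self.lockstep_rounds += 1
        finished = []
        for i, job in enumerate(self.lanes):
            if job is not None and job.pos == len(job.blocks):
                finished.append(job)
                self.lanes[i] = None  # free the lane for the next submission
        return finished


# Usage: six buffers of different sizes, four lanes.
sizes = {"A": 5, "B": 2, "C": 7, "D": 3, "E": 4, "F": 1}
jobs = [Job(name, list(range(1, n + 1))) for name, n in sizes.items()]

sched = MultiBufferScheduler(num_lanes=4)
finished = []
for job in jobs:
    finished.extend(sched.submit(job))
finished.extend(sched.flush())

results = {job.name: job.state for job in finished}
total_blocks = sum(sizes.values())
```

In this sketch every job yields the same value as `process_single` would, while the number of lockstep rounds is smaller than the total number of blocks, which is where a real multi-buffer routine would recover otherwise idle execution resources. Jobs complete out of submission order, so a caller would need to tolerate or reorder completions; that trade-off is one reason scheduler overhead matters for small buffers.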