Intel® software products take advantage of the performance potential of the Intel® Atom™ processor. When compiler technologies or optimized off-the-shelf libraries are not sufficient to meet extreme performance requirements, hand-optimized routines are justified. This paper describes the step-by-step development of an ultra-fast finite impulse response (FIR) filter using Intel® Streaming SIMD Extensions (Intel® SSE) and other Intel Atom processor features.
FIR filters are one of the primary types of filters used in digital signal processing. This paper describes the optimization of a 16-bit fixed-point FIR filter of order 63 (64 coefficients). In several steps, the filter performance was improved by a factor of more than 5 and brought close to the theoretical limit of the current Intel® Atom™ processor architecture. These gains came from loop unrolling, aggressive use of Intel® SSE instructions, attention to memory alignment, and selection of the most efficient rather than the most obvious SSE instructions.
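As a point of reference for the kind of routine being optimized, the following is a minimal scalar sketch of a 16-bit fixed-point FIR filter of order 63. It is not the code from this paper: the function name `fir_q15_ref`, the Q15 coefficient scaling, the time-reversed coefficient layout, and the saturation step are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define FIR_TAPS 64 /* order 63 -> 64 coefficients */

/* Clamp a 32-bit value to the int16_t range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/*
 * Hypothetical scalar reference filter (illustrative, not from the paper).
 * Assumptions:
 *   - coef[] holds Q15 coefficients in time-reversed order, so each output
 *     is a plain dot product of coef[] with a sliding window of in[];
 *   - in[] holds n_out + FIR_TAPS - 1 samples;
 *   - right shift of a negative value is arithmetic (true on common compilers).
 */
void fir_q15_ref(const int16_t *in, const int16_t *coef,
                 int16_t *out, size_t n_out)
{
    for (size_t n = 0; n < n_out; n++) {
        int64_t acc = 0; /* 64 products of up to 2^30 can exceed 32 bits */
        for (size_t k = 0; k < FIR_TAPS; k++)
            acc += (int32_t)in[n + k] * coef[k];
        out[n] = sat16((int32_t)(acc >> 15)); /* Q15 product back to Q0 */
    }
}
```

Each output costs 64 multiply-accumulate operations in this form, which is the work that loop unrolling and SIMD instructions aim to compress.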
The methodologies described here can be applied to other FIR filters with little or no modification. The filter order and number of output values can be changed easily, though the number of output values must be a multiple of eight. The benefit of Intel SSE instructions grows as the number of output values and the filter order increase. Floating-point FIR filters can be optimized following the same recipes; for single-precision data the Intel® SSE instruction set allows four-way parallelism, compared with eight-way parallelism for 16-bit integers.
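To illustrate the eight-way parallelism available for 16-bit data, the sketch below vectorizes the inner product of a 64-tap filter with the SSE2 intrinsic `_mm_madd_epi16`, which multiplies eight pairs of 16-bit values and adds adjacent products into four 32-bit sums. This is a simplified illustration, not the optimized routine developed in this paper: it computes one output at a time, uses unaligned loads, and omits unrolling. The function name `fir_q15_sse2`, the Q15 scaling, and the time-reversed coefficient layout are assumptions.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

#define FIR_TAPS 64 /* order 63 -> 64 coefficients */

/*
 * Hypothetical SSE2 sketch (illustrative, not from the paper).
 * Assumptions:
 *   - coef[] holds Q15 coefficients in time-reversed order;
 *   - the sum of |coef[k]| is at most 32767, so the 32-bit accumulators
 *     cannot overflow and the result fits in int16_t without saturation;
 *   - in[] holds n_out + FIR_TAPS - 1 samples;
 *   - right shift of a negative value is arithmetic.
 */
void fir_q15_sse2(const int16_t *in, const int16_t *coef,
                  int16_t *out, size_t n_out)
{
    for (size_t n = 0; n < n_out; n++) {
        __m128i acc = _mm_setzero_si128();
        for (size_t k = 0; k < FIR_TAPS; k += 8) {
            /* Load eight samples and eight coefficients (no alignment assumed). */
            __m128i x = _mm_loadu_si128((const __m128i *)(in + n + k));
            __m128i h = _mm_loadu_si128((const __m128i *)(coef + k));
            /* Eight 16-bit multiplies, pairwise added into four 32-bit lanes. */
            acc = _mm_add_epi32(acc, _mm_madd_epi16(x, h));
        }
        /* Horizontal sum of the four 32-bit lanes into lane 0. */
        acc = _mm_add_epi32(acc, _mm_srli_si128(acc, 8));
        acc = _mm_add_epi32(acc, _mm_srli_si128(acc, 4));
        int32_t sum = _mm_cvtsi128_si32(acc);
        out[n] = (int16_t)(sum >> 15); /* Q15 product back to Q0 */
    }
}
```

The inner loop issues 8 multiply-add instructions per output instead of 64 scalar multiply-accumulates. A floating-point variant would process four single-precision values per 128-bit register instead of eight.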
Multirate filters are commonly implemented using FIR filters. By combining interpolation and decimation, they resample a signal to a different data rate. For example, 640 input values may result in 480 output values, a resampling ratio of 3/4. The optimization steps described in this paper also allow for optimal performance of multirate filters on Intel® architecture-based processors.
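The 640-to-480 example corresponds to interpolating by 3 and decimating by 4. A minimal scalar sketch of such a rational resampler follows; it skips the zero-valued samples of the upsampled stream by stepping through the coefficients in strides of 3. The function name `resample_q15`, the Q15 scaling, the zero-padded start-up behavior, and the caller-supplied low-pass coefficients are assumptions for illustration and are not taken from this paper.

```c
#include <stddef.h>
#include <stdint.h>

#define UP   3 /* interpolation factor */
#define DOWN 4 /* decimation factor */

/*
 * Hypothetical 3/4 resampler (illustrative, not from the paper).
 * Assumptions:
 *   - coef[] holds n_taps Q15 low-pass coefficients in natural order;
 *   - samples before in[0] are treated as zero;
 *   - out[] has room for n_in * UP / DOWN values;
 *   - right shift of a negative value is arithmetic.
 * Returns the number of output samples written.
 */
size_t resample_q15(const int16_t *in, size_t n_in,
                    const int16_t *coef, size_t n_taps,
                    int16_t *out)
{
    size_t n_out = n_in * UP / DOWN;
    for (size_t m = 0; m < n_out; m++) {
        size_t t = m * DOWN;   /* position in the virtual upsampled stream */
        size_t phase = t % UP; /* first coefficient that meets a real sample */
        size_t base = t / UP;  /* newest input sample that contributes */
        int64_t acc = 0;
        /* Only every UP-th coefficient lines up with a nonzero sample. */
        for (size_t k = phase, j = 0; k < n_taps && j <= base; k += UP, j++)
            acc += (int32_t)coef[k] * in[base - j];
        acc >>= 15; /* Q15 product back to Q0 */
        if (acc > INT16_MAX) acc = INT16_MAX;
        if (acc < INT16_MIN) acc = INT16_MIN;
        out[m] = (int16_t)acc;
    }
    return n_out;
}
```

Because each output is still a dot product over a window of input samples, the same unrolling and SIMD techniques used for a plain FIR filter carry over to the inner loop.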