
Optimized FIR Filter Implementation Using SSE Instructions

Intel® software products take advantage of the performance potential of the Intel® Atom™ processor. When compiler technologies or optimized off-the-shelf libraries cannot meet extreme performance requirements, hand-optimized routines are justified. This paper describes the step-by-step development of an ultra-fast finite impulse response (FIR) filter using Intel® Streaming SIMD Extensions (Intel® SSE) and other Intel Atom processor features.

FIR filters are one of the primary types of filters used in digital signal processing. This paper describes the optimization of a 16-bit fixed-point FIR filter of order 63. Over several steps, the filter performance was improved by a factor of more than 5 and brought close to the theoretical limit of the current Intel® Atom™ processor architecture. The gains came from loop unrolling, aggressive use of Intel® SSE instructions, attention to memory alignment, and selection of the most efficient rather than the most obvious SSE instructions.
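To make the starting point concrete, here is a minimal sketch of a 64-tap (order 63) 16-bit FIR filter, first as a scalar reference and then with SSE2 intrinsics. It is not the paper's routine: the function names, the Q15 coefficient scaling, and the time-reversed coefficient layout (`y[n] = sum c[k] * x[n + k]`) are assumptions made for illustration. It produces one output per pass and uses unaligned loads, so it shows only the basic `_mm_madd_epi16` multiply-accumulate pattern, not the alignment or eight-outputs-at-a-time refinements the paper refers to.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#endif

#define FIR_TAPS 64  /* filter order 63 */

/* Scalar reference. Assumes x holds n_out + FIR_TAPS - 1 samples, c holds
 * the coefficients in Q15 (already time-reversed), and the sum of |c[k]|
 * is small enough that the 32-bit accumulator cannot overflow. */
static void fir_q15_scalar(const int16_t *x, const int16_t *c,
                           int16_t *y, size_t n_out)
{
    for (size_t n = 0; n < n_out; ++n) {
        int32_t acc = 0;
        for (size_t k = 0; k < FIR_TAPS; ++k)
            acc += (int32_t)x[n + k] * (int32_t)c[k];
        y[n] = (int16_t)(acc >> 15);  /* Q15 scaling, arithmetic shift assumed */
    }
}

/* SSE2 sketch: eight 16-bit products per instruction via _mm_madd_epi16.
 * Falls back to the scalar routine when SSE2 is not available. */
static void fir_q15_sse2(const int16_t *x, const int16_t *c,
                         int16_t *y, size_t n_out)
{
#if defined(__SSE2__)
    for (size_t n = 0; n < n_out; ++n) {
        __m128i acc = _mm_setzero_si128();
        for (size_t k = 0; k < FIR_TAPS; k += 8) {
            __m128i xv = _mm_loadu_si128((const __m128i *)(x + n + k));
            __m128i cv = _mm_loadu_si128((const __m128i *)(c + k));
            /* multiply 8 pairs, add adjacent products into 4 x 32-bit lanes */
            acc = _mm_add_epi32(acc, _mm_madd_epi16(xv, cv));
        }
        /* horizontal sum of the four 32-bit lanes */
        acc = _mm_add_epi32(acc, _mm_srli_si128(acc, 8));
        acc = _mm_add_epi32(acc, _mm_srli_si128(acc, 4));
        y[n] = (int16_t)(_mm_cvtsi128_si32(acc) >> 15);
    }
#else
    fir_q15_scalar(x, c, y, n_out);
#endif
}
```

Comparing the two routines on the same input is a simple way to check each later optimization step against a known-good baseline.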

The methodologies described here can be applied to other FIR filters with little or no modification. The filter order and the number of output values can be changed easily, though the number of output values must be a multiple of eight. The benefit of Intel SSE instructions grows with the number of output values and with the filter order. Floating-point FIR filters can be optimized following the same recipes; because a 128-bit SSE register holds four single-precision values rather than eight 16-bit values, the Intel® SSE instruction set allows four-way parallelism in that case.
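The four-way floating-point case can be sketched in the same style. The function names and the requirement that the tap count be a multiple of four are assumptions of this example, not details from the paper; the coefficient layout is again time-reversed so that `y[n] = sum c[k] * x[n + k]`.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE single-precision intrinsics */
#endif

/* Scalar reference. Assumes x holds n_out + taps - 1 samples. */
static void fir_f32_scalar(const float *x, const float *c, size_t taps,
                           float *y, size_t n_out)
{
    for (size_t n = 0; n < n_out; ++n) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps; ++k)
            acc += x[n + k] * c[k];
        y[n] = acc;
    }
}

/* SSE sketch: four single-precision products per instruction.
 * Assumes taps is a multiple of 4; falls back to scalar without SSE. */
static void fir_f32_sse(const float *x, const float *c, size_t taps,
                        float *y, size_t n_out)
{
#if defined(__SSE__)
    for (size_t n = 0; n < n_out; ++n) {
        __m128 acc = _mm_setzero_ps();
        for (size_t k = 0; k < taps; k += 4) {
            __m128 xv = _mm_loadu_ps(x + n + k);
            __m128 cv = _mm_loadu_ps(c + k);
            acc = _mm_add_ps(acc, _mm_mul_ps(xv, cv));
        }
        /* reduce the four partial sums to one output value */
        float lanes[4];
        _mm_storeu_ps(lanes, acc);
        y[n] = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    }
#else
    fir_f32_scalar(x, c, taps, y, n_out);
#endif
}
```

Because the SSE version adds the products in a different order than the scalar loop, results on arbitrary data can differ in the last bits; a tolerance-based comparison is the safer check outside of integer-valued test data.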

Multirate filters are commonly implemented using FIR filters. Using interpolation and decimation, the output is resampled to a different data rate. For example, 640 input values may result in 480 output values. The optimization steps described in this paper also allow for optimal performance of multirate filters on Intel® architecture-based processors.
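As a rough illustration of the 640-to-480 case, the sketch below resamples by a rational factor `up / down` (3/4 here) with a polyphase FIR: only the coefficients that line up with real input samples are evaluated, so the zero-stuffed intermediate signal is never built. The function name, the parameter names, and the use of scalar single-precision arithmetic are assumptions of this example; the inner loop is the part the SSE techniques above would target.

```c
#include <stddef.h>

/* Rational resampler sketch: conceptually upsample by `up` (zero stuffing),
 * filter with h[0..n_taps-1], then keep every `down`-th sample.
 * Samples before x[0] are treated as zero.
 * Writes and returns n_out = n_in * up / down output values. */
static size_t resample_fir(const float *x, size_t n_in,
                           const float *h, size_t n_taps,
                           size_t up, size_t down, float *y)
{
    size_t n_out = (n_in * up) / down;
    for (size_t m = 0; m < n_out; ++m) {
        size_t t = m * down;     /* position in the upsampled stream */
        size_t phase = t % up;   /* first tap that meets a real sample */
        size_t base = t / up;    /* newest input sample that contributes */
        float acc = 0.0f;
        /* step through every up-th tap, walking backwards through x */
        for (size_t k = phase, j = 0; k < n_taps && j <= base; k += up, ++j)
            acc += h[k] * x[base - j];
        y[m] = acc;
    }
    return n_out;
}
```

With `n_in = 640`, `up = 3`, and `down = 4`, the function produces 480 outputs, matching the example in the text.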
