
Image Processing Applications On New Generation FPGAs

The new generation of FPGAs, with dedicated DSP resources and embedded processors, is attracting the interest of the image processing market. With these enhanced capabilities, most of the DSP workload can be offloaded from the software stack to the embedded processors and DSP resources on the FPGA, improving performance and reducing the cost of the whole system.

The traditional way of implementing algorithms in software limits performance because the data is processed serially. The operating frequency can be raised up to a point to meet the required data rate, but pushing it beyond certain limits creates system-level and board-level issues that become the real bottleneck in the design.

With image processing applications moving toward consumer markets, the amount of data to be processed has grown at a fast pace, and the new compression algorithms on the market are keeping up with the increasing data requirements.

DSP processors are also trying to keep up with these requirements. With the ever-increasing need for processing power, parallel processing in hardware can reduce the processing overhead and deliver better system performance. An FPGA's flexible architecture enables parallel processing, providing a proper balance between performance and system cost, and its reprogrammability gives a quick turnaround time.

With the right combination of IPs, fast time-to-market can be achieved with real-time prototyping. At the same time, FPGAs also provide the flexibility to upgrade to new standards. Figure 1 shows a group of image IPs that can easily be reshuffled to quickly create applications such as video cell phones, set-top boxes, LCD projectors, keyboard-and-mouse-over-IP (KVMIP), and digital cameras/camcorders. Along with parallel processing and high data rates, these IP blocks also provide high configurability, which helps fine-tune the system to meet specific performance targets.

The image processing chain can be divided into two subsections: pixel processing blocks and frame processing blocks. Pixel processing blocks work directly on the incoming pixel stream, whereas frame processing blocks operate on images stored as frames. Color space converters, gamma correction, and brightness control are examples of pixel processing blocks; static Huffman coding, AES, DCT, and interlace/de-interlace conversion fall under the category of frame processing blocks.


Figure 1. Image IP Blocks

Implementing a pixel processing block such as color space conversion can be a simple job if FPGA resources are freely available and the pixel data frequency is low. On an FPGA with at least nine multipliers the implementation is fairly simple: the matrix coefficients and offsets are stored in ROM or loaded dynamically through an external host configuration interface, and the conversion is performed as a generic 3×3 matrix multiplication. The same can be achieved with just three multipliers, if the incoming pixel frequency is low, by running the internal core clock at three times the pixel frequency.
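As a behavioral model of the generic 3×3 matrix conversion described above, the sketch below uses BT.601-style RGB-to-YCbCr coefficients as example ROM contents (the specific coefficients are illustrative; any 3×3 conversion fits the same structure). In hardware, each matrix row would map to one multiplier-accumulator chain.

```python
# Sketch of the generic 3x3 matrix color space conversion: coefficients
# and offsets play the role of the ROM contents mentioned in the text.
RGB_TO_YCBCR = [                    # 3x3 coefficient matrix (BT.601-style)
    [ 0.299,  0.587,  0.114],
    [-0.169, -0.331,  0.500],
    [ 0.500, -0.419, -0.081],
]
OFFSETS = [0, 128, 128]             # per-channel offsets, also from ROM

def convert_pixel(rgb, matrix=RGB_TO_YCBCR, offsets=OFFSETS):
    """Generic 3x3 matrix multiply on one pixel, clamped to 8 bits."""
    out = []
    for row, offset in zip(matrix, offsets):
        acc = sum(c * p for c, p in zip(row, rgb)) + offset
        out.append(max(0, min(255, round(acc))))
    return out

# Pure white stays at full luma with neutral chroma:
print(convert_pixel([255, 255, 255]))  # -> [255, 128, 128]
```

The three-multiplier variant mentioned above would evaluate one row of this matrix per core clock, revisiting the same hardware three times per pixel.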

When implementing a simple color space conversion IP, it really helps to keep a couple of hooks in the design to make it flexible: the ability to add or remove pipeline registers, to configure the number of multipliers used, and so on. When you move to the Xilinx Virtex-4, its dedicated multiply-accumulate blocks simplify the implementation even further. All these considerations reduce the time it takes to customize the IP for a particular FPGA while integrating it with other cores for different applications, saving the engineering time needed to modify the core and avoiding the human errors that come with doing so.
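One way to keep such hooks is to expose pipeline depth and multiplier count as generate-time parameters, so the same IP source retargets different FPGAs without hand edits. The parameter names below are illustrative, not from any real core:

```python
from dataclasses import dataclass

@dataclass
class CscConfig:
    """Generate-time hooks for a color space conversion core (illustrative)."""
    pipeline_stages: int = 2   # extra registers to close timing on slower parts
    num_multipliers: int = 9   # 9 = fully parallel, 3 = time-multiplexed x3

    def clocks_per_pixel(self):
        """Core clocks needed per pixel for a 3x3 conversion."""
        return 9 // self.num_multipliers

fast = CscConfig(pipeline_stages=3, num_multipliers=9)
small = CscConfig(pipeline_stages=1, num_multipliers=3)
print(fast.clocks_per_pixel(), small.clocks_per_pixel())  # -> 1 3
```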

Image resize and image rotation blocks can be treated either as pixel processing blocks, with the overhead of line storage and a low operating frequency, or as frame processing blocks. A real-time image resize can take up to eight block RAMs plus adder and multiplier trees, with limited upscaling and downscaling capability. That can be a good solution for a system that requires real-time processing without frame storage in an external memory such as DDR. But if you are targeting applications constrained by FPGA area and a high pixel frequency, real-time image resize might not be feasible, and external DDR/SDRAM storage offers a better solution.
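The line-storage point can be illustrated with a minimal streaming resize model (nearest-neighbor for brevity; real cores typically use bilinear or polyphase filters): output rows are produced as source rows arrive, so only a line buffer is needed, not a full external frame store.

```python
def resize_stream(rows, src_w, src_h, dst_w, dst_h):
    """Consume source rows in raster order; yield resized output rows.
    Nearest-neighbor mapping; only one source line is held at a time."""
    out_y = 0
    for src_y, row in enumerate(rows):
        # Emit every output row whose nearest source row is src_y.
        while out_y < dst_h and (out_y * src_h) // dst_h == src_y:
            yield [row[(x * src_w) // dst_w] for x in range(dst_w)]
            out_y += 1

# 4x4 -> 2x2 downscale of a ramp image:
src = [[y * 4 + x for x in range(4)] for y in range(4)]
print(list(resize_stream(iter(src), 4, 4, 2, 2)))  # -> [[0, 2], [8, 10]]
```

Rotation, by contrast, needs access to pixels far apart in raster order, which is why it tends to push the design toward frame storage.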

Proper partitioning of the logic between hardware and software is one of the factors that decides overall system efficiency. To implement image resizing entirely in hardware, you could have the hardware compute the complete image size and derive the horizontal and vertical displacements it needs to read and compress; that logic would consume a good amount of fabric and arithmetic blocks. Since this calculation is performed only rarely (once per image-size change, not per pixel), it can be moved to software, which simply writes the image size and the vertical and horizontal displacement values into control registers.
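A sketch of that hardware/software split, with hypothetical register names: software computes fixed-point displacement steps once per mode change, and the hardware then just applies them per pixel.

```python
def program_resize(regs, src_w, src_h, dst_w, dst_h, frac_bits=16):
    """Compute fixed-point per-pixel displacements in software and load
    them into the resize core's control registers (register names are
    illustrative, not from a real core)."""
    regs["IMAGE_WIDTH"] = dst_w
    regs["IMAGE_HEIGHT"] = dst_h
    # Horizontal/vertical step in Q16 fixed point: how far the source
    # read position advances per output pixel / output line.
    regs["H_DISPLACEMENT"] = (src_w << frac_bits) // dst_w
    regs["V_DISPLACEMENT"] = (src_h << frac_bits) // dst_h
    return regs

regs = program_resize({}, src_w=1920, src_h=1080, dst_w=1280, dst_h=720)
print(hex(regs["H_DISPLACEMENT"]))  # 1.5 in Q16 -> 0x18000
```

The division lives in software where it costs nothing per frame; the hardware only accumulates the precomputed step.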

Video compression plays a vital role in image processing applications: an uncompressed high-definition stream can easily reach 1920 × 1080 × 24 × 30 = 1.49 Gbps. Having the compression blocks work on stored data helps tolerate the latency these IPs may incur while running at a high system frequency.
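The bandwidth figure quoted above, worked out explicitly:

```python
# Uncompressed high-definition video bandwidth.
width, height = 1920, 1080          # pixels per frame
bits_per_pixel = 24                 # 8 bits each for R, G, B
frames_per_second = 30

bps = width * height * bits_per_pixel * frames_per_second
print(f"{bps / 1e9:.2f} Gbps")      # -> 1.49 Gbps
```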

However, when many frame processing cores fetch their data from a memory such as DDR/SDRAM/SRAM, it becomes crucial to define the interfaces between all the IPs and the memory controllers.

Any standard bus such as PLB/OPB/AMBA can be an easy solution, but does the IP really need all that overhead? A very simple protocol can be defined with minimal overhead in the form of request and grant. Mechanisms such as multiplexing the address, data, and burst-length buses over the same lines help reduce the number of signal lines, which matters far more in an FPGA than in an ASIC, where every route is created to order. Standard buses are attractive because of their standardization, but remember: they were not developed for implementation in FPGAs. Integrating an image IP solution should not take more than a couple of weeks, and since the standard buses are not optimized for FPGAs, we developed our own. Using this bus we can optimize the design for frequency and area with the FPGA architecture in mind.
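A behavioral sketch of such a lightweight request/grant interface (names and field layout are illustrative, not the actual proprietary bus): a fixed-priority arbiter picks a requester, and address and burst length share one multiplexed command word to save routing.

```python
def arbitrate(requests):
    """Fixed-priority arbiter: grant the lowest-numbered requesting IP,
    or None if nobody is asking for the memory."""
    for ip_id, req in enumerate(requests):
        if req:
            return ip_id
    return None

def pack_cmd(address, burst_len, addr_bits=26):
    """Multiplex address and burst length onto one shared command word."""
    return (burst_len << addr_bits) | (address & ((1 << addr_bits) - 1))

# Two IPs request at once; IP 0 wins, then issues a 16-beat burst read.
grant = arbitrate([True, True, False])
cmd = pack_cmd(address=0x0010_0000, burst_len=16)
print(grant, hex(cmd))  # -> 0 0x40100000
```

In an FPGA fabric, collapsing address and burst length onto shared wires like this directly reduces routing pressure, which is exactly the trade-off the text contrasts with an ASIC.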

One of the key points in developing image processing IP for the FPGA is the reusability and efficiency with which the hardware is implemented. This yields an efficient system in terms of cost and performance and also helps reduce time-to-market: the IP blocks can be integrated quickly without being touched again for modification, avoiding a repeat of the verification cycle.
