
Image Processing Applications On New Generation FPGAs

The new generation of FPGAs with DSP resources and embedded processors is attracting the interest of the image processing market. With these enhanced capabilities, most of the DSP processing work can be offloaded from the software program stack to the embedded processors and DSP resources on the FPGA, improving performance and reducing the cost of the whole system.

The traditional way of implementing algorithms in software limits performance because the data is processed serially. The frequency of operation can be increased up to a point to reach the data rate required to process the image data, but pushing the frequency beyond certain limits causes system-level and board-level issues that become a bottleneck in the design.

With current image processing applications moving toward consumer markets, the amount of data to be processed has increased at a fast pace. The new compression algorithms on the market are keeping up with this increasing data requirement.

DSP processors are also trying to keep up with these requirements. With the ever-increasing need for processing, parallel processing on any hardware can help reduce the processing overhead and offer better system performance. An FPGA's flexible architecture enables parallel processing, providing a proper balance between the performance and the cost of the system; in addition, the flexibility to reprogram gives a quick turnaround time.

With the right combination of IPs, fast time-to-market can be achieved with real-time prototyping. At the same time, FPGAs also provide the flexibility to upgrade to new standards. Figure 1 shows a group of image IPs that can be easily reshuffled to quickly create applications like video cell phones, set-top boxes, LCD projectors, Keyboard and Mouse Over IP (KVMIP), digital cameras/camcorders, etc. Along with parallel processing and high data rates, these groups of IPs also provide a high degree of configurability that helps fine-tune the system to achieve specific performance rates.

The image processing block can be divided into two subsections, namely pixel processing blocks and frame processing blocks. Pixel processing blocks work directly on the incoming pixel data, whereas frame processing blocks work on images stored as frames. Color space converters, gamma correction, and brightness control are some examples of pixel processing blocks. Static Huffman, AES, DCT, and interlace/de-interlace fall under the category of frame processing blocks.
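To make the distinction concrete, the C sketch below contrasts the two categories under simple assumptions: gamma correction is applied to each incoming pixel through a lookup table, while a basic weave de-interlace needs both fields of a frame to be available in storage before it can run. Function names and buffer layout are illustrative, not taken from any specific IP.

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Pixel processing: per-pixel gamma correction through a lookup table.
 * Works on the pixel stream as it arrives; no frame storage is needed. */
static uint8_t gamma_correct_pixel(uint8_t in, const uint8_t lut[256])
{
    return lut[in];
}

/* Frame processing: weave de-interlace, which interleaves two stored
 * fields into one progressive frame and therefore needs field storage. */
static void weave_deinterlace(const uint8_t *top_field,
                              const uint8_t *bottom_field,
                              uint8_t *frame,
                              size_t width, size_t field_lines)
{
    for (size_t y = 0; y < field_lines; y++) {
        /* even output lines come from the top field, odd from the bottom */
        memcpy(frame + (2 * y) * width,     top_field    + y * width, width);
        memcpy(frame + (2 * y + 1) * width, bottom_field + y * width, width);
    }
}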


Figure 1. Image IP Blocks

Implementation of pixel processing blocks, like color space conversion, can be a simple job if FPGA resources are freely available and the pixel data frequency is low. The implementation on an FPGA with at least nine multipliers can be fairly simple. The matrix coefficients and offsets are stored in ROM or loaded dynamically through an external host configuration interface. Conversions are performed using a generic 3×3 matrix multiplication. The same can be achieved with just three multipliers, if the incoming pixel frequency is low, by running the internal core frequency at three times the pixel frequency.
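A minimal C reference model of this generic 3×3 conversion might look like the sketch below. The Q2.13 coefficient format, the structure layout, and the function names are assumptions for illustration; the coefficients and offsets stand in for values that would live in ROM or be written over the host configuration interface.

#include <stdint.h>

typedef struct {
    int16_t coeff[3][3];   /* fixed-point matrix coefficients, here Q2.13 */
    int16_t offset[3];     /* per-channel offsets                          */
} csc_params_t;

static uint8_t clamp8(int32_t v)
{
    return (v < 0) ? 0 : (v > 255) ? 255 : (uint8_t)v;
}

/* One output pixel = 3x3 matrix multiply plus offset: nine multiplies in
 * parallel, or three multiplies reused at three times the pixel clock. */
static void csc_pixel(const csc_params_t *p,
                      const uint8_t in[3], uint8_t out[3])
{
    for (int row = 0; row < 3; row++) {
        int32_t acc = 0;
        for (int col = 0; col < 3; col++)
            acc += (int32_t)p->coeff[row][col] * in[col];
        /* scale back from the Q2.13 format, add the offset, and saturate */
        out[row] = clamp8((acc >> 13) + p->offset[row]);
    }
}

In the nine-multiplier case each coeff[row][col] * in[col] product maps onto its own multiplier; in the three-multiplier variant one row of products is computed per internal clock cycle, with the core running at three times the pixel clock.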

In terms of implementing a simple color space conversion IP, it really helps if a couple of hooks are kept in the design to make it flexible: for example, the ability to add or remove pipeline registers, or to configure the number of multipliers to be used. When you move to the Xilinx Virtex-4, the dedicated multiplier and accumulator blocks simplify the implementation even further. All of the above considerations reduce the time it takes to customize the IP for a particular FPGA while integrating it with other cores for different applications, saving the engineering time needed to modify the core and avoiding human error in doing so.

Image resize and image rotation blocks can fall under the category of pixel processing blocks, with the overhead of line storage and a low frequency of operation, or be implemented as frame processing blocks. A real-time image resize can take up to eight block RAMs along with adder and multiplier trees, and offers limited upscaling and downscaling capability. This can be a good solution for a system that requires real-time image processing without worrying about frame storage in an external memory like DDR. But if you are targeting applications that are constrained by FPGA area and a high pixel frequency, real-time image resize might not be a feasible solution; going for external DDR/SDRAM storage offers a better answer.
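As a rough model of such a real-time scaler, the C sketch below interpolates one output line from two buffered input lines, which is why only line storage (the block RAMs) and a small adder/multiplier tree are needed rather than a full frame. The 24.8 and 8-bit fixed-point formats and the buffer size are assumptions for illustration.

#include <stdint.h>

#define MAX_LINE 1920

/* Produce one output line by bilinear interpolation between two
 * consecutive, already-buffered input lines.
 * h_step_q8 is the horizontal step in 24.8 fixed point;
 * v_frac (0..255) is the vertical sub-pixel position between the lines. */
static void resize_line(const uint8_t line0[MAX_LINE],
                        const uint8_t line1[MAX_LINE],
                        uint8_t *out, uint32_t out_w,
                        uint32_t h_step_q8, uint8_t v_frac)
{
    uint32_t src_x_q8 = 0;                       /* 24.8 fixed-point column */
    for (uint32_t x = 0; x < out_w; x++) {
        uint32_t xi = src_x_q8 >> 8;
        if (xi > MAX_LINE - 2)                   /* guard the xi+1 access   */
            xi = MAX_LINE - 2;
        uint8_t h_frac = src_x_q8 & 0xFF;

        /* vertical blend of the two buffered lines, then horizontal blend:
         * this is the adder/multiplier tree the text refers to */
        uint32_t a = line0[xi]     * (256 - v_frac) + line1[xi]     * v_frac;
        uint32_t b = line0[xi + 1] * (256 - v_frac) + line1[xi + 1] * v_frac;
        out[x] = (uint8_t)(((a >> 8) * (256 - h_frac) + (b >> 8) * h_frac) >> 8);

        src_x_q8 += h_step_q8;                   /* horizontal displacement */
    }
}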

Proper partitioning of the logic between hardware and software is one of the factors that decides overall system efficiency. To implement image resizing in hardware, you could have the hardware calculate the complete image size and come up with the horizontal and vertical displacements it needs to read and compress; this logic would use up a good amount of hardware and arithmetic blocks. Since this calculation is not performed often, it can be moved to software, which simply provides the image size, vertical displacement, and horizontal displacement values in the control registers.
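A sketch of the software side of this partitioning is shown below: the host computes the horizontal and vertical displacement (step) values once per image and writes them, along with the output size, into control registers. The register map, register names, and the 16.16 fixed-point step format are hypothetical, chosen only to illustrate the idea.

#include <stdint.h>

#define RESIZE_REG_BASE   0x40000000u            /* hypothetical base address */
#define REG_OUT_SIZE      (RESIZE_REG_BASE + 0x0)
#define REG_H_DISPLACE    (RESIZE_REG_BASE + 0x4)
#define REG_V_DISPLACE    (RESIZE_REG_BASE + 0x8)

static inline void reg_write(uint32_t addr, uint32_t val)
{
    *(volatile uint32_t *)addr = val;            /* memory-mapped I/O write */
}

/* Done once per image in software; the hardware then just walks the
 * stored frame using the programmed increments. */
void resize_configure(uint32_t in_w, uint32_t in_h,
                      uint32_t out_w, uint32_t out_h)
{
    /* 16.16 fixed-point step between source pixels per output pixel */
    uint32_t h_disp = (uint32_t)(((uint64_t)in_w << 16) / out_w);
    uint32_t v_disp = (uint32_t)(((uint64_t)in_h << 16) / out_h);

    reg_write(REG_OUT_SIZE,   (out_h << 16) | (out_w & 0xFFFF));
    reg_write(REG_H_DISPLACE, h_disp);
    reg_write(REG_V_DISPLACE, v_disp);
}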

Video compression plays a vital role in image processing applications. Uncompressed high-definition video can easily take 1920x1080x24x30 = 1.49 Gbps. Reserving the compression blocks to work on stored data helps in tolerating the latency that the IPs might incur while running at a high system frequency.

However, when you have many cores, in the form of frame processing blocks, fetching data from memory such as DDR/SDRAM/SRAM, it becomes crucial to define the interfaces between all the IPs and the memory controllers.

Any standard bus like PLB/OPB/AMBA can be an easy solution to this, but does the IP really need that overhead? A very simple protocol can be defined with the least overhead, in the form of a request and grant. Mechanisms like multiplexing the address, data, and burst-length buses over the same lines can easily reduce the number of wires, which plays an important role when implementing anything in an FPGA, compared with an ASIC, where every route is created as per the requirement. Standard buses are surely attractive because of the standardization, but remember, they were NOT developed for implementation in an FPGA. Image IP solutions should not take more than a couple of weeks to integrate. Since bus standards are not optimized for FPGAs, we developed our own bus; using it, we can optimize the design for frequency and area with the FPGA architecture in mind.
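The C sketch below traces one write transaction of such a lightweight request/grant protocol, with the address, burst length, and data beats time-multiplexed onto a single shared bus. The phase names and ordering are illustrative assumptions, not a description of our actual bus.

#include <stdint.h>
#include <stdio.h>

typedef enum { PHASE_ADDR, PHASE_BURST_LEN, PHASE_DATA } bus_phase_t;

/* Behavioral trace of one master write: assert REQ, wait for GNT, then
 * drive one field of the shared bus per clock cycle. */
static void master_write(uint32_t addr, uint32_t burst_len,
                         const uint32_t *data)
{
    printf("REQ asserted, waiting for GNT from the memory controller\n");

    /* first granted cycle: shared lines carry the address */
    printf("phase=%d shared_bus=0x%08x (address)\n",
           PHASE_ADDR, (unsigned)addr);

    /* next cycle: shared lines carry the burst length */
    printf("phase=%d shared_bus=%u (burst length)\n",
           PHASE_BURST_LEN, (unsigned)burst_len);

    /* remaining cycles: one data beat per clock */
    for (uint32_t i = 0; i < burst_len; i++)
        printf("phase=%d shared_bus=0x%08x (data beat %u)\n",
               PHASE_DATA, (unsigned)data[i], (unsigned)i);

    printf("REQ deasserted, transaction complete\n");
}

Because only a request, a grant, and one multiplexed bus cross the interface, the wiring and arbitration logic stay small, which is exactly the property that matters when routing many IPs to one memory controller inside an FPGA.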

One of the key points in developing image processing IP for the FPGA is reusability and the efficiency with which the hardware is implemented. This yields an efficient system in terms of cost and performance and also helps reduce time-to-market, since the IP blocks can be integrated quickly without being touched again for modification, avoiding a repeat of the verification cycle.
