Image Processing Applications On New Generation FPGAs

The new generation of FPGAs, with DSP resources and embedded processors, is attracting the interest of the image processing market. With these enhanced capabilities, most of the DSP work can be offloaded from the software stack to the embedded processors and DSP resources on the FPGA, improving performance and reducing the cost of the whole system.

Implementing algorithms in software, the traditional approach, limits performance because the data is processed serially. The operating frequency can be raised to a certain extent to reach the data rate required for the image data, but beyond certain limits a higher frequency causes system-level and board-level issues that become a bottleneck in the design.

With image processing applications moving into consumer markets, the amount of data to be processed has grown rapidly. New compression algorithms are keeping up with the increasing data requirements.

DSP processors are also trying to keep up with these requirements. As the need for processing keeps growing, parallel processing in hardware can reduce the processing overhead and offer better system performance. An FPGA’s flexible architecture enables parallel processing with a good balance between performance and system cost, and its reprogrammability gives a quick turnaround time.

With the right combination of IP cores, fast time-to-market can be achieved with real-time prototyping. FPGAs also provide the flexibility to upgrade to new standards. Figure 1 shows a group of image IP cores that can be easily reshuffled to quickly create applications such as video cell phones, set-top boxes, LCD projectors, Keyboard, Video and Mouse over IP (KVMIP), and digital cameras and camcorders. Along with parallel processing and high data rates, these IP cores are highly configurable, which helps to fine-tune the system to reach a target performance.

The image processing blocks can be divided into two subsections: pixel processing blocks and frame processing blocks. Pixel processing blocks work directly on the incoming pixel data, whereas frame processing blocks work on images stored as frames. Color space converters, gamma correction, and brightness control are examples of pixel processing blocks. Static Huffman coding, AES, DCT, and interlacing/de-interlacing fall into the category of frame processing blocks.
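To make the pixel processing category concrete, here is a minimal software sketch of two such blocks, gamma correction and brightness control, chained on a single 8-bit sample. The function names, the lookup-table approach, and the gamma value are assumptions chosen for illustration; the point is only that each output depends on one incoming pixel and needs no frame storage.

```python
# Sketch of two pixel processing blocks applied sample by sample.
# All names and parameter values here are hypothetical.

def build_gamma_lut(gamma):
    """Precompute a 256-entry table, as a ROM would hold it in hardware."""
    return [round(255 * (i / 255) ** (1.0 / gamma)) for i in range(256)]


def adjust_brightness(value, offset):
    """Add a signed offset and saturate to the 8-bit range 0..255."""
    return max(0, min(255, value + offset))


def process_pixel(value, lut, brightness):
    """Gamma lookup followed by brightness control for one 8-bit sample."""
    return adjust_brightness(lut[value], brightness)


lut = build_gamma_lut(2.2)
corrected = [process_pixel(v, lut, 10) for v in (0, 64, 128, 255)]
```

A frame processing block such as a DCT, by contrast, needs a whole block or frame of pixels in memory before it can produce any output.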

Figure 1. Image IP Blocks

Implementing pixel processing blocks such as color space conversion can be simple if FPGA resources are freely available and the pixel data frequency is low. On an FPGA with at least nine multipliers, the implementation is fairly straightforward: the matrix coefficients and offsets are stored in ROM or loaded dynamically through an external host configuration interface, and conversions are performed using a generic 3×3 matrix multiplication. If the incoming pixel frequency is low, the same result can be achieved with just three multipliers by running the internal core at three times the pixel frequency.
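The trade-off between nine and three multipliers can be sketched in software. The model below is only an illustration: the coefficients are assumed values (an approximate full-range RGB to YCbCr matrix scaled by 2^8), and the function names are hypothetical. The first function forms all nine products at once; the second reuses three multipliers over three core clock cycles per pixel and reaches the same result.

```python
# Sketch of a 3x3 color space conversion in fixed-point arithmetic.
# Coefficients are illustrative assumptions (approximate full-range
# RGB -> YCbCr, scaled by 2**8), not values taken from the article.
FRAC_BITS = 8
COEFFS = [
    [77, 150, 29],     # Y  row
    [-43, -85, 128],   # Cb row
    [128, -107, -21],  # Cr row
]
OFFSETS = [0, 128, 128]


def clamp8(value):
    """Saturate a result to the 8-bit range 0..255."""
    return max(0, min(255, value))


def convert_nine_multipliers(pixel):
    """Model of the parallel version: nine products per pixel clock."""
    out = []
    for row, offset in zip(COEFFS, OFFSETS):
        acc = sum(c * p for c, p in zip(row, pixel))
        out.append(clamp8((acc >> FRAC_BITS) + offset))
    return tuple(out)


def convert_three_multipliers(pixel):
    """Model of the shared version: three multipliers, three core clocks."""
    acc = [0, 0, 0]
    for cycle in range(3):        # core clock runs at 3x the pixel clock
        for mult in range(3):     # the three physical multipliers
            acc[mult] += COEFFS[mult][cycle] * pixel[cycle]
    return tuple(clamp8((a >> FRAC_BITS) + o) for a, o in zip(acc, OFFSETS))


white = convert_nine_multipliers((255, 255, 255))
same = convert_three_multipliers((255, 255, 255)) == white
```

In each cycle of the shared version, every multiplier handles one input component for its own output row, so each multiplier feeds a single accumulator and the coefficient ROM is read column by column.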

When implementing even a simple color space conversion IP, it helps to keep a few hooks in the design to make it flexible, for example to add or remove pipeline registers or to configure the number of multipliers used. On the Xilinx Virtex-4, the dedicated multiplier and accumulator blocks simplify the process even more. These considerations reduce the time it takes to customize the IP for a particular FPGA while integrating it with other cores for different applications, saving the engineering time needed to modify the core and avoiding human error in doing so.

Image resize and image rotation blocks can be treated either as pixel processing blocks, with the overhead of line storage and a low operating frequency, or as frame processing blocks. A real-time image resize can take up to eight blocks of RAM, along with adder and multiplier trees, and offers limited upscaling and downscaling capability. This can be a good solution for a system that needs real-time image processing without frame storage in an external memory such as DDR. But if the application is constrained by FPGA area and a high pixel frequency, real-time resizing might not be feasible, and external DDR/SDRAM storage offers a better solution.

Proper partitioning of the logic between hardware and software is one of the factors that decide overall system efficiency. To implement image resizing entirely in hardware, the hardware would have to calculate the complete image size and derive the horizontal and vertical displacements that it needs to read and compress. This logic uses a good amount of hardware and arithmetic blocks. Since the image size and displacements are not calculated often, this operation can be moved to software, which then simply provides the image size, vertical displacement, and horizontal displacement values through the control registers.
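As a rough sketch of the software side of this partitioning, the function below computes the values a driver might write to the resize core once per configuration. Everything here is an assumption made for illustration: the register names, the 16-bit fractional format, and the interpretation of "displacement" as the source-coordinate step per output pixel.

```python
# Sketch of software-side setup for a hardware resize block.
# Register names and the fixed-point format are hypothetical.
STEP_FRAC_BITS = 16  # assumed fractional bits in the displacement registers


def resize_register_values(src_w, src_h, dst_w, dst_h):
    """Return the control register values for one resize configuration."""
    if min(src_w, src_h, dst_w, dst_h) <= 0:
        raise ValueError("image dimensions must be positive")
    # Step through the source image, per output pixel, in fixed point.
    h_step = (src_w << STEP_FRAC_BITS) // dst_w
    v_step = (src_h << STEP_FRAC_BITS) // dst_h
    return {
        "OUT_WIDTH": dst_w,
        "OUT_HEIGHT": dst_h,
        "H_DISPLACEMENT": h_step,
        "V_DISPLACEMENT": v_step,
    }


regs = resize_register_values(1920, 1080, 960, 540)
```

The divisions run once per configuration change in software, so the hardware only needs an adder to accumulate the step for each output pixel and line.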

Video compression plays a vital role in image processing applications. Uncompressed high-definition video can easily require 1920 × 1080 × 24 × 30 ≈ 1.49 Gbps. Having the compression blocks work on stored data helps in tolerating the latency that the IP cores might incur while running at a high system frequency.
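The data rate figure can be checked with a few lines of arithmetic. The helper name below is hypothetical; the inputs are the frame width, frame height, bits per pixel, and frames per second from the example above.

```python
# Sketch: raw (uncompressed) video data rate in bits per second.

def raw_video_bps(width, height, bits_per_pixel, frames_per_second):
    """Width x height x pixel depth x frame rate."""
    return width * height * bits_per_pixel * frames_per_second


hd_bps = raw_video_bps(1920, 1080, 24, 30)
hd_gbps = hd_bps / 1e9  # about 1.49 Gbps
```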

However, when many cores, in particular frame processing blocks, fetch data from memory such as DDR, SDRAM, or SRAM, it becomes crucial to define the interfaces through which the IP cores talk to the memory controllers.

A standard bus such as PLB, OPB, or AMBA can be an easy solution, but does the IP really need that overhead? A very simple protocol with minimal overhead can be defined in the form of request and grant. Mechanisms such as multiplexing the address, data, and burst length over the same lines can reduce the number of signal lines, which matters much more in an FPGA than in an ASIC, where every route is created as required. Standard buses have the clear advantage of standardization, but remember that they were not developed for implementation in FPGAs. Image IP solutions should not take more than a couple of weeks to integrate. Since bus standards are not optimized for FPGAs, we developed our own bus. Using this bus, we can optimize the design for frequency and area with the FPGA architecture in mind.
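The article does not describe the bus itself, so the following is only a behavioral sketch of what a request/grant scheme with multiplexed lines could look like. The round-robin policy, the class and function names, and the beat order (address, then burst length, then data) are all assumptions made for illustration.

```python
# Behavioral sketch of a request/grant arbiter and a multiplexed transfer.
# Policy, names, and beat order are hypothetical, not the authors' bus.

class RoundRobinArbiter:
    """Grants the shared memory port to one requesting core at a time."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.last = num_cores - 1  # so core 0 is considered first

    def grant(self, requests):
        """Return the index of the granted core, or None if nobody asks."""
        for i in range(1, self.num_cores + 1):
            idx = (self.last + i) % self.num_cores
            if requests[idx]:
                self.last = idx
                return idx
        return None


def mux_beats(address, data_words):
    """Send address, burst length, and data over the same lines in turn."""
    return [address, len(data_words)] + list(data_words)


arbiter = RoundRobinArbiter(3)
winner = arbiter.grant([True, True, False])
beats = mux_beats(0x1000, [0x11, 0x22, 0x33])
```

Sharing one set of lines costs extra cycles per transfer for the address and length beats, which is why a scheme like this pairs naturally with burst accesses.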

One of the key points in developing image processing IP for FPGAs is the reusability and efficiency with which the hardware is implemented. This yields an efficient system in terms of cost and performance and also helps to reduce time-to-market, because the IP blocks can be integrated quickly without being modified again, avoiding a repeat of the verification cycle.
