A new generation of intelligent video security systems is emerging, enabled by the availability of modern megapixel-resolution cameras that increase image quality, high dynamic range algorithms that dramatically improve image visibility, and advanced analytics that extract high-priority elements from a multitude of available images. These systems can automatically generate improved threat and risk assessments, dramatically improving operator efficiency.
The image processing requirements in these systems can easily overwhelm traditional CPU- or DSP-based designs. FPGAs like the LatticeECP3, used as hardware accelerators or as stand-alone implementations, can break this bottleneck and deliver the image processing power and flexibility demanded by intelligent security video systems or other, similar pixel processing-intensive applications.
Intelligent Video Security System
A typical security and surveillance video chain is illustrated in Figure 1. Information flows from left to right, starting with the Sensor Capture, where a digital representation of the image is created. This representation is processed using advanced Image Signal Processing techniques to improve image quality. The image is compressed using a standard Codec and then transmitted over a network interface, usually via a cable infrastructure. The video transmission is received by a storage device that further compresses the image and stores it in a Digital Video Recording unit. Video Content Analysis can be performed on the image before or after storage to create event- and activity-based alarms.
Figure 1: Typical Security and Surveillance Video Chain
Several important market trends are driving specific feature requirements in each of these key stages in the intelligent security video chain:
Sensor Capture: The need to identify specific elements (license plates, faces, currency value) in an image can drive up camera resolution from VGA to megapixel levels, since in many cases VGA cameras can’t cover the entire area at the resolution demanded by the application. High Dynamic Range sensors also become necessary in order to see equally well into both light and dark areas of the image, and these sensor outputs require additional processing at the sensor stage itself.
Image Signal Processing: The need to see equally well in both light and dark areas can require additional signal processing of the initial image. For example, implementing High Dynamic Range algorithms on the initial image can bring out details in the dark area of the image, while preserving detail in bright areas that are normally washed out in an overexposed image. This can be very important at night, when facial details could be lost due to overhead light glare or license plate details lost due to headlight glare.
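One common way to lift dark-area detail while preserving bright-area detail is a logarithmic tone map. The sketch below is a minimal illustration of the general idea, not the specific HDR algorithm used in these systems; the 20-bit input range and the `log_tone_map` name are assumptions for the example.

```python
import numpy as np

def log_tone_map(hdr, max_value=2**20 - 1):
    """Compress a high-bit-depth linear image to 8 bits with a log curve.

    Dark-area detail is expanded while bright areas are compressed
    rather than clipped; `hdr` holds linear intensities in [0, max_value].
    """
    hdr = np.asarray(hdr, dtype=np.float64)
    # log1p maps [0, max_value] to [0, log(1 + max_value)]; normalize to [0, 255]
    mapped = np.log1p(hdr) / np.log1p(max_value)
    return np.round(mapped * 255).astype(np.uint8)
```

For instance, a pixel at only 100 out of a 2^20 full scale, which a linear 8-bit conversion would crush to black, still lands mid-range after the mapping, so shadow detail remains visible.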
Compression: As images become larger, it is important to use compression techniques that conserve transmission bandwidth and storage space while preserving image quality. Excellent image capture and process steps can be degraded by inefficient compression. Supporting modern standards like H.264 while allowing easy migration to future standards will be important in extending product lifetime, an important consideration in security installations.
Transmission: Most security and surveillance installations must contend with existing analog cable infrastructure while also allowing migration to more efficient IP networks. A system must be flexible enough to support both these requirements in order to address the broadest market.
Storage: The storage element of the video chain must support the archiving of video data for viewing and analytics. Storage must be efficient while supporting the trend toward larger image sizes, as well as an increase in the number of channels (from 4 to 8 to 16 or even 32) per DVR. Options for off-site storage can put additional stress on system bandwidth and also require encryption or other secure transmission protocols.
Video Analytics: With the explosive increase in the number of cameras and the subsequent challenge of effectively monitoring all of them, automation can play an increased role. Video Analytics can augment human operation by analyzing the image in real-time to manage display priority or event/activity-based alarms.
The three most important market enablers today are megapixel sensors, High Dynamic Range processing and Video Analytics.
Video Security Image Processing Overview
A typical video processing block diagram is shown in Figure 2, below. The raw digital image is sourced from the CMOS image sensor in the upper left of the diagram. It is output on the sensor port and then linearized/decompanded to extend the bit range from, say, 12 to as many as 20 bits. A defect correction algorithm is typically used to adjust for camera pixel defects by analyzing adjacent pixel values. Color information is added to the intensity levels via a process called ‘de-Bayering’ (demosaicing). This process uses the sensor’s Bayer filter pattern of red, green and blue pixel sensors to recover color information. An additional process uses a Color Correction Matrix (CCM) to correct for cross-talk between red, green and blue pixels. Statistics are gathered for use in Auto Exposure, which adjusts the camera for different lighting conditions. High Dynamic Range (HDR) algorithms enhance light and dark portions of the image to render ‘washed-out’ areas more visible. Automatic White Balance (AWB) makes image color adjustments using a known white element in the image. Gamma correction adjusts the sensor image by applying a pre-distortion to the sensor signal in order to compensate for the response curve of a display device. The final overlay step allows text, pointers or alert icons to be superimposed on the captured image for improved monitoring.
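The gamma correction step in this pipeline is typically realized in hardware as a small lookup table. A minimal sketch of that approach, assuming 8-bit pixels and a display response curve of gamma 2.2 (both values are illustrative):

```python
import numpy as np

# Pre-distort 8-bit pixel values with a 1/2.2 power curve to compensate
# for a display's ~2.2 response; a 256-entry lookup table is how a
# hardware pipeline typically realizes this step.
GAMMA = 2.2
LUT = np.round(255.0 * (np.arange(256) / 255.0) ** (1.0 / GAMMA)).astype(np.uint8)

def gamma_correct(frame):
    """Apply the gamma lookup table to every pixel of an 8-bit frame."""
    return LUT[np.asarray(frame, dtype=np.uint8)]
```

A table lookup per pixel maps naturally onto FPGA block RAM, which is one reason this step is cheap in hardware.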
Figure 2: Typical Video Processing Block Diagram
Megapixel sensors provide vastly increased amounts of raw image data for the video system. Without sufficient image resolution or area coverage, the rest of the video system is significantly limited. For example, in a typical traffic security installation a camera needs about 30 pixels/inch for license plate recognition, while in cash register transactions 150 pixels/inch are required to determine currency value. Since megapixel sensors have four times or more the resolution of VGA cameras, they can also provide the required resolution over a much larger area. The ability of the sensor port to interface with a variety of multi-megapixel sensors (e.g. from 1.2 to 12MP) can be important in supporting increasing sensor resolutions and in extending the life of the design.
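The quoted pixels-per-inch figures translate directly into how wide a scene a given sensor can cover. A quick arithmetic sketch (the sensor widths and the `coverage_width_inches` helper are illustrative):

```python
def coverage_width_inches(horizontal_pixels, required_ppi):
    """Widest scene, in inches, that one sensor row can cover while still
    delivering the required pixels-per-inch at the target."""
    return horizontal_pixels / required_ppi

# License-plate recognition (~30 pixels/inch):
vga_plate = coverage_width_inches(640, 30)       # VGA: ~21 inches of scene
mp_plate = coverage_width_inches(1920, 30)       # 1920-wide sensor: 64 inches
# Reading currency value (~150 pixels/inch):
mp_currency = coverage_width_inches(1920, 150)   # 12.8 inches
```

This is why a single megapixel camera can replace several VGA cameras: at the same pixels-per-inch requirement, its field of coverage scales with its horizontal pixel count.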
High Dynamic Range Images
High Dynamic Range is a measure of the system’s capability to discern and display objects in dimly lit as well as brightly lit areas, and is typically expressed in dB. For a given image size, the greater the dynamic range supported by the system, the better the resulting HDR rendition and the higher the quality of the resulting system. Sensor data output is companded, which can dramatically impact the image dB. For example, a 720p HD sensor from Aptina (the A-1000, sold commercially as the MT9M033) outputs 12-bit data on a logarithmic sensor-response curve, but is actually able to encode information with 20 bits per pixel to provide a scene dynamic range of 120 dB (since 1 bit ≈ 6 dB). In linear mode, the sensor has a 12-bit intra-scene dynamic range (and thus 72 dB). The A-1000 uses 3 integration times per readout, i.e. 3 internal multi-exposures. Each successive integration time is 16 times shorter than the previous, shifting the dynamic range 4 bits at a time (+24 dB), leading to an intra-scene dynamic range of 72 + 24 + 24 = 120 dB. The resulting image quality will also depend on the efficiency of the decompanding algorithm (transforming from 12 to 20 bits) and the linearization algorithm used.
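The dynamic range figures above follow from the rule of thumb that 1 bit ≈ 6 dB; the arithmetic can be sketched as:

```python
def bits_to_db(bits):
    """Approximate dynamic range in dB for a given bit depth (1 bit ~= 6 dB)."""
    return 6 * bits

linear_range = bits_to_db(12)                    # 12-bit linear readout: 72 dB
# Each successive exposure is 16x (4 bits) shorter, adding ~24 dB each time;
# three exposures together:
hdr_range = bits_to_db(12) + 2 * bits_to_db(4)   # 72 + 24 + 24 = 120 dB
```

The 6 dB/bit figure is itself an approximation of 20·log10(2) ≈ 6.02 dB per bit of range.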
Security and Surveillance cameras often need to adapt to rapidly changing light conditions to improve inter-scene dynamic range, or system dynamic range. For example, a room going from light to dark as an intruder turns off the light, or from dark to light as a warehouse door is opened, can benefit from wider dynamic range to ensure that details are not lost from one image to the next. Typical systems will exhibit a white bloom on the display when a scene goes from dark to light, as the Auto Exposure adjusts. The shorter the duration of this bloom, the greater the quality of the resulting system. An Auto Exposure function that shifts this intra-scene dynamic range in response to changing light conditions, by rapidly adjusting the duration of the first exposure in the 3-exposure scheme discussed above, can enable an inter-scene dynamic range of greater than 150 dB – a requirement routinely specified by automakers like BMW and Audi for automotive cameras.
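The 3-exposure scheme can be sketched as a per-pixel merge: keep the longest exposure that has not saturated, rescaled to a common linear scale. This is a minimal illustration under assumed values (12-bit samples, a saturation threshold of 4000, a 16x exposure ratio), not the sensor's actual on-chip algorithm:

```python
import numpy as np

def merge_exposures(e_long, e_mid, e_short, sat=4000, ratio=16):
    """Merge three 12-bit exposures, each `ratio` times shorter than the
    last, into one linear ~20-bit value per pixel: keep the longest
    exposure that has not saturated, scaled to a common radiance scale."""
    e_long, e_mid, e_short = (np.asarray(a, dtype=np.int64)
                              for a in (e_long, e_mid, e_short))
    out = e_short * ratio ** 2                       # fallback: shortest exposure
    out = np.where(e_mid < sat, e_mid * ratio, out)  # prefer mid if unsaturated
    out = np.where(e_long < sat, e_long, out)        # prefer long if unsaturated
    return out
```

Because each shorter exposure is scaled up by the exposure ratio, dark pixels keep the low-noise long-exposure reading while bright pixels take their value from an exposure that did not clip, which is what extends the merged range to roughly 20 bits.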
Video Analytics can identify possible threats by using image analysis to detect and categorize elements within a scene. For example, analytics could automatically search for what has changed from one image frame to another over a short time span. This may require the storage of an image and then its comparison to a new image. A weighted average of the difference on a pixel-by-pixel basis could measure the amount of change, and the images with the most change could be flagged to the operator as high-priority targets.
An example of how this function would work is illustrated in Figure 3. Two successive images from a camera (at the top) are used to compute the difference in the environment (bottom left in blue). An overlay target (in black) shows the relative weight which gives higher priority to movement in the critical portion of the scene. The weighted average difference is then computed. If the result exceeds a pre-programmed limit, an alert is issued to the operator. FPGA arithmetic resources are used in the computations, as summarized in the bottom right of Figure 3. Subtraction blocks are used to create the differences between images and a multiplier is used to create a weighted average metric.
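The weighted-difference flow of Figure 3 can be sketched in software as follows; the function names and threshold value are illustrative, and in the FPGA the same subtract, multiply and accumulate operations map directly onto the hardware arithmetic resources:

```python
import numpy as np

def weighted_change_score(prev_frame, curr_frame, weights):
    """Weighted average of per-pixel absolute differences between two
    frames; `weights` emphasizes the critical region of the scene
    (the overlay target of Figure 3)."""
    diff = np.abs(curr_frame.astype(np.int32) - prev_frame.astype(np.int32))
    return float(np.sum(diff * weights) / np.sum(weights))

def motion_alert(prev_frame, curr_frame, weights, threshold=10.0):
    """Raise an operator alert when the weighted change exceeds a
    pre-programmed limit (the threshold value here is illustrative)."""
    return weighted_change_score(prev_frame, curr_frame, weights) > threshold
```

Setting a weight of zero outside the critical region makes movement there invisible to the metric, while movement inside the region dominates the score.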
Figure 3: Video Image Processing Needed to Determine Image Priority
FPGA Implementation: The Importance of Proven IP Cores and Reference Designs
In order to effectively implement the video processing portion of the video security system in an FPGA, it is important to have proven IP Cores for all of the critical functions. The availability of proven standard IP allows designers to focus their effort on the differentiated elements of the design. For example, Lattice Semiconductor provides an HDR IP building block that works with sensors of up to 12MP, implements full 1080p at 60fps without the need for an external frame buffer, performs fast auto exposure adjustment that executes in less than 4 frames with no visible display bloom or blackout, and delivers a system dynamic range of 170 dB, all with system latency from 0.1 to 0.5 ms (depending on the IP blocks used). Standard IP cores for Codec functions (like H.264), video transmission functions (DVI, SDI, CVBS) and Camera Link are also available. The availability of video analytics functions for Intelligent Video Motion Detection, Intrusion Detection, Object and People Counting, Camera Tampering and Sabotage Detection also speeds the development of the baseline value-added analytics portion of the design.
Reference designs can also help dramatically improve time to market for an intelligent video security system. For example, Lattice and Maxim provide a DVR/NVR reference design. The design includes multiple channels (4 channels of D1 or 16 channels of CIF), byte-to-frame interleaving data adaptation, and 16xCIF preview video composition with audio support to complete the Security & Surveillance video chain.