I last wrote about Prophesee’s event-based vision sensor at the end of 2023. (See “Prophesee’s 5th Generation Sensors Detect Motion Instead of Images for Industrial, Robotic, and Consumer Applications.”) Back then, I wrote about the company’s fifth-generation, 320×320-pixel GenX320 sensor, which captures changes in light on a per-pixel basis rather than generating video frames like conventional imaging sensors. The technology has progressed since then. Prophesee has worked with Sony to produce the IMX636HD, a 1280×720-pixel, event-based sensor for industrial applications, and IDS Imaging Development Systems GmbH (IDS) has incorporated that sensor into ready-to-use industrial cameras in its uEye family: the uEye XCP-E, packaged in a zinc die-cast housing, and the board-mountable uEye XLS-E, packaged in a plastic housing. These cameras and Prophesee’s development tools allow you to start experimenting with event-based vision processing immediately.
Prophesee’s event-based imaging sensors work very differently from the sensors you’re likely to be familiar with. Conventional imaging sensors, like the ones commonly used in mobile phones and security cameras, capture full-color images at 30 or 60 frames per second, and each captured frame generates megabytes of data. If your application requires still frames or video, those conventional sensors are great, but many industrial applications don’t need still images or video streams at all; they only need to detect motion or vibration, for example, and extracting that information from full video frames requires significant image processing. Prophesee’s photodiode sensor arrays don’t capture still images or generate video streams. They capture changes in light intensity and generate an output only when the incident light falling on one or more pixels changes. Consequently, the individual pixels within the IMX636 sensor perform much of the image processing required for specific applications using analog techniques, with a resulting maximum power consumption of only 205 mW. That’s one or more orders of magnitude less power than you’d need to extract similar information using CPUs, GPUs, or FPGAs.
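To make that data-volume contrast concrete, here’s a back-of-envelope comparison in Python. The 1280×720 resolution and 30-frames-per-second rate come from the discussion above; the bytes-per-pixel, bytes-per-event, and event-rate figures are my own illustrative assumptions, not Prophesee or Sony specifications.

```python
# Back-of-envelope comparison of frame-based vs. event-based data rates.
# The 1280x720 resolution and 30 fps come from the article; the bytes-per-pixel,
# bytes-per-event, and event-rate numbers below are illustrative assumptions.

FRAME_W, FRAME_H = 1280, 720
FPS = 30
BYTES_PER_PIXEL = 3            # assumed: 8-bit RGB

frame_rate_bytes = FRAME_W * FRAME_H * BYTES_PER_PIXEL * FPS
print(f"Frame-based stream: {frame_rate_bytes / 1e6:.1f} MB/s")

EVENTS_PER_SECOND = 2_000_000  # assumed: a moderately busy scene
BYTES_PER_EVENT = 8            # assumed: packed x, y, polarity, timestamp

event_rate_bytes = EVENTS_PER_SECOND * BYTES_PER_EVENT
print(f"Event-based stream: {event_rate_bytes / 1e6:.1f} MB/s")
```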
The IDS uEye XCP-E industrial camera incorporates the Sony/Prophesee IMX636HD event-based sensor. Image credit: IDS
Before going further into the technology itself, I want to discuss the applications for event-based imaging because they’ll drive the adoption of this significantly different imaging technology. Each pixel in the Sony IMX636 event-based sensor generates data only when the incident light on that pixel changes by more than a set threshold, so a triggered pixel is responding either to a lighting change in an otherwise static scene or to motion passing through its field of view. When the light intensity on an IMX636 pixel changes by at least the set amount, the sensor generates a 4-value vector containing the triggered pixel’s X and Y coordinates, the direction of the intensity change (more or less light), and a time stamp. Unchanging pixels generate no data. Because the time between successive events from a single pixel can be as little as 100 µs, this sensor can detect motion that would otherwise require a high-speed video camera running at thousands of frames per second.
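To make that event format concrete, here’s a minimal sketch of how you might represent and filter the 4-value events described above in Python. The field names and the helper function are my own illustration, not part of any Prophesee API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    """One pixel event as described above: coordinates, change direction, time."""
    x: int          # pixel column
    y: int          # pixel row
    polarity: int   # +1 = brighter, -1 = darker
    t_us: int       # timestamp in microseconds

def events_in_window(events: List[Event], t_start_us: int, t_end_us: int) -> List[Event]:
    """Return only the events whose timestamps fall inside a time window."""
    return [e for e in events if t_start_us <= e.t_us < t_end_us]

# Example: three events from two pixels, then select a 100 µs slice.
stream = [Event(640, 360, +1, 1000), Event(641, 360, -1, 1050), Event(10, 20, +1, 1200)]
print(events_in_window(stream, 1000, 1100))   # -> the first two events
```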
Industrial applications for such high-speed motion capture include high-speed object counting, vibrational analysis, and super-slow-motion imaging. A counting application might involve small parts such as pills, nuts, bolts, ball bearings, apples, or grapes falling through the camera’s field of view. Vibrational analysis can be used to detect imminent bearing failures or mechanical jams in rotating machinery. Slow-motion applications include analyzing the droplet spray patterns of fuel injectors or medication spray nozzles to determine whether these components require cleaning, repair, or replacement. Previously, such applications required relatively expensive high-speed vision systems or line-scan cameras. The IDS uEye cameras represent a lower-cost approach to these sorts of applications.
The Sony IMX636 sensor and the IDS uEye cameras are sensitive to light changes as small as 0.08 lux, and the sensors respond logarithmically, so they can deliver as much as 120 dB of dynamic range. That’s far beyond the capabilities of conventional imaging sensors. As a result, the IDS uEye cameras and the Sony IMX636 sensor can be used where illumination is very low, such as in nighttime applications. Because of the high dynamic range, the cameras and sensor can also work in outdoor applications, both at night and during the day, without the need for a mechanical aperture to control light levels.
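For perspective, 120 dB works out to roughly a million-to-one ratio between the brightest and dimmest light levels the sensor can handle, using the 20·log10 convention typically quoted for image-sensor dynamic range:

```python
# Convert the 120 dB dynamic-range figure to a brightness ratio,
# assuming the usual 20*log10 convention for image-sensor dynamic range.
dynamic_range_db = 120
ratio = 10 ** (dynamic_range_db / 20)
print(f"{ratio:,.0f} : 1")   # -> 1,000,000 : 1
```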
This demo shows a frame capture from a video that illustrates the difference between the output of a conventional video imager (on the right) and an IDS uEye camera (on the left). The IDS uEye camera generates data only when a pixel’s intensity changes; in this case, the cat figurine’s moving arm is the only motion in the scene. On the left, you see only the changes in light intensity on the arm as it moves, an image built by accumulating the pixel events captured over a set time period into a video frame. Image credit: IDS.
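The accumulation step the caption describes is simple to sketch: sum the polarities of all events that arrive during a fixed time window into a 2D array, so motion shows up as non-zero pixels while static regions stay at zero. The window length and event tuples below are made-up illustrative values, not taken from the IDS demo.

```python
import numpy as np

WIDTH, HEIGHT = 1280, 720
WINDOW_US = 10_000  # accumulate 10 ms of events per displayed frame (assumed)

def accumulate(events, t_start_us):
    """Sum event polarities inside one time window into a 2D frame.

    events: iterable of (x, y, polarity, t_us) tuples.
    """
    frame = np.zeros((HEIGHT, WIDTH), dtype=np.int16)
    for x, y, polarity, t_us in events:
        if t_start_us <= t_us < t_start_us + WINDOW_US:
            frame[y, x] += polarity
    return frame

demo_events = [(640, 360, +1, 1_000), (641, 360, -1, 2_500), (642, 361, +1, 9_000)]
print(accumulate(demo_events, 0)[360, 640:643])  # -> [ 1 -1  0]
```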
With such an unusual sensor, conventional image-processing tools aren’t applicable, so Prophesee created its own software development suite, the GUI-based Metavision Studio, which allows you to visualize and record the data streamed by event-based vision systems built on Prophesee technology. The company has also developed an API called the Metavision SDK with C++ and Python code samples, Python classes, algorithms, tutorials, documentation, and seventeen sample applications, including the following (a short code sketch showing the SDK’s Python interface appears after this list):
- Detection inference – a Deep Neural Network (DNN) with a pretrained automotive model written in PyTorch, with support for detection and tracking using a C++ pipeline.
- Object detection – a detection application with a framework for training and multiple pre-built, event-based tensor representations.
- Object tracking – tracks moving objects with low-rate data and sparse information.
- Event to video – a neural network that builds grayscale images based on events, suitable for use with frame-based image-processing software.
- Optical flow inference – a pretrained flow model that predicts optical flow from event-based data.
- Corner detection and tracking – detects and tracks corners in an event stream while generating stable keypoints and long tracks for even fast-moving scenes.
- Gesture tracking inference – plays a mean game of rock, paper, scissors.
- Particle size monitoring – counts and measures the size of objects moving at very high speed through a channel or along a conveyor.
- Vibration monitoring – monitors vibration frequencies continuously, with pixel precision, by tracking the temporal evolution of light in every pixel within a scene.
- Spatter monitoring – tracks small particles with spatter-like motion.
- High-speed counting – counts objects at high speeds and with high accuracy without motion blur.
- Edgelet tracking – tracks 3D edges or fiducial markers, typically used for AR/VR applications.
- Active marker tracking – real-time active marker tracking at ultra-high speeds.
- Stereo image matching – estimates depth maps by matching features in synchronized stereo event streams.
- Ultra-slow-motion video – produces a video stream at the frame-based equivalent of more than 200,000 frames per second.
- XYT visualization – directly visualizes X, Y, and time information from the event-based sensor’s pixels.
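If you’d like a feel for what working with the SDK looks like, here’s a minimal sketch that reads and summarizes an event stream through the SDK’s Python interface. It’s modeled on the EventsIterator class that appears in Prophesee’s published tutorials; exact module, class, argument, and field names can vary between SDK versions, and the recording file name is just a placeholder.

```python
# A minimal sketch of reading an event stream with the Metavision SDK's Python
# API. Based on the EventsIterator class shown in Prophesee's tutorials; names
# and arguments may differ between SDK versions, so treat this as an illustration.
from metavision_core.event_io import EventsIterator

# A .raw file path replays a recording; "recording.raw" is a placeholder name.
mv_iterator = EventsIterator(input_path="recording.raw", delta_t=10_000)  # 10 ms slices

for events in mv_iterator:
    # Each slice is a NumPy structured array with 'x', 'y', 'p' (polarity), and
    # 't' (timestamp in µs) fields -- the same 4-value vector described earlier.
    if events.size == 0:
        continue
    on_events = int((events["p"] == 1).sum())
    print(f"{events.size} events in this slice: {on_events} brighter, "
          f"{events.size - on_events} darker")
```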
Prophesee licenses the latest version of Metavision Studio, version 5.1.0, for a fee, which includes a license to distribute the applications developed with the tools. The company also offers an earlier version of the tool kit – version 4.6.2 – at no cost for evaluation purposes, but without a distribution license. There’s an open-source set of tools on GitHub as well. You will absolutely need these tools unless you’re already familiar with event-based image processing, which I think unlikely.
For some excellent visualization videos of event-based applications, click here to see the uEye page on the IDS Web site.
Hi Steve — the folks at Prophesee are doing some very interesting stuff on the sensor front; for example, see my column “Look At Something, Ask a Question, Hear an Answer: Welcome to the Future”: https://www.eejournal.com/article/look-at-something-ask-a-question-hear-an-answer-welcome-to-the-future/
So, having detected an event, can this thing send you a complete frame of pixels (a snapshot of what’s being viewed), or would you need a second, ordinary camera for that?
You’d need a second camera if you wanted a normal video frame. The IDS camera’s sensors only sense and trigger on changes in light intensity.