Multi-Modal Stream Focusing Salient Object Detection based on Visible-Infrared Complementary Fusion
Journal article   Peer reviewed

Yujin Zhang, Haoyi Gao, Ferdous Sohel, Fei Wu, A. A. M. Muzahid, Jingwen Zhao, Zhendong Du and Lijun Zhang
IEEE transactions on instrumentation and measurement, Vol.74, 5046614
2025

Abstract

Keywords: contextual fusion; Convolutional neural networks; efficient focusing amplifier; Feature extraction; Focusing; Generative adversarial networks; Image fusion; light classification; Lighting; multi-modal stream; Object detection; salient object detection; Semantics; Streaming media; visible-infrared image fusion; Visualization
Cross-modal fusion of visible and infrared images can make targets more prominent, and the interaction between multi-modal stream fusion and salient object detection can depict targets more accurately. We propose MFCF, a multi-modal stream focusing salient object detection network based on visible-infrared complementary fusion. MFCF has two main subnetworks: an Attentional Complementary Image Fusion subnetwork for Light Perception (AComFusion) and a Multimodal Stream Focusing Contextual Salient Object Detection subnetwork (MSFCSod). To address the issue of redundant information across modalities weakening the fusion, AComFusion is designed with an attention mutual information complementary module that removes redundancy and enhances complementary advantages. Additionally, a light classification module adaptively classifies lighting conditions and adjusts the contribution weights of the modalities to obtain optimal fusion quality under various lighting conditions. The output of AComFusion is used as a third modality stream and fed into MSFCSod along with the visible and infrared sources. This fusion stream drives and guides the infrared and visible streams to externally focus on salient target features. An efficient focusing amplifier module is designed to internally self-focus on the detected salient targets, enhancing their feature representations. Finally, a contextual fusion module integrates low-level details and high-level semantic features to improve the texture edges of the objects, thus enhancing the MFCF network. Extensive experiments on several benchmark datasets show that the proposed MFCF network achieves state-of-the-art performance. It also shows strong potential in the subtasks of image fusion and salient object detection.
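
The idea of adjusting modality contribution weights according to lighting can be illustrated with a toy sketch. This is not the authors' light classification module, whose design is not given in the abstract. The function names `day_probability` and `fuse`, the brightness-based logistic rule, and its `midpoint` and `sharpness` parameters are all assumptions made for illustration. Images are plain nested lists with intensities in [0, 1].

```python
import math


def day_probability(visible, midpoint=0.5, sharpness=10.0):
    """Hypothetical stand-in for a learned light classifier.

    Maps the mean brightness of the visible image (values in [0, 1])
    to a pseudo-probability that the scene is well lit.
    """
    pixels = [p for row in visible for p in row]
    mean = sum(pixels) / len(pixels)
    return 1.0 / (1.0 + math.exp(-sharpness * (mean - midpoint)))


def fuse(visible, infrared, p_day):
    """Pixel-wise weighted sum of the two modalities.

    Assumption: the visible weight equals p_day and the infrared
    weight is its complement, so the weights always sum to 1.
    """
    w_vis, w_ir = p_day, 1.0 - p_day
    return [
        [w_vis * v + w_ir * r for v, r in zip(v_row, r_row)]
        for v_row, r_row in zip(visible, infrared)
    ]


# Usage: a dark visible frame leans the fused result toward infrared.
dark_visible = [[0.0, 0.1], [0.1, 0.0]]
warm_infrared = [[1.0, 1.0], [1.0, 1.0]]
p = day_probability(dark_visible)
fused = fuse(dark_visible, warm_infrared, p)
```

In this sketch a dark scene yields a low `p`, so the infrared image dominates the fused output, while a bright scene shifts the weight toward the visible image.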
