Multi-Modal Stream Focusing Salient Object Detection based on Visible-Infrared Complementary Fusion

Yujin Zhang; Haoyi Gao; Ferdous Sohel; Fei Wu; A. A. M. Muzahid; Jingwen Zhao; Zhendong Du; Lijun Zhang

doi:10.1109/TIM.2025.3571081

Journal article

Peer reviewed

IEEE transactions on instrumentation and measurement, Vol.74, 5046614

2025

DOI: https://doi.org/10.1109/TIM.2025.3571081

Keywords

contextual fusion; Convolutional neural networks; efficient focusing amplifier; Feature extraction; Focusing; Generative adversarial networks; Image fusion; light classification; Lighting; multi-modal stream; Object detection; salient object detection; Semantics; Streaming media; visible-infrared image fusion; Visualization

Abstract

Cross-modal fusion of visible and infrared images can make targets more prominent, and the interaction between multi-modal stream fusion and salient object detection can depict the target more accurately. We propose a multi-modal stream focusing salient object detection network based on visible-infrared complementary fusion, named MFCF. MFCF has two main subnetworks: an Attentional Complementary Image Fusion subnetwork for Light Perception (AComFusion) and a Multimodal Stream Focusing Contextual Salient Object Detection subnetwork (MSFCSod). To address the problem that redundant information across modalities weakens the fusion result, AComFusion includes an attention mutual information complementary module that removes redundancy and enhances complementary advantages. In addition, a light classification module adaptively classifies lighting conditions and adjusts the contribution weights of the modalities to obtain optimal fusion quality under various lighting conditions. The output of AComFusion serves as a third modality stream and is fed into MSFCSod together with the visible and infrared sources. This fusion stream drives and guides detection in the infrared and visible streams to externally focus on salient target features. An efficient focusing amplifier module then internally self-focuses on the detected salient targets, enhancing their feature representations. Finally, the contextual fusion module integrates more low-level details and high-level semantic features to improve the texture edges of the objects, further improving the MFCF network. Extensive experiments on several benchmark datasets show that the proposed MFCF network achieves state-of-the-art performance. It also shows strong potential in the subtasks of image fusion and salient object detection.
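
The abstract says that a light classification module adjusts the contribution weights of the two modalities. The record does not give the actual formulation, so the following is only a minimal sketch of the general idea, assuming a hypothetical classifier output `p_bright` (the probability that the scene is well lit) that is used directly as the visible weight in a pixel-wise weighted sum. The function names, the linear weighting, and the toy images are illustrative assumptions, not the MFCF implementation.

```python
# Illustrative only: illumination-aware weighting of visible and infrared images.
# `p_bright` stands in for the output of a light classifier (hypothetical here).

def illumination_weights(p_bright):
    """Map a 'scene is well lit' probability to (visible, infrared) weights."""
    if not 0.0 <= p_bright <= 1.0:
        raise ValueError("p_bright must lie in [0, 1]")
    # Assumption: bright scenes favour the visible image, dark scenes the infrared one.
    return p_bright, 1.0 - p_bright


def fuse(visible, infrared, p_bright):
    """Pixel-wise weighted sum of two same-sized grayscale images (nested lists)."""
    w_vis, w_ir = illumination_weights(p_bright)
    return [
        [w_vis * v + w_ir * r for v, r in zip(vis_row, ir_row)]
        for vis_row, ir_row in zip(visible, infrared)
    ]


# Toy 2x2 images with intensities in [0, 1].
visible = [[0.2, 0.4], [0.6, 0.8]]
infrared = [[1.0, 0.0], [0.5, 0.5]]

fused_night = fuse(visible, infrared, p_bright=0.0)  # relies on infrared only
fused_day = fuse(visible, infrared, p_bright=1.0)    # relies on visible only
fused_dusk = fuse(visible, infrared, p_bright=0.5)   # equal contribution
```

In this sketch the weights always sum to one, so the fused intensity stays within the range of the inputs; a learned fusion network would typically replace the fixed linear rule.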

Details

Title: Multi-Modal Stream Focusing Salient Object Detection based on Visible-Infrared Complementary Fusion
Authors/Creators: Yujin Zhang - Shanghai University of Engineering Science
Haoyi Gao - School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
Ferdous Sohel - Murdoch University, Centre for Crop and Food Innovation
Fei Wu - Shanghai University of Engineering Science
A. A. M. Muzahid - Shanghai University of Engineering Science
Jingwen Zhao - Shanghai University of Engineering Science
Zhendong Du - School of Intelligent Manufacturing, Shanghai Technology and Innovation Vocational College, Shanghai, China
Lijun Zhang - Shanghai University of Engineering Science
Publication Details: IEEE transactions on instrumentation and measurement, Vol.74, 5046614
Publisher: IEEE
Number of pages: 1
Grant note: 2021ZYB01003 / Innovation Fund for Industry-University-Research of Chinese Universities; 17ZR1411900 / Shanghai Natural Science Foundation (10.13039/100007219); 62072057 / National Natural Science Foundation of China (10.13039/501100001809)
Identifiers: 991005779515007891
Murdoch Affiliation: School of Information Technology
Language: English
Resource Type: Journal article
