Output list
Journal article
QTopic: A novel quantum perspective on learning topics from text
Published 2026
Neurocomputing, 669, 132483
Topic modeling is an unsupervised technique in natural language processing (NLP) used to identify hidden topic structures within large text datasets. Established approaches such as latent Dirichlet allocation, BERTopic, and Top2Vec are widely adopted to uncover hidden topics in text data. However, these methods often struggle in scenarios involving limited data availability or high-dimensional textual features. In this research, we propose QTopic, a novel hybrid quantum-classical topic modeling architecture that leverages quantum properties through parameterized quantum circuits. By integrating quantum-enhanced sampling into the inference pipeline, the proposed model captures richer topic distributions by mapping textual data into a higher-dimensional space. Benchmark experiments demonstrate that QTopic consistently outperforms classical approaches in terms of coherence, diversity, and topic distinctiveness, particularly when modeling a small number of topics. This study highlights the promise of quantum techniques in advancing unsupervised NLP, while also noting hardware limitations that present challenges for future research.
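For readers unfamiliar with parameterized quantum circuits, the sketch below simulates a toy two-qubit circuit in plain NumPy and reads its measurement distribution as a prior over four topics. The gate layout, qubit count, and topic mapping are illustrative assumptions, not the QTopic architecture.

```python
# Minimal sketch (not the authors' code): a tiny parameterized quantum
# circuit whose Born-rule measurement distribution is read as a topic prior.
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix (real-valued)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def pqc_topic_distribution(params):
    """Two-qubit circuit: RY layer, CNOT entangler, RY layer. Returns the
    distribution over the 4 basis states, interpreted here as 4 topics."""
    state = np.zeros(4); state[0] = 1.0                 # start in |00>
    state = np.kron(ry(params[0]), ry(params[1])) @ state
    state = CNOT @ state
    state = np.kron(ry(params[2]), ry(params[3])) @ state
    return state ** 2                                    # amplitudes are real

params = np.random.uniform(0, np.pi, size=4)             # trainable parameters
print(pqc_topic_distribution(params))                    # sums to 1
```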
Journal article
Published 2026
Complex & Intelligent Systems, 12(2), 62
Robust fault detection and diagnosis (FDD) in multirotor unmanned aerial vehicles (UAVs) remains challenging due to limited actuator redundancy, nonlinear dynamics, and environmental disturbances. This work introduces two lightweight deep learning architectures: the Convolutional-LSTM Fault Detection Network (CLFDNet), which combines multi-scale one-dimensional convolutional neural networks (1D-CNN), long short-term memory (LSTM) units, and an adaptive attention mechanism for spatio-temporal fault feature extraction; and the Autoencoder LSTM Multi-loss Fusion Network (AELMFNet), a soft attention–enhanced LSTM autoencoder optimized via multi-loss fusion for fine-grained fault severity estimation. Both models are trained and evaluated on UAV-Fault Magnitude V1, a high-fidelity simulation dataset containing 114,230 labeled samples with motor degradation levels ranging from 5% to 40% in the take-off, hover, navigation, and descent phases, representing the most probable and recoverable fault scenarios in quadrotor UAVs. Including coupled faults enables models to learn correlated degradation patterns and actuator interactions while maintaining controllability under standard flight laws. CLFDNet achieves 96.81% precision in fault severity classification and 100% accuracy in motor fault localization with only 19.6K parameters, demonstrating suitability for real-time onboard applications. AELMFNet achieves the lowest reconstruction loss of 0.001 with Huber loss and an inference latency of 6 ms/step, underscoring its efficiency for embedded deployment. Comparative experiments against 15 baselines, including five classical machine learning models, five state-of-the-art fault detection methods, and five attention-based deep learning variants, validate the effectiveness of the proposed architectures. These findings confirm that lightweight deep models enable accurate and efficient diagnosis of UAV faults with minimal sensing.
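A minimal sketch of the general 1D-CNN → LSTM → soft-attention pattern the abstract describes, in PyTorch; the channel counts, kernel sizes, window length, and class count are placeholders, not the published CLFDNet configuration.

```python
# Minimal sketch (assumptions, not the published CLFDNet): a 1D-CNN ->
# LSTM -> soft-attention classifier over multichannel UAV telemetry.
import torch
import torch.nn as nn

class TinyFaultNet(nn.Module):
    def __init__(self, in_channels=12, n_classes=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.attn = nn.Linear(64, 1)          # soft attention over time steps
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                     # x: (batch, channels, time)
        h = self.conv(x).transpose(1, 2)      # -> (batch, time, 32)
        h, _ = self.lstm(h)                   # -> (batch, time, 64)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights per step
        ctx = (w * h).sum(dim=1)              # weighted temporal pooling
        return self.head(ctx)

logits = TinyFaultNet()(torch.randn(4, 12, 200))   # 4 windows, 200 steps
print(logits.shape)                                 # torch.Size([4, 8])
```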
Journal article
Automatic pixel-level annotation for plant disease severity estimation
Published 2026
Computers and Electronics in Agriculture, 241, 111316
Plant disease adversely impacts food production and quality. Alongside detecting the disease, estimating its severity is important in managing the disease. Deep learning-based artificial intelligence techniques for plant disease detection are emerging. Unlike most of these techniques, which focus on disease recognition, this study addresses various plant disease-related tasks, including annotation, severity classification, lesion detection, and leaf segmentation. We propose a novel approach that learns the disease symptoms, which are then used to segment disease lesions for severity estimation. To demonstrate the approach, a dataset of barley images was used. We captured images of barley plants inoculated with diseases on test-bed paddocks at various growth stages. The dataset was automatically annotated at a pixel level using a trained vision transformer to obtain the ground truth labels. The annotated dataset was used to train salient object detection (SOD) methods. Two top-performing lightweight SOD models were used to segment the disease lesion areas. To evaluate the performance of the SODs, we tested them on our dataset and several other datasets, including the Coffee dataset, which has expert pixel-level labels that were unseen during the training step. Several morphological and spectral disease symptoms are learned, including those akin to the widely used ABCD rule for human skin-cancer detection, i.e., asymmetry (A), border irregularity (B), colour variance (C), and diameter (D). To the best of our knowledge, this is the first study to incorporate these ABCD features in plant disease detection. We further extract visual and texture features using the grey level co-occurrence matrix (GLCM) and fuse them with the ABCD features. On the Coffee dataset, our method achieved over 82% accuracy on the severity classification task. The results demonstrate the performance of the proposed method in detecting plant diseases and estimating their severity.
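A sketch of how ABCD-style shape descriptors and GLCM texture statistics could be fused for a single lesion, assuming a binary lesion mask and a grayscale leaf crop; the chosen properties and the fusion into one vector are illustrative, not the paper's exact feature set.

```python
# Minimal sketch (our assumptions, not the paper's exact features):
# ABCD-style descriptors from a lesion mask fused with GLCM texture stats.
import numpy as np
from skimage.measure import regionprops, label
from skimage.feature import graycomatrix, graycoprops  # skimage >= 0.19

def abcd_glcm_features(gray, mask):
    region = max(regionprops(label(mask)), key=lambda r: r.area)
    asymmetry = region.eccentricity                             # A
    border = region.perimeter ** 2 / (4 * np.pi * region.area)  # B (1 = circle)
    colour_var = float(np.var(gray[mask > 0]))                  # C
    diameter = region.equivalent_diameter                       # D
    glcm = graycomatrix(gray, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    texture = [graycoprops(glcm, p)[0, 0] for p in
               ("contrast", "homogeneity", "energy", "correlation")]
    return np.array([asymmetry, border, colour_var, diameter] + texture)

gray = (np.random.rand(64, 64) * 255).astype(np.uint8)   # toy grayscale crop
mask = np.zeros((64, 64), np.uint8); mask[20:44, 20:44] = 1
print(abcd_glcm_features(gray, mask))
```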
Journal article
A semi-supervised approach for classifying insect developmental phases from repurposed IP102
Published 2026
Computers and Electronics in Agriculture, 242, 111337
Identifying insect pests, whether as adults, larvae, or eggs, is critical in pest management. Computational learning algorithms have demonstrated strong potential in achieving high identification performance, but these methods typically require large, balanced, and well-annotated datasets. This creates a challenge for insect pest identification, as rare species, despite often being the most damaging to crops, are underrepresented in available datasets. Moreover, annotating large-scale datasets is both costly and labour-intensive. To address this issue, we develop a semi-supervised learning approach, Cost-Focal FixMatch, which extends the widely used FixMatch framework by integrating class-aware reweighting and focal loss to better handle class imbalance. Specifically, we introduce a simple yet robust method for applying class weighting in cross-entropy and focal loss functions. The proposed method generates higher-quality pseudo labels than the baseline, ensuring better learning. We evaluate our approach using a repurposed IP102 dataset, which comprises four primary insect life stages, and a mixed IP102 dataset, where the class labels jointly represent insect species and their corresponding life stages. Our method considerably improves the classification of minority classes, achieving a notable increase in recall for the Larva class from 64% under the baseline FixMatch to 82% using the MobileNetV3Small backbone. On the mixed IP102 dataset, our approach achieves an almost 9% improvement in average recall over the baseline FixMatch built upon the EfficientNetV2S network.
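A minimal sketch of a class-weighted focal loss of the kind the abstract describes, using inverse-frequency class weights; the weighting scheme and gamma value are assumptions rather than the paper's exact formulation.

```python
# Minimal sketch (assumed form of class-aware focal loss, not the
# published Cost-Focal FixMatch implementation).
import torch
import torch.nn.functional as F

def weighted_focal_loss(logits, targets, class_weights, gamma=2.0):
    """Cross-entropy scaled by (1 - p_t)^gamma and per-class weights,
    down-weighting easy examples and up-weighting minority classes."""
    ce = F.cross_entropy(logits, targets, weight=class_weights,
                         reduction="none")
    p_t = torch.exp(-F.cross_entropy(logits, targets, reduction="none"))
    return ((1.0 - p_t) ** gamma * ce).mean()

# Example: inverse-frequency weights over 4 hypothetical life-stage classes.
counts = torch.tensor([9000., 1200., 3000., 800.])
weights = counts.sum() / (len(counts) * counts)
logits = torch.randn(16, 4); targets = torch.randint(0, 4, (16,))
print(weighted_focal_loss(logits, targets, weights))
```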
Journal article
3D-CDNeT: Cross-domain learning with enhanced speed and robustness for point cloud recognition
Published 2026
Neurocomputing, 662, 131939
Despite progress in 3D object recognition using deep learning (DL), challenges such as domain shift, occlusion, and viewpoint variations hinder robust performance. Additionally, the high computational cost and lack of labeled data limit real-time deployment in applications such as autonomous driving and robotic manipulation. To address these challenges, we propose 3D-CDNeT, a novel cross-domain deep learning network designed for unsupervised learning, enabling efficient and robust point cloud recognition. At the core of our model is a lightweight graph-infused attention encoder (GIAE) that enables effective feature interaction between the source and target domains. It not only improves recognition accuracy but also reduces inference time, which is essential for real-time applications. To enhance robustness and adaptability, we introduce a feature invariance learning module (FILM) using contrastive loss for learning invariant features. In addition, we adopt a Generative Decoder (GD) based on a Variational Auto-Encoder (VAE) to model diverse latent spaces and reconstruct meaningful 3D structures from the point cloud. This reconstruction process acts as a self-supervised generative objective that complements the discriminative recognition task, guiding the encoder to learn structure-preserving and domain-invariant features that improve recognition under occlusion and cross-domain conditions. Our proposed model unifies generative and discriminative tasks by using self-attention on the object covariance matrix to facilitate efficient information exchange, enabling the extraction of both local and global features. We further develop a self-supervised pretraining strategy that learns both global and local object invariances through GIAE and GD, respectively. A new loss function, combining contrastive loss and Chamfer distance, is proposed to strengthen cross-domain feature alignment. Experimental results on three benchmark datasets demonstrate that 3D-CDNeT outperforms existing state-of-the-art (SOTA) methods in recognition accuracy and inference speed, offering a practical solution for real-time 3D perception tasks. It achieves accuracies of 90.6% on ModelNet40, 95.2% on ModelNet10, and 76.4% on the ScanObjectNN dataset in linear evaluation tasks, all while reducing runtime by 45% without compromising performance. Detailed qualitative comparisons and ablation studies are provided to validate the effectiveness of each component and demonstrate the superior performance of our proposed method.
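A sketch of the combined objective the abstract names, pairing an InfoNCE-style contrastive term with a symmetric Chamfer distance; the temperature, weighting, and encoder are placeholders, not the 3D-CDNeT implementation.

```python
# Minimal sketch (assumed form of the combined loss, not the paper's code).
import torch
import torch.nn.functional as F

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p, q: (B, N, 3)."""
    d = torch.cdist(p, q)                       # (B, N, M) pairwise distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

def info_nce(z1, z2, tau=0.1):
    """Contrastive loss between two views' embeddings z1, z2: (B, D)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                  # (B, B) similarity matrix
    labels = torch.arange(z1.size(0))           # positives on the diagonal
    return F.cross_entropy(logits, labels)

def combined_loss(z1, z2, recon, target, lam=0.5):
    return info_nce(z1, z2) + lam * chamfer_distance(recon, target)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
pts = torch.randn(8, 1024, 3)
print(combined_loss(z1, z2, pts + 0.01 * torch.randn_like(pts), pts))
```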
Journal article
Towards building robust models for unimodal and multimodal medical imaging data
Published 2026
Information Fusion, 127, 103822
Deep neural network (DNN) models applied to medical image analysis are highly vulnerable to adversarial attacks, at both the example (input) and feature (model) levels. Ensuring DNN robustness against these adversarial attacks is crucial for accurate diagnostics. However, existing example-level and feature-level defense strategies, including adversarial training and image-level preprocessing, struggle to achieve effective adversarial robustness in medical image analysis. This challenge arises primarily from difficulties in capturing complex texture features in medical images and the inherent risk of changing intrinsic structural information in the input data. To overcome this challenge, we propose a novel medical imaging protector framework named MI-Protector. This framework comprises two defense methods for unimodal learning and one for multimodal fusion learning, addressing both example-level and feature-level vulnerabilities to robustly protect DNNs against adversarial attacks. For unimodal learning, we introduce an example-level defense mechanism using a generative model with a purifier, termed DGMP. The purifier comprises a trainable neural network and a pre-trained generator from the generative model, which automatically removes a wide variety of adversarial perturbations. For combined example- and feature-level defense, we propose the unimodal attention noise injection mechanism (UMAN), which protects learning models at both the example and feature layers. To protect the multimodal fusion learning network, we propose the multimodal information fusion attention noise (MMIFAN) injection method, which offers protection at the feature layers while the non-learnable UMAN is applied at the example layer. Extensive experiments conducted on 16 datasets across various medical imaging modalities demonstrate that our framework provides superior robustness compared to existing methods against adversarial attacks. Code: https://github.com/misti1203/MI-Protector.
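One plausible reading of attention-guided noise injection, sketched below: features judged salient by a learned score are perturbed less, shielding diagnostic content while randomizing the rest. This is an illustrative assumption, not the published UMAN or MMIFAN design.

```python
# Minimal sketch (our reading of attention-guided noise injection; the
# UMAN/MMIFAN designs from the paper are not reproduced here).
import torch
import torch.nn as nn

class AttentionNoise(nn.Module):
    def __init__(self, dim, sigma=0.1):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # learned saliency score
        self.sigma = sigma

    def forward(self, feats):                    # feats: (batch, dim)
        attn = torch.sigmoid(self.score(feats))  # saliency in (0, 1)
        noise = self.sigma * torch.randn_like(feats)
        return feats + (1.0 - attn) * noise      # perturb less-salient parts

feats = torch.randn(4, 256)
print(AttentionNoise(256)(feats).shape)          # torch.Size([4, 256])
```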
Journal article
Published 2026
Expert Systems with Applications, 298, Part C, 129907
Detection of glaucoma progression is crucial to managing patients, permitting individualized care plans and treatment. It is a challenging task requiring the assessment of structural changes to the optic nerve head and functional changes based on visual field testing. Artificial intelligence, especially deep learning techniques, has shown promising results in many applications, including glaucoma diagnosis. This paper proposes a two-stage computational learning pipeline for detecting glaucoma progression using only fundus photographs. In the first stage, a deep learning model takes a time series of fundus photographs as input and outputs a vector of predictions where each element represents the overall rate of change in visual field (VF) sensitivity values for a sector (region) of the optic nerve head (ONH). We implemented two deep learning models, ResNet50 and InceptionResNetV2, for this stage. In the second stage, a binary classifier (weighted logistic regression) takes the predicted vector as input to detect progression. We also propose a novel method for constructing annotated datasets from temporal sequences of clinical fundus photographs and corresponding VF data suitable for machine learning. Each dataset element comprises a temporal sequence of photographs together with a vector-valued label. The label is derived by computing the pointwise linear regression of VF sensitivity values at each VF test location, mapping these locations to eight ONH sectors, and assigning the overall rate of change in each sector to one of the elements of the vector. In our experiments, we used a retrospective clinical dataset of 82 patients collected at multiple timepoints over five years. The InceptionResNetV2-based implementation yielded the best performance, achieving detection accuracies of 97.28 ± 1.10% on unseen test data (i.e., each dataset element is unseen but originates from the same set of patients appearing in the training dataset) and 87.50 ± 0.70% on test data from unseen patients (training and testing patients are entirely different). The testing throughput was 11.60 ms per patient. These results demonstrate the efficacy of the proposed method for detecting glaucoma progression from fundus photographs.
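A sketch of the label construction the abstract describes: per-location slopes of VF sensitivity over time, averaged into eight ONH-sector rates of change. The sector map and array shapes below are toy placeholders, not the clinical mapping.

```python
# Minimal sketch (assumed label construction; the ONH sector map and
# location count are placeholders).
import numpy as np

def sector_labels(vf, times, sector_of):
    """vf: (n_visits, n_locations) sensitivities; times: (n_visits,) years;
    sector_of: (n_locations,) sector index 0..7 for each VF location."""
    # Slope of the least-squares line at every VF location (dB/year).
    slopes = np.polyfit(times, vf, deg=1)[0]           # (n_locations,)
    # Average the per-location slopes within each of the 8 ONH sectors.
    return np.array([slopes[sector_of == s].mean() for s in range(8)])

rng = np.random.default_rng(0)
times = np.linspace(0, 5, 10)                          # 10 visits over 5 years
vf = 30 - 0.4 * times[:, None] + rng.normal(0, 1, (10, 52))
sector_of = rng.integers(0, 8, 52)                     # toy 52-location map
print(sector_labels(vf, times, sector_of))             # ~ -0.4 in each sector
```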
Journal article
Joint Adversarial Attack: An Effective Approach to Evaluate Robustness of 3D Object Tracking
Published 2026
Pattern Recognition, 172, Part A, 112359
Deep neural networks (DNNs) have been widely used in 3D object tracking, thanks to their superior capabilities to learn from geometric training samples and locate tracking targets. Although DNN-based trackers are known to be vulnerable to adversarial examples, their robustness in real-world scenarios with potentially complex data defects has rarely been studied. To this end, a joint adversarial attack method against 3D object tracking is proposed, which simulates defects of the point cloud data in the form of simultaneous point filtration and perturbation. Specifically, a voxel-based point filtration module is designed to filter points of the tracking template, described by a voxel-wise binary distribution over the density of the point cloud. Furthermore, a voxel-based point perturbation module adds voxel-wise perturbations to the filtered template, whose direction is constrained by local geometrical information of the template. Experiments conducted on popular 3D trackers demonstrate that the proposed joint attack decreases the success and precision of existing 3D trackers by 30.2% and 35.4%, respectively, on average, an improvement of 30.5% over existing attack methods.
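A sketch of density-aware voxel filtration in the spirit of the described module: points in sparse voxels are dropped with higher probability. The voxel size and drop schedule are assumptions, not the paper's parameters.

```python
# Minimal sketch (assumed mechanics of density-aware voxel filtration,
# not the published attack).
import numpy as np

def voxel_filter(points, voxel=0.2, drop_base=0.5, rng=None):
    """Drop template points with a per-voxel Bernoulli whose keep
    probability grows with local point density."""
    rng = rng or np.random.default_rng()
    keys = np.floor(points / voxel).astype(np.int64)      # voxel index per point
    _, inv, counts = np.unique(keys, axis=0,
                               return_inverse=True, return_counts=True)
    density = counts[inv] / counts.max()                  # in (0, 1] per point
    keep_p = 1.0 - drop_base * (1.0 - density)            # sparse voxels drop more
    return points[rng.random(len(points)) < keep_p]

pts = np.random.randn(2048, 3)
print(voxel_filter(pts).shape)                            # fewer than 2048 points
```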
Conference proceeding
Intelligent Intrusion Detection in IoT: Integrating Machine Learning and Feature Automation
Date presented 20/03/2025
Proceedings of the International Conference on Computer and Automation Engineering, 374–379
17th International Conference on Computer and Automation Engineering (ICCAE) 2025, 20/03/2025–22/03/2025, Perth, Australia.
The rapid growth of the internet of things (IoT) raises serious security concerns, demanding effective protection from cyber-attacks. Intrusion detection depends on identifying key features, but despite advances in automation, manual feature selection remains necessary, limiting scalability. To address this limitation, we introduce a hybrid feature selection method that combines filter and wrapper techniques to automatically select important features and enhance the efficiency of machine learning (ML) models for intrusion detection tasks. We use the mutual information (MI) algorithm as the filter method and recursive feature elimination (RFE) as the wrapper method. We evaluate the performance of the proposed model on the publicly available HIKARI2021 and UNSW-NB15 datasets. We compare the results with several existing methods, and our approach outperforms state-of-the-art (SOTA) methods in terms of accuracy and training time. We present comprehensive results, including both quantitative and qualitative analyses, to demonstrate the effectiveness and efficiency of our proposed method.
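A minimal sketch of the filter-then-wrapper pipeline the abstract describes, using scikit-learn's mutual_info_classif and RFE; the estimator and feature counts are placeholders, not the authors' settings.

```python
# Minimal sketch (assumed pipeline, not the authors' exact configuration):
# mutual-information filtering followed by recursive feature elimination.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=40, n_informative=8,
                           random_state=0)

# Filter stage: keep the 20 features with the highest mutual information.
mi = SelectKBest(mutual_info_classif, k=20).fit(X, y)
X_mi = mi.transform(X)

# Wrapper stage: recursively eliminate down to 10 features.
rfe = RFE(LogisticRegression(max_iter=1000),
          n_features_to_select=10).fit(X_mi, y)

# Indices of the surviving original features.
print("selected:", mi.get_support(indices=True)[rfe.support_])
```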
Conference proceeding
Published 2025
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4317
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025), 06/04/2025–11/04/2025, Hyderabad, India
Advancements in light detection and ranging (LiDAR) sensors and 3D object detection techniques have boosted their deployment in a wide range of applications, autonomous driving in particular. However, 3D object detection models based on deep neural networks have been shown to be susceptible to adversarial attacks, yet defensive strategies explicitly tailored to 3D object detection remain scarce. In this paper, we introduce LiDAR-SPD, a novel approach to defend against adversarial attacks targeting LiDAR-based 3D object detectors. Specifically, a spherical purification unit is designed, which encompasses two pivotal processes: spherical projection and spherical diffusion. The former leverages a spatial projection strategy to eliminate adversarial point clouds inserted in occluded regions, while the latter employs a diffusion model to regenerate points, bringing the scene closer to a pristine LiDAR scan. Comprehensive experiments conducted on the KITTI dataset demonstrate that the proposed LiDAR-SPD method effectively thwarts various types of adversarial attacks, decreasing the attack success rates against 3D object detectors by 60%.
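A sketch of the spherical projection step: mapping an (N, 3) LiDAR cloud to a range image via azimuth and elevation. The image resolution and field of view are placeholders, not the LiDAR-SPD configuration.

```python
# Minimal sketch (assumed geometry of spherical projection, not the
# published purification unit).
import numpy as np

def spherical_projection(points, h=64, w=1024, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR cloud to an h x w range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                        # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-8))    # elevation angle
    fu, fd = np.radians(fov_up), np.radians(fov_down)
    u = ((1.0 - (pitch - fd) / (fu - fd)) * (h - 1)).clip(0, h - 1)
    v = ((0.5 * (yaw / np.pi + 1.0)) * (w - 1)).clip(0, w - 1)
    img = np.zeros((h, w), dtype=np.float32)
    img[u.astype(int), v.astype(int)] = r          # last point wins per pixel
    return img

pts = np.random.randn(10000, 3) * np.array([20, 20, 2])
print(spherical_projection(pts).shape)             # (64, 1024)
```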