Output list
Journal article
Automatic pixel-level annotation for plant disease severity estimation
Published 2026
Computers and electronics in agriculture, 241, 111316
Plant diseases adversely impact food production and quality. Alongside detecting a disease, estimating its severity is important for managing it. Deep learning-based techniques for plant disease detection are emerging. Unlike most of these techniques, which focus on disease recognition, this study addresses various plant disease-related tasks, including annotation, severity classification, lesion detection, and leaf segmentation. We propose a novel approach that learns the disease symptoms, which are then used to segment disease lesions for severity estimation. To demonstrate the approach, we used a dataset of barley images captured from plants inoculated with diseases on test-bed paddocks at various growth stages. The dataset was automatically annotated at a pixel level using a trained vision transformer to obtain the ground-truth labels. The annotated dataset was then used to train salient object detection (SOD) methods, and two top-performing lightweight SOD models were used to segment the disease lesion areas. To evaluate the performance of the SODs, we tested them on our dataset and several other datasets, including the Coffee dataset, which has expert pixel-level labels that were unseen during training. Several morphological and spectral disease symptoms are learned, including those akin to the widely used ABCD rule for human skin-cancer detection, i.e., asymmetry (A), border irregularity (B), colour variance (C), and diameter (D). To the best of our knowledge, this is the first study to incorporate these ABCD features in plant disease detection. We further extract visual and texture features using the grey level co-occurrence matrix (GLCM) and fuse them with the ABCD features. On the Coffee dataset, our method achieved over 82% accuracy on the severity classification task. The results demonstrate the effectiveness of the proposed method in detecting plant diseases and estimating their severity.
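A minimal sketch of the kind of feature fusion described above: ABCD-style shape/colour descriptors computed from a binary lesion mask, fused with GLCM texture statistics. The function names and exact feature choices are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch, assuming a binary lesion mask and an aligned RGB/greyscale image.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from skimage.measure import perimeter

def abcd_features(mask, rgb):
    """mask: HxW boolean lesion mask; rgb: HxWx3 uint8 leaf image."""
    ys, xs = np.nonzero(mask)
    crop = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    area = crop.sum()
    asym = 1.0 - (crop & crop[:, ::-1]).sum() / area          # A: rough asymmetry proxy
    border = perimeter(mask) ** 2 / (4 * np.pi * area)        # B: 1.0 for a perfect circle
    colour_var = rgb[mask].std(axis=0).mean()                 # C: colour variance inside lesion
    diameter = 2 * np.sqrt(area / np.pi)                      # D: equivalent-circle diameter
    return np.array([asym, border, colour_var, diameter])

def glcm_features(gray, mask):
    """GLCM texture statistics on the lesion bounding box (gray: HxW uint8)."""
    ys, xs = np.nonzero(mask)
    patch = gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])

def fused_descriptor(mask, rgb, gray):
    # Fused vector for a downstream severity classifier (e.g. SVM or random forest).
    return np.concatenate([abcd_features(mask, rgb), glcm_features(gray, mask)])
```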
Journal article
A semi-supervised approach for classifying insect developmental phases from repurposed IP102
Published 2026
Computers and electronics in agriculture, 242, 111337
Identifying insect pests, whether as adults, larvae, or eggs, is critical in pest management. Computational learning algorithms have demonstrated strong potential in achieving high identification performance, but these methods typically require large, balanced, and well-annotated datasets. This creates a challenge for insect pest identification, as rare species, despite often being the most damaging to crops, are underrepresented in available datasets. Moreover, annotating large-scale datasets is both costly and labour-intensive. To address this issue, we develop a semi-supervised learning approach, Cost-Focal FixMatch, which extends the widely used FixMatch framework by integrating class-aware reweighting and focal loss to better handle class imbalance. Specifically, we introduce a simple yet robust method for applying class weighting in the cross-entropy and focal loss functions. The proposed method generates higher-quality pseudo labels than the baseline, ensuring better learning. We evaluate our approach using a repurposed IP102 dataset, which comprises four primary insect life stages, and a Mixed IP102 dataset, where the class labels jointly represent insect species and their corresponding life stages. Our method considerably improves the classification of minority classes, achieving a notable increase in recall for the Larva class from 64% under the baseline FixMatch to 82% with a MobileNetV3Small backbone. On the Mixed IP102 dataset, our approach achieves almost 9% higher average recall than the baseline FixMatch built on the EfficientNetV2S network.
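A minimal sketch of a class-weighted focal loss of the kind described above; the exact weighting scheme used in Cost-Focal FixMatch may differ, and the class frequencies below are placeholder values.

```python
# Sketch only: class-weighted focal loss for labelled or pseudo-labelled batches.
import torch
import torch.nn.functional as F

def class_weighted_focal_loss(logits, targets, class_weights, gamma=2.0):
    """logits: (N, C); targets: (N,) labels or pseudo labels;
    class_weights: (C,), assumed larger for minority classes."""
    log_probs = F.log_softmax(logits, dim=1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    weights = class_weights[targets]                    # per-sample class weight
    focal = weights * (1.0 - pt) ** gamma * (-log_pt)   # focal modulation of cross-entropy
    return focal.mean()

# Example usage with assumed class frequencies for four life-stage classes.
logits = torch.randn(8, 4)
targets = torch.randint(0, 4, (8,))
freq = torch.tensor([0.55, 0.25, 0.15, 0.05])           # placeholder frequencies
weights = (1.0 / freq) / (1.0 / freq).sum() * 4         # normalised inverse-frequency weights
loss = class_weighted_focal_loss(logits, targets, weights)
```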
Journal article
TransLIME: Towards transfer explainability to explain black-box models on tabular datasets
Published 2026
Information sciences, 730, 122891
Explainable Artificial Intelligence methods have gained significant traction for their ability to elucidate the decision-making processes of black-box models, particularly in high-stakes fields such as healthcare and finance. Among these, Local Interpretable Model-agnostic Explanations (LIME) stands out as a widely adopted post-hoc, model-agnostic approach that interprets black-box predictions by constructing an interpretable surrogate model on perturbed instances to approximate the local behavior of the original model around a given instance. However, the effectiveness of LIME can depend on the quality of the training data used by the black-box model. When trained on limited or low-quality data, the black-box model may yield inaccurate predictions for perturbed samples, resulting in poorly defined local decision boundaries and consequently unreliable explanations. This limitation is especially problematic in data-scarce settings. To overcome this challenge, we propose TransLIME, a novel end-to-end explainable transfer learning framework that improves the local fidelity and stability of LIME on limited tabular datasets by transferring relevant explainability knowledge from a related auxiliary source domain with a shifted distribution. Also, in TransLIME, only representative source prototype explanations obtained through clustering are transferred to the target domain, thereby reducing cross-domain exposure of both data and explanatory information during transfer. Experimental evaluations on real-world datasets demonstrate the effectiveness of the proposed framework in improving explanation quality in target domains with limited data.
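A minimal sketch of the prototype-explanation step suggested by the abstract: explain source-domain instances with LIME, then cluster the explanation weight vectors and keep only the cluster centres as compact prototypes to transfer. This is one illustrative reading of the idea, not the TransLIME implementation; label 1 is assumed to be the class of interest.

```python
# Sketch only: LIME explanations clustered into prototype explanations.
import numpy as np
from sklearn.cluster import KMeans
from lime.lime_tabular import LimeTabularExplainer

def prototype_explanations(X_source, predict_proba, feature_names, n_prototypes=10):
    explainer = LimeTabularExplainer(X_source, feature_names=feature_names,
                                     mode="classification")
    vectors = []
    for row in X_source:
        exp = explainer.explain_instance(row, predict_proba,
                                         num_features=X_source.shape[1])
        weights = np.zeros(X_source.shape[1])
        for idx, w in exp.as_map()[1]:       # weights for label 1 (assumed target class)
            weights[idx] = w
        vectors.append(weights)
    # Cluster the local explanations; the centres act as source prototypes.
    km = KMeans(n_clusters=n_prototypes, n_init=10).fit(np.array(vectors))
    return km.cluster_centers_
```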
Preprint
Advances and Trends in the 3D Reconstruction of the Shape and Motion of Animals
Posted to a preprint site 22/08/2025
ArXiv.org
Reconstructing the 3D geometry, pose, and motion of animals is a long-standing problem, which has a wide range of applications, from biology, livestock management, and animal conservation and welfare to content creation in digital entertainment and Virtual/Augmented Reality (VR/AR). Traditionally, 3D models of real animals are obtained using 3D scanners. These, however, are intrusive, often prohibitively expensive, and difficult to deploy in the natural environment of the animals. In recent years, we have seen a significant surge in deep learning-based techniques that enable the 3D reconstruction, in a non-intrusive manner, of the shape and motion of dynamic objects just from their RGB image and/or video observations. Several papers have explored their application and extension to various types of animals. This paper surveys the latest developments in this emerging and growing field of research. It categorizes and discusses the state-of-the-art methods based on their input modalities, the way the 3D geometry and motion of animals are represented, the type of reconstruction techniques they use, and the training mechanisms they adopt. It also analyzes the performance of some key methods, discusses their strengths and limitations, and identifies current challenges and directions for future research.
Conference proceeding
Normal-guided Detail-Preserving Neural Implicit Function for High-Fidelity 3D Surface Reconstruction
Published 22/05/2025
Proceedings of the ACM on computer graphics and interactive techniques, 8, 1, 12
Neural implicit representations have emerged as a powerful paradigm for 3D reconstruction. However, despite their success, existing methods fail to capture fine geometric details and thin structures, especially in scenarios where only sparse multi-view RGB images of the objects of interest are available. This paper shows that training neural representations with first-order differential properties (surface normals) leads to highly accurate 3D surface reconstruction, even with as few as two RGB images. Using the input RGB images, we compute approximate ground-truth surface normals from depth maps produced by an off-the-shelf monocular depth estimator. During training, we directly locate surface points on the zero level set of the SDF network and supervise their normals with those estimated from the depth maps. Extensive experiments demonstrate that our method achieves state-of-the-art reconstruction accuracy with a minimal number of views, recovering intricate geometric details and thin structures that were previously challenging to capture. The source code and additional results are available at https://graphics-research-group.github.io/sn-nir.
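A minimal sketch of the normal-supervision idea: the gradient of the SDF at a surface point is its normal, so it can be compared against a normal estimated from a monocular depth map. The names (sdf_net, surface_pts, gt_normals) are placeholders, not the released code.

```python
# Sketch only, assuming an SDF MLP and precomputed depth-derived normals.
import torch
import torch.nn.functional as F

def normal_supervision_loss(sdf_net, surface_pts, gt_normals):
    """surface_pts: (N, 3) points on the zero level set;
    gt_normals: (N, 3) unit normals from an off-the-shelf depth estimator."""
    surface_pts = surface_pts.clone().requires_grad_(True)
    sdf = sdf_net(surface_pts)
    grad = torch.autograd.grad(sdf.sum(), surface_pts, create_graph=True)[0]
    pred_normals = F.normalize(grad, dim=-1)
    # 1 - cosine similarity between predicted and estimated normals
    return (1.0 - (pred_normals * gt_normals).sum(dim=-1)).mean()
```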
Dataset
Developing an AI model to detect the Asian House Gecko
Published 2025
We propose a methodology that uses AI techniques, namely image classification and deep learning, to train a model on IBM’s Vision platform to identify a gecko species, Hemidactylus frenatus (the Asian House Gecko), as part of biosecurity surveillance and conservation efforts. The dataset contains the images used to train this AI model.
Journal article
Published 2025
IEEE access, 13, 141313 - 141327
This paper explores the effectiveness, specifically in improving video consistency, and the computational burden of Contrastive Language-Image Pre-Training (CLIP) embeddings in video generation. The investigation is conducted using the Stable Video Diffusion (SVD) framework, a state-of-the-art method for generating high-quality videos from image inputs. The diffusion process in SVD generates videos by iteratively denoising noisy inputs over multiple steps. Our analysis reveals that employing CLIP in the cross-attention mechanism at every step of this denoising process has limited impact on maintaining subject and background consistency while imposing a significant computational burden on the video generation network. To address this, we propose Video Computation Cut (VCUT), a novel, training-free optimization method that significantly reduces computational demands without compromising output quality. VCUT replaces the computationally intensive temporal cross-attention with a one-time computed linear layer that is cached and reused across inference steps. This reduces computation by up to 322T MACs per 25-frame video, decreases model parameters by 50M, and cuts latency by 20% compared to baseline methods. By streamlining the SVD architecture, our approach makes high-quality video generation more accessible, cost-effective, and eco-friendly, paving the way for real-time applications in telemedicine, remote learning, and automated content creation.
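A minimal sketch of the caching idea described above: because the CLIP image embedding is fixed for a given input image, its contribution can be projected once and reused at every denoising step instead of re-running cross-attention. The module and dimensions below are illustrative assumptions, not the SVD or VCUT source code.

```python
# Sketch only: one-time projection of the image conditioning, cached across steps.
import torch
import torch.nn as nn

class CachedImageConditioning(nn.Module):
    def __init__(self, clip_dim=1024, hidden_dim=320):
        super().__init__()
        self.proj = nn.Linear(clip_dim, hidden_dim)   # stands in for cross-attention
        self._cache = None

    def reset(self):
        self._cache = None                  # call once per new input image

    def forward(self, hidden_states, clip_embedding):
        if self._cache is None:             # computed once, reused at every step
            self._cache = self.proj(clip_embedding)
        return hidden_states + self._cache  # inject the cached conditioning

cond = CachedImageConditioning()
clip_emb = torch.randn(1, 1024)             # CLIP image embedding (fixed per video)
h = torch.randn(1, 320)
for step in range(25):                      # denoising loop: projection is reused
    h = cond(h, clip_emb)
```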
Conference proceeding
Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis
Published 2025
Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), 21783 - 21792
2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10/06/2025–17/06/2025
We propose a novel framework for the statistical analysis of genus-zero 4D surfaces, i.e., 3D surfaces that deform and evolve over time. This problem is particularly challenging due to the arbitrary parameterizations of these surfaces and their varying deformation speeds, necessitating effective spatiotemporal registration. Traditionally, 4D surfaces are discretized, in space and time, before computing their spatiotemporal registrations, geodesics, and statistics. However, this approach may result in suboptimal solutions and, as we demonstrate in this paper, is not necessary. In contrast, we treat 4D surfaces as continuous functions in both space and time. We introduce Dynamic Spherical Neural Surfaces (D-SNS), an efficient smooth and continuous spatiotemporal representation for genus-0 4D surfaces. We then demonstrate how to perform core 4D shape analysis tasks such as spatiotemporal registration, geodesics computation, and mean 4D shape estimation, directly on these continuous representations without upfront discretization and meshing. By integrating neural representations with classical Riemannian geometry and statistical shape analysis techniques, we provide the building blocks for enabling full functional shape analysis. We demonstrate the efficiency of the framework on 4D human and face datasets. The source code and additional results are available at https://4d-dsns.github.io/DSNS/.
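A minimal sketch of a continuous spatiotemporal surface representation in the spirit of D-SNS: an MLP maps a point on the unit sphere plus a time value to a 3D surface point, so the 4D surface can be queried at any spatial and temporal resolution without meshing. The architecture details are illustrative assumptions, not the released implementation.

```python
# Sketch only: a genus-0 dynamic surface as a function S^2 x [0, 1] -> R^3.
import torch
import torch.nn as nn

class DynamicSphericalSurface(nn.Module):
    def __init__(self, hidden=256, layers=6):
        super().__init__()
        dims = [4] + [hidden] * layers + [3]    # input: (x, y, z) on S^2 plus t
        self.mlp = nn.Sequential(*[
            layer
            for i in range(len(dims) - 1)
            for layer in (nn.Linear(dims[i], dims[i + 1]),
                          nn.SiLU() if i < len(dims) - 2 else nn.Identity())
        ])

    def forward(self, sphere_pts, t):
        """sphere_pts: (N, 3) unit vectors; t: (N, 1) times in [0, 1]."""
        return self.mlp(torch.cat([sphere_pts, t], dim=-1))

# Query the surface at an arbitrary time without upfront discretization.
surf = DynamicSphericalSurface()
pts = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
xyz = surf(pts, torch.full((1024, 1), 0.37))
```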
Journal article
Deep learning-based analysis of insect life stages using a repurposed dataset
Published 2025
Ecological informatics, 90, 103202
Insect pests pose a significant risk to agriculture and biosecurity, reducing crop yields and requiring effective management. Accurate identification of early life stages is often required for effective management but generally relies on expert evaluation, which is both costly and time-consuming. To address this, we use a deep learning-based approach for insect species and life-stage classification from digital images. We repurposed the IP102 dataset by adding detailed annotations for four life stages — egg, larva, pupa, and adult — alongside the original species categories. Two deep learning models, based on ResNet50 and EfficientNetV2M, were tested for classification accuracy on this dual-layered identification task. Although both models accomplished the task well, the EfficientNetV2M model performed slightly better than the ResNet50, achieving 72.4% precision, 72.1% recall, and an F1-score of 72.0%. Our results demonstrate the potential of deep learning for automated insect species and life-stage classification, providing a high-throughput and efficient solution for agricultural monitoring and pest management.
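A minimal sketch of the fine-tuning setup implied above: an ImageNet-pretrained EfficientNetV2-M with its classifier head replaced for the four life-stage classes. Training details such as augmentation and schedules are omitted and not taken from the paper.

```python
# Sketch only: torchvision EfficientNetV2-M adapted to a 4-class life-stage head.
import torch.nn as nn
from torchvision.models import efficientnet_v2_m, EfficientNet_V2_M_Weights

NUM_LIFE_STAGES = 4  # egg, larva, pupa, adult

model = efficientnet_v2_m(weights=EfficientNet_V2_M_Weights.DEFAULT)
in_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(in_features, NUM_LIFE_STAGES)  # new classification head

# Common recipe: freeze the backbone first, train the head, then unfreeze to fine-tune.
for p in model.features.parameters():
    p.requires_grad = False
```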
Journal article
Generalized Closed-Form Formulae for Feature-Based Subpixel Alignment in Patch-Based Matching
Published 2025
International journal of computer vision
Patch-based matching is a technique meant to measure the disparity between pixels in a source and target image and is at the core of various methods in computer vision. When the subpixel disparity between the source and target images is required, the cost function or the target image has to be interpolated. While cost-based interpolation is easier to implement, multiple works have shown that image-based interpolation can increase the accuracy of the disparity estimate. In this paper, we review closed-form formulae for subpixel disparity computation for one-dimensional matching, e.g., rectified stereo matching, for the standard cost functions used in patch-based matching. We then propose new formulae that generalize to higher-dimensional search spaces, which is necessary for unrectified stereo matching and optical flow. We also compare the image-based interpolation formulae with traditional cost-based formulae and show that image-based interpolation brings a significant improvement over cost-based interpolation methods for two-dimensional search spaces, and a small improvement in the case of one-dimensional search spaces. The zero-mean normalized cross-correlation cost function is found to be preferable for subpixel alignment. A new error model, based on very broad assumptions, is outlined in the Supplementary Material to demonstrate why these image-based interpolation formulae outperform their cost-based counterparts and why the zero-mean normalized cross-correlation function is preferable for subpixel alignment.
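For context, a minimal sketch of the classic cost-based parabolic interpolation that such closed-form formulae are typically compared against: fitting a parabola to the matching cost at the best integer disparity and its two neighbours gives the subpixel offset delta = (c[d-1] - c[d+1]) / (2 * (c[d-1] - 2*c[d] + c[d+1])). The paper's new image-based formulae are not reproduced here.

```python
# Sketch only: one-dimensional cost-based subpixel refinement via a parabola fit.
import numpy as np

def subpixel_disparity(costs):
    """costs: 1D array of matching costs over integer disparities (lower is better)."""
    d = int(np.argmin(costs))
    if d == 0 or d == len(costs) - 1:
        return float(d)                       # no neighbours to interpolate with
    c_m, c_0, c_p = costs[d - 1], costs[d], costs[d + 1]
    delta = (c_m - c_p) / (2.0 * (c_m - 2.0 * c_0 + c_p))
    return d + delta

print(subpixel_disparity(np.array([5.0, 2.0, 1.0, 1.5, 4.0])))  # ~2.17
```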