Output list
Conference paper
Date presented 07/2025
ICMLC & ICWAPR 2025, 12/07/2025–15/07/2025, Bali, Indonesia
Liquid chromatography-tandem mass spectrometry (LC-MS/MS) is a key tool for measuring lipophilic substances in laboratory medicine and is widely employed in the analysis of coenzyme Q10 (CoQ10) and 25-hydroxyvitamin D (25OHD). In this paper, fuzzy concepts were applied to improve the LC-MS/MS methods used for CoQ10 and 25OHD detection. The focus was placed on selecting the optimal mobile phase for CoQ10 analysis and examining the differences between LC-MS/MS and chemiluminescence immunoassay (CLIA) methods for 25OHD measurement. Through screening various organic phase combinations and employing fuzzy inference, the optimal mobile phase for CoQ10 detection was determined to be methanol and isopropanol at a ratio of 8:2. Additionally, fuzzy logic was employed to analyze variations in 25OHD concentrations across sexes and age groups. The results showed that women aged 30–40 exhibited greater differences in 25OHD levels than other groups. This study shows that the use of fuzzy concepts can enhance the adaptability and accuracy of LC-MS/MS detection, offering a novel approach to the analysis of lipophilic substances.
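A minimal sketch of the kind of fuzzy scoring that can rank candidate mobile phases is shown below; the input cues, membership functions, rule, and candidate readings are all illustrative assumptions, not the paper's actual fuzzy inference system.

```python
# Minimal Mamdani-style fuzzy scoring sketch (hypothetical membership
# functions and rules; the paper's rule base is not reproduced here).
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-9),
                                 (c - x) / (c - b + 1e-9)), 0.0)

def suitability(peak_area, retention_min):
    """Score a candidate mobile-phase ratio from two chromatographic cues."""
    # Fuzzify inputs (ranges are illustrative only).
    area_high = tri(peak_area, 0.5, 1.0, 1.5)      # normalised peak area
    rt_good   = tri(retention_min, 3.0, 5.0, 7.0)  # retention time, minutes

    # Rule: IF area is high AND retention is good THEN suitability is high.
    fire = min(area_high, rt_good)

    # Defuzzify over a "high suitability" output set via centroid.
    y = np.linspace(0.0, 1.0, 101)
    mu = np.minimum(tri(y, 0.5, 1.0, 1.0), fire)
    return float((y * mu).sum() / (mu.sum() + 1e-9))

# Rank candidate methanol:isopropanol ratios by fuzzy suitability
# (readings are made-up placeholders).
candidates = {"9:1": (0.82, 6.1), "8:2": (0.97, 4.9), "7:3": (0.88, 4.1)}
print(max(candidates, key=lambda k: suitability(*candidates[k])))
```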
Conference paper
DocSpiral: A Platform for Integrated Assistive Document Annotation through Human-in-the-Spiral
Date presented 2025
267 - 274
The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), 27/07/2025–01/08/2025, Vienna, Austria
Acquiring structured data from domain-specific, image-based documents, such as scanned reports, is crucial for many downstream tasks but remains challenging due to document variability. Many of these documents exist as images rather than as machine-readable text, which requires human annotation to train automated extraction systems. We present DocSpiral, the first Human-in-the-Spiral assistive document annotation platform, designed to address the challenge of extracting structured information from domain-specific, image-based document collections. Our spiral design establishes an iterative cycle in which human annotations train models that progressively require less manual intervention. DocSpiral integrates document format normalization, comprehensive annotation interfaces, an evaluation metrics dashboard, and API endpoints for AI/ML model development into a unified workflow. Experiments demonstrate that our framework reduces annotation time by at least 41% while showing consistent performance gains across three iterations during model training. By making this annotation platform freely accessible, we aim to lower the barriers to AI/ML model development in document processing, facilitating the adoption of large language models in image-based, document-intensive fields such as geoscience and healthcare. The system is freely available at: https://app.ai4wa.com.
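The spiral loop the abstract describes can be pictured with the following sketch; the function names and batching scheme are hypothetical stand-ins for DocSpiral's actual training and review components.

```python
# Illustrative human-in-the-spiral loop (hypothetical function names;
# DocSpiral's real workflow is served via its web platform, not this interface).
from typing import Callable, List, Tuple

Document = str
Annotation = dict

def spiral(
    docs: List[Document],
    train: Callable[[List[Tuple[Document, Annotation]]], Callable[[Document], Annotation]],
    human_review: Callable[[Document, Annotation], Annotation],
    iterations: int = 3,
) -> List[Tuple[Document, Annotation]]:
    """Each pass trains on all confirmed labels, then pre-annotates the next
    batch, so later passes need progressively fewer manual corrections."""
    labelled: List[Tuple[Document, Annotation]] = []
    remaining = list(docs)
    batch = max(1, len(docs) // iterations)
    model = lambda d: {}  # cold start: empty pre-annotation
    for _ in range(iterations):
        chunk, remaining = remaining[:batch], remaining[batch:]
        for doc in chunk:
            labelled.append((doc, human_review(doc, model(doc))))
        model = train(labelled)  # next spiral turn starts from a better model
    return labelled
```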
Conference proceeding
TimelineKGQA: A Comprehensive Question-Answer Pair Generator for Temporal Knowledge Graphs
Published 2025
Companion Proceedings of the ACM on Web Conference 2025, 797 - 800
WWW '25: The ACM Web Conference 2025, 28/04/2025–02/05/2025, Sydney, NSW
Question answering over temporal knowledge graphs (TKGs) is crucial for understanding evolving facts and relationships, yet its development is hindered by limited datasets and difficulties in generating custom QA pairs. We propose a novel categorization framework based on timeline-context relationships, along with TimelineKGQA, a universal temporal QA generator applicable to any TKG. The code is available at https://github.com/PascalSun/TimelineKGQA as an open-source Python package.
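A toy illustration of generating temporal QA pairs from timestamped facts is given below; the fact format and question templates are assumptions and far simpler than TimelineKGQA's timeline-context categorisation.

```python
# Toy generator of temporal QA pairs from (subject, relation, object, start, end)
# facts. The templates are illustrative, not the package's actual categories.
from datetime import date
from typing import List, Tuple

Fact = Tuple[str, str, str, date, date]  # (subject, relation, object, start, end)

def generate_qa(facts: List[Fact]) -> List[Tuple[str, str]]:
    pairs = []
    for s, r, o, start, end in facts:
        pairs.append((f"When did {s} {r} {o}?",
                      f"{start.isoformat()} to {end.isoformat()}"))
        pairs.append((f"Who did {s} {r} in {start.year}?", o))
    return pairs

facts = [("Alice", "work for", "Acme", date(2019, 1, 1), date(2021, 6, 30))]
for q, a in generate_qa(facts):
    print(q, "->", a)
```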
Journal article
A systematic review of multi-modal large language models on domain-specific applications
Published 2025
Artificial Intelligence Review, 58, 12, 383
While Large Language Models (LLMs) have shown remarkable proficiency in text-based tasks, they struggle to interact effectively with the more realistic world without perceiving other modalities such as vision and audio. Multi-modal LLMs, which integrate these additional modalities, have become increasingly important across various domains. Despite the significant advancements and potential of multi-modal LLMs, there has been no comprehensive PRISMA-based systematic review that examines their applications across different domains. The objective of this work is to fill this gap by systematically reviewing and synthesising the quantitative research literature on domain-specific applications of multi-modal LLMs. This systematic review follows the PRISMA guidelines to analyse research literature published after 2022, the year OpenAI released ChatGPT (GPT-3.5). The literature search was conducted across several online databases, including Nature, Scopus, and Google Scholar. A total of 22 studies were identified, with 11 focusing on the medical domain, 3 on autonomous driving, and 2 on geometric analysis. The remaining studies covered a range of topics, with one each on climate, music, e-commerce, sentiment analysis, human-robot interaction, and construction. This review provides a comprehensive overview of the current state of multi-modal LLMs, highlights their domain-specific applications, and identifies gaps and future research directions.
Conference proceeding
Open-Source Large Language Models Excel in Named Entity Recognition
Published 2025
Neural Information Processing (ICONIP 2024), 2295, 313 - 326
31st International Conference on Neural Information Processing (ICONIP 2024), 02/12/2024–06/12/2024, Auckland, New Zealand
Current state-of-the-art Named Entity Recognition (NER) typically involves fine-tuning transformer-based models such as BERT or RoBERTa on annotated datasets, posing challenges in annotation cost, model robustness, and data privacy. An emerging approach uses pre-trained Large Language Models (LLMs) such as ChatGPT to extract entities directly with few or zero examples, achieving performance comparable to fine-tuned models. However, reliance on closed-source commercial LLMs raises cost and privacy concerns. In this work, we investigate open-source LLMs like Llama2 for NER on local consumer-grade GPUs, aiming to significantly reduce costs compared to cloud solutions while ensuring data security. Experimental results demonstrate competitive NER performance, achieving an F1 score of 85.37% on the CoNLL03 dataset, and the approach also generalises to specific domains such as scientific texts.
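The few-shot prompting setup can be illustrated with the sketch below; the prompt format, entity schema, and the stub generate function are assumptions, with a locally hosted open-source LLM expected in place of the stub.

```python
# Few-shot NER prompt construction plus JSON parsing. The `generate` callable is
# a placeholder for a locally hosted open-source LLM (e.g. Llama2); the prompt
# format is illustrative, not the paper's exact one.
import json
from typing import Callable, Dict, List

FEW_SHOT = (
    'Extract PER, ORG and LOC entities as JSON.\n'
    'Text: "Barack Obama visited Berlin."\n'
    'Entities: {"PER": ["Barack Obama"], "ORG": [], "LOC": ["Berlin"]}\n'
)

def ner(text: str, generate: Callable[[str], str]) -> Dict[str, List[str]]:
    prompt = f'{FEW_SHOT}Text: "{text}"\nEntities:'
    raw = generate(prompt)
    try:
        return json.loads(raw[raw.index("{"): raw.rindex("}") + 1])
    except ValueError:
        return {"PER": [], "ORG": [], "LOC": []}

# Stub generator for demonstration; swap in a real local model in practice.
print(ner("UWA is based in Perth.",
          lambda p: '{"PER": [], "ORG": ["UWA"], "LOC": ["Perth"]}'))
```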
Journal article
TriagedMSA: Triaging Sentimental Disagreement in Multimodal Sentiment Analysis
Published 2025
IEEE Transactions on Affective Computing, 16, 3, 1557 - 1569
Existing multimodal sentiment analysis models are effective at capturing sentiment commonalities across different modalities and discerning emotions. However, these models still face significant challenges when analyzing samples with sentiment polarity differences across modalities. Neural networks struggle to process such divergent sentiment samples, particularly when they are scarce within datasets. While larger datasets could help address this limitation, collecting and annotating them is resource-intensive. To overcome this challenge, we propose TriagedMSA, a multimodal sentiment analysis model with triage capability. Our model introduces the Sentiment Disagreement Triage Network, which identifies sentiment disagreement between modalities within a sample. This triage mechanism reduces mutual influence by learning to distinguish between samples of sentiment agreement and disagreement. To process these two sample types, we develop the Sentiment Selection Attention Network and the Sentiment Commonality Attention Network, both of which enhance modality interaction learning. Furthermore, we propose the Adaptive Polarity Detection (APD) algorithm, which ensures the generalizability of our model across different datasets, regardless of whether unimodal labels are available. The APD algorithm adaptively determines sentiment polarity disagreement or agreement between modalities. We conduct experiments on three multimodal sentiment analysis datasets: CMU-MOSI, CMU-MOSEI and CH-SIMS.v2. The results demonstrate that our proposed methodology outperforms existing state-of-the-art approaches.
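A rough sketch of polarity-based triage, in the spirit of the APD idea, is shown below; the thresholding and routing rule are assumptions, not the algorithm described in the paper.

```python
# Sketch of triaging samples by cross-modal sentiment polarity (dis)agreement.
# Thresholds and routing are illustrative; TriagedMSA's APD algorithm is more
# involved and also handles datasets without unimodal labels.
from typing import Dict

def polarity(score: float, neutral_band: float = 0.1) -> int:
    """Map a sentiment score in [-1, 1] to negative (-1), neutral (0), positive (+1)."""
    if score > neutral_band:
        return 1
    if score < -neutral_band:
        return -1
    return 0

def triage(unimodal_scores: Dict[str, float]) -> str:
    """Route a sample to the agreement or disagreement branch."""
    signs = {polarity(v) for v in unimodal_scores.values()}
    return "agreement" if len(signs - {0}) <= 1 else "disagreement"

print(triage({"text": 0.7, "audio": 0.4, "video": -0.5}))  # -> disagreement
```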
Preprint
SymbioticRAG: Enhancing Document Intelligence Through Human-LLM Symbiotic Collaboration
Posted to a preprint site 2025
ArXiv.org
We present SymbioticRAG, a novel framework that fundamentally reimagines Retrieval-Augmented Generation (RAG) systems by establishing a bidirectional learning relationship between humans and machines. Our approach addresses two critical challenges in current RAG systems: the inherently human-centered nature of relevance determination and users' progression from "unconscious incompetence" in query formulation. SymbioticRAG introduces a two-tier solution where Level 1 enables direct human curation of retrieved content through interactive source document exploration, while Level 2 aims to build personalized retrieval models based on captured user interactions. We implement Level 1 through three key components: (1) a comprehensive document processing pipeline with specialized models for layout detection, OCR, and extraction of tables, formulas, and figures; (2) an extensible retriever module supporting multiple retrieval strategies; and (3) an interactive interface that facilitates both user engagement and interaction data logging. We experiment with a Level 2 implementation via a retrieval strategy that incorporates LLM-summarized user intentions drawn from interaction logs. To maintain high-quality data preparation, we develop a human-on-the-loop validation interface that improves pipeline output while advancing research in specialized extraction tasks. Evaluation across three scenarios (literature review, geological exploration, and education) demonstrates significant improvements in retrieval relevance and user satisfaction compared to traditional RAG approaches. To facilitate broader research and further advancement of the SymbioticRAG Level 2 implementation, we will make our system openly accessible to the research community.
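The extensible retriever module and interaction logging described for Levels 1 and 2 can be pictured with the following sketch; all class and method names are hypothetical and do not reflect the system's actual code.

```python
# Illustrative extensible retriever registry with interaction logging, in the
# spirit of SymbioticRAG Level 1/2. All names here are hypothetical.
from typing import Callable, Dict, List, Tuple

Retriever = Callable[[str, int], List[str]]

class RetrieverHub:
    def __init__(self) -> None:
        self.strategies: Dict[str, Retriever] = {}
        self.interaction_log: List[Tuple[str, str, str]] = []  # (strategy, query, chosen doc)

    def register(self, name: str, fn: Retriever) -> None:
        self.strategies[name] = fn

    def retrieve(self, name: str, query: str, k: int = 5) -> List[str]:
        return self.strategies[name](query, k)

    def record_choice(self, name: str, query: str, doc: str) -> None:
        """Level 2: captured choices later personalise the retrieval model."""
        self.interaction_log.append((name, query, doc))

hub = RetrieverHub()
hub.register("keyword", lambda q, k: [d for d in ["intro.pdf", "map.pdf"] if q.lower() in d][:k])
print(hub.retrieve("keyword", "map"))
```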
Conference proceeding
Published 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 46 - 52
Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), 12/11/2024–16/11/2024, Miami, FL
Multimodal conversational agents are highly desirable because they offer natural and human-like interaction. However, there is a lack of comprehensive end-to-end solutions to support collaborative development and benchmarking. While proprietary systems like GPT-4o and Gemini demonstrate impressive integration of audio, video, and text with response times of 200-250 ms, challenges remain in balancing latency, accuracy, cost, and data privacy. To better understand and quantify these issues, we developed OpenOmni, an open-source, end-to-end pipeline benchmarking tool that integrates advanced technologies such as Speech-to-Text, Emotion Detection, Retrieval Augmented Generation, and Large Language Models, along with the ability to integrate customized models. OpenOmni supports local and cloud deployment, ensuring data privacy and supporting latency and accuracy benchmarking. This flexible framework allows researchers to customize the pipeline, focusing on real bottlenecks and facilitating rapid proof-of-concept development. OpenOmni can significantly enhance applications like indoor assistance for visually impaired individuals, advancing human-computer interaction. Our demonstration video is available at https://www.youtube.com/watch?v=zaSiT3clWqY, the demo is available via https://openomni.ai4wa.com, and the code is available via https://github.com/AI4WA/OpenOmniFramework.
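The latency side of such benchmarking can be sketched as follows; the stage functions are stubs and the composition is an assumption, not OpenOmni's real pipeline API.

```python
# Per-stage latency benchmarking for a speech-to-response pipeline, in the
# spirit of OpenOmni's latency/accuracy trade-off measurements. The stage
# functions below are stubs, not the framework's actual components.
import time
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[object], object]]

def run_pipeline(audio: object, stages: List[Stage]) -> Dict[str, float]:
    timings, data = {}, audio
    for name, fn in stages:
        t0 = time.perf_counter()
        data = fn(data)
        timings[name] = (time.perf_counter() - t0) * 1000.0  # milliseconds
    timings["total_ms"] = sum(timings.values())
    return timings

stages = [
    ("speech_to_text", lambda a: "hello"),
    ("emotion_detection", lambda t: (t, "neutral")),
    ("rag_plus_llm", lambda x: "Hi, how can I help?"),
]
print(run_pipeline(b"fake-audio-bytes", stages))
```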
Conference proceeding
Retinal Image Registration with Haar-Optimized Local Binary Descriptors for Bifurcation Points
Published 2024
2024 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 745 - 751
International Conference on Digital Image Computing: Techniques and Applications (DICTA) 2024, 27/11/2024–29/11/2024, Perth, WA
This paper introduces a novel method for the registration of color fundus photographs, featuring a new descriptor named the Haar-Optimized Local Binary Descriptor (HOLBD). HOLBD is a fast-to-compute and fast-to-match descriptor, highly optimized to uniquely describe retinal bifurcation and crossover points, which are crucial landmarks for fundus image registration. It utilizes four patterns reminiscent of Haar basis functions, optimized to characterize these bifurcation and crossover points. These patterns perform pixel intensity tests to form a 340-bit binary vector. Before computing the HOLBD descriptor, the overall image orientation and scaling factors are estimated and the images are normalized, making HOLBD robust against rotation and scaling. Experiments were conducted on both publicly available and private retinal image registration datasets, comprising a total of 484 retinal images (i.e., 242 pairs). The proposed method was compared with state-of-the-art techniques, including Generalized Dual-Bootstrap Iterative Closest Point (GDB-ICP) and the methods of Hernandez-Matas et al., Saha et al., and Chen et al. Results show that the proposed method outperforms the best-performing of these methods. On the private dataset, it achieves 1-3% higher accuracy than the best-performing method for error thresholds up to 15 pixels, and it significantly outperforms the other methods by 4-30% for error thresholds up to 10 pixels. On the public dataset, the proposed method marginally outperforms the best reported method and significantly outperforms GDB-ICP, Hernandez-Matas et al., and Chen et al. by a margin of 10-40%.
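The general mechanics of a binary descriptor built from pixel-intensity tests and matched by Hamming distance can be sketched as below; the random sampling pattern stands in for HOLBD's four Haar-like patterns, which are not reproduced here.

```python
# Generic binary patch descriptor via pixel-intensity comparisons and Hamming
# matching. The sampling pattern is random here; HOLBD instead uses four
# Haar-like patterns tuned to bifurcation/crossover points for its 340-bit vector.
import numpy as np

rng = np.random.default_rng(0)
PAIRS = rng.integers(0, 16, size=(340, 4))  # (y1, x1, y2, x2) within a 16x16 patch

def describe(patch: np.ndarray) -> np.ndarray:
    """340-bit descriptor: bit i is 1 if I(p1_i) < I(p2_i)."""
    y1, x1, y2, x2 = PAIRS.T
    return (patch[y1, x1] < patch[y2, x2]).astype(np.uint8)

def hamming(d1: np.ndarray, d2: np.ndarray) -> int:
    return int(np.count_nonzero(d1 != d2))

a = rng.random((16, 16))
b = a + rng.normal(0, 0.01, (16, 16))  # slightly perturbed view of the same patch
print(hamming(describe(a), describe(b)))  # small distance for matching patches
```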
Conference proceeding
Published 2024
Neural Information Processing (ICONIP 2024), 2296, 102 - 117
31st International Conference on Neural Information Processing (ICONIP 2024), 02/12/2024–06/12/2024, Auckland, New Zealand
The transmission of African swine fever (ASF) could be influenced by temperature and rainfall, particularly through transmission among wild boars. Australia's ASF risk assessment capabilities can be further enhanced by analyzing the impact of temperature and precipitation on ASF. As there are currently no cases of ASF in Australia, this study utilized Poland's ASF wild-boar cases between 2018 and 2021 to establish a risk assessment model for Australia. Two methods were adopted to model the risk by analyzing the correlation between the number of ASF wild-boar cases and the temperature and rainfall: linear regression and fuzzy inference systems. The aim is to develop a risk assessment analysis that can estimate the seasonal risk of ASF in Australia. The results from the two models showed a significant relationship between the number of cases and changes in temperature, but no prominent association with the amount of rainfall. To the best of our knowledge, this is the first model that conducts a seasonal assessment of ASF risk in Australia. The proposed technique for modelling Australia's risk can handle incomplete data, making it a novel approach that can be used to build models for other countries or regions and for other infectious diseases.
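The linear-regression side of the modelling can be sketched as below; the monthly figures are synthetic placeholders, not the Poland 2018-2021 case records used in the paper.

```python
# Linear regression of monthly ASF wild-boar case counts on temperature and
# rainfall, mirroring one of the two modelling approaches. The data below are
# synthetic placeholders for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

temperature_c = np.array([1.5, 3.0, 7.2, 12.1, 17.0, 20.3, 22.1, 21.4, 16.8, 11.2, 5.9, 2.3])
rainfall_mm   = np.array([38, 30, 35, 41, 60, 62, 78, 66, 50, 42, 40, 39])
cases         = np.array([44, 40, 33, 26, 18, 15, 12, 14, 21, 28, 36, 42])

X = np.column_stack([temperature_c, rainfall_mm])
model = LinearRegression().fit(X, cases)
print("coefficients (temp, rain):", model.coef_)
print("R^2:", model.score(X, cases))
```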