A systematic review of multi-modal large language models on domain-specific applications

Sirui Li; Kok Wai Wong; Guanjin Wang; Thach-Thao Duong

doi:10.1007/s10462-025-11398-1

Journal article

Open access

Peer reviewed

The Artificial intelligence review, Vol.58(12), 383

2025

DOI: https://doi.org/10.1007/s10462-025-11398-1

Appears in Open Access via Read & Publish Agreements

CC BY V4.0, Open Access

Subjects: Artificial Intelligence; Computer Science; General

Abstract

While Large Language Models (LLMs) have shown remarkable proficiency in text-based tasks, they struggle to interact effectively with the real world without perceiving other modalities such as vision and audio. Multi-modal LLMs, which integrate these additional modalities, have become increasingly important across various domains. Despite their significant advancements and potential, no comprehensive PRISMA-based systematic review has yet examined their applications across different domains. This work aims to fill that gap by systematically reviewing and synthesising the quantitative research literature on domain-specific applications of multi-modal LLMs. Following the PRISMA guidelines, it analyses research literature published after the release of OpenAI's ChatGPT 3.5 in 2022. The literature search was conducted across several online databases, including Nature, Scopus, and Google Scholar. A total of 22 studies were identified: 11 focus on the medical domain, 3 on autonomous driving, and 2 on geometric analysis, while the remaining 6 cover one topic each: climate, music, e-commerce, sentiment analysis, human-robot interaction, and construction. This review provides a comprehensive overview of the current state of multi-modal LLMs, highlights their domain-specific applications, and identifies gaps and future research directions.

Details

Title: A systematic review of multi-modal large language models on domain-specific applications
Authors/Creators: Sirui Li - Murdoch University, School of Information Technology
Kok Wai Wong - Murdoch University, Centre for Water, Energy and Waste
Guanjin Wang - Murdoch University, School of Information Technology
Thach-Thao Duong - Murdoch University, School of Information Technology
Publication Details: The Artificial intelligence review, Vol.58(12), 383
Publisher: Springer Netherlands
Number of pages: 47
Grant note: Murdoch University
Identifiers: 991005821345207891
Murdoch Affiliation: School of Information Technology; Centre for Water, Energy and Waste
Language: English
Resource Type: Journal article

Metrics

473 File views/downloads

34 Record Views

2 Times Cited - Web of Science

InCites Highlights

These are selected metrics from the InCites Benchmarking & Analytics tool related to this output.

Citation topics: 4 Electrical Engineering, Electronics & Computer Science; 4.17 Computer Vision & Graphics; 4.17.128 Deep Visual Recognition
Web of Science research areas: Computer Science, Artificial Intelligence
ESI research areas: Computer Science