A deep multi-view imbalanced learning approach for identifying informative COVID-19 tweets from social media

Kok Kiang Long; Stephen Wai Hang Kwok; Jayne Kotz; Guanjin Wang

doi:10.1016/j.compbiomed.2023.107232

Journal article

Open access

Peer reviewed

Computers in biology and medicine, Vol.164, 107232

2023

DOI: https://doi.org/10.1016/j.compbiomed.2023.107232

Appears in Open Access via Read & Publish Agreements

CC BY-NC-ND V4.0, Open Access

Keywords: Imbalanced learning; Multiview learning; Stacked architecture; Support vector machines; Tweets data

Abstract

Social media platforms such as Twitter are a primary venue for rapid sharing of COVID-19-related information, which makes them a valuable data source for many downstream applications. Because a massive volume of COVID-19 tweets is generated every day, it is important that machine-learning-supported downstream applications can skip uninformative tweets and pick up only the informative ones for further use. However, existing solutions do not specifically consider the negative effect of the imbalanced ratio between informative and uninformative tweets in the training data. Moreover, most existing solutions rely on single-view learning and neglect the rich information that different views can contribute to learning. In this study, a novel deep imbalanced multi-view learning approach called D-SVM-2K is proposed to identify informative COVID-19 tweets from social media. The approach builds upon the well-known multi-view learning method SVM-2K to incorporate different views generated by different feature extraction techniques. To combat the class imbalance problem and enhance learning ability, D-SVM-2K stacks multiple SVM-2K base classifiers in a deep stacked structure in which the base classifiers can learn from either the original training dataset or the shifted critical regions identified using the well-known k-nearest neighbors algorithm. D-SVM-2K also performs global and local deep ensemble learning on the data from multiple views. Our empirical experiments on a real-world labeled tweet dataset demonstrate the effectiveness of D-SVM-2K in dealing with real-world multi-view class imbalance issues.
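The abstract does not say how critical regions are defined, only that they are identified with k-nearest neighbors. As a rough illustration, the sketch below assumes a sample is "critical" when at least one of its k nearest neighbors carries a different label. That rule, the function name `critical_indices`, and the toy data are assumptions made here for illustration; they are not taken from the paper.

```python
import math

def critical_indices(X, y, k=2):
    """Return indices of samples whose k nearest neighbours include at
    least one sample of a different class (an assumed definition of a
    'critical' sample, not the paper's)."""
    critical = []
    for i, xi in enumerate(X):
        # Distance from sample i to every other sample, nearest first.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        if any(y[j] != y[i] for j in neighbours):
            critical.append(i)
    return critical

# Toy imbalanced data: 1 = informative (minority), 0 = uninformative.
X = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.15, 0.15),  # majority, near the minority
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),    # majority, far away
    (0.2, 0.2), (0.3, 0.2),                            # minority
]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

print(critical_indices(X, y, k=2))  # [3, 8, 9]
```

Under this assumed rule, only the two minority samples and the one majority sample sitting next to them are flagged. The distant majority cluster is left out, which is the kind of hard-to-classify region a later layer could concentrate on.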
• The proposed approach, D-SVM-2K, addresses class imbalance in multi-view learning at the algorithm level, using a deep stacked architecture and oversampling SVM-2K base classifiers.
• D-SVM-2K achieves enhanced generalization performance by applying the stacked generalization principle and conducting global and local deep ensemble learning on each view of the training dataset.
• D-SVM-2K focuses on learning critical regions within the augmented feature space at each layer, enabling thorough examination and classification of challenging regions by SVM-2K base classifiers.
• D-SVM-2K demonstrates its effectiveness in addressing imbalanced learning in real-world scenarios, offering improved classification performance for multi-view imbalanced learning in most cases.
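The stacked generalization idea with an augmented feature space can be sketched roughly as follows. This is not D-SVM-2K: a nearest-centroid scorer stands in for the SVM-2K base classifier, and the augmentation rule (append the averaged two-view score to both views at each layer) is an assumption chosen for simplicity. The names `fit_scorer`, `fit_stack`, and `predict` are hypothetical.

```python
import math

def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit_scorer(X, y):
    """Stand-in base learner (nearest centroid), NOT SVM-2K."""
    pos = centroid([x for x, t in zip(X, y) if t == 1])
    neg = centroid([x for x, t in zip(X, y) if t == 0])
    # A positive score means the sample is closer to the positive centroid.
    return lambda x: math.dist(x, neg) - math.dist(x, pos)

def fit_stack(view_a, view_b, y, n_layers=2):
    """Train one scorer per view per layer; after each layer, append the
    averaged two-view score to both views (assumed augmentation rule)."""
    layers = []
    A = [list(x) for x in view_a]
    B = [list(x) for x in view_b]
    for _ in range(n_layers):
        fa, fb = fit_scorer(A, y), fit_scorer(B, y)
        layers.append((fa, fb))
        scores = [(fa(a) + fb(b)) / 2 for a, b in zip(A, B)]
        A = [a + [s] for a, s in zip(A, scores)]
        B = [b + [s] for b, s in zip(B, scores)]
    return layers

def predict(layers, a, b):
    """Pass one two-view sample through every layer; the last score decides."""
    a, b = list(a), list(b)
    score = 0.0
    for fa, fb in layers:
        score = (fa(a) + fb(b)) / 2
        a, b = a + [score], b + [score]
    return 1 if score > 0 else 0

# Toy two-view data: 1 = informative, 0 = uninformative.
view_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
view_b = [(0.0,), (0.2,), (0.1,), (3.0,)]
y = [0, 0, 0, 1]

layers = fit_stack(view_a, view_b, y, n_layers=2)
print(predict(layers, (5.0, 4.0), (2.8,)))  # 1
print(predict(layers, (0.5, 0.5), (0.1,)))  # 0
```

In this sketch every layer trains on the full training set. According to the abstract, base classifiers in D-SVM-2K may instead learn from shifted critical regions, which this example does not attempt to reproduce.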

Details

Title: A deep multi-view imbalanced learning approach for identifying informative COVID-19 tweets from social media
Authors/Creators: Kok Kiang Long - Murdoch University
Stephen Wai Hang Kwok - Harry Butler Institute, Murdoch University, Perth, Australia
Jayne Kotz - Murdoch University, Ngangk Yira Institute for Change
Guanjin Wang - Murdoch University, School of Information Technology
Publication Details: Computers in biology and medicine, Vol.164, 107232
Publisher: Elsevier Ltd
Identifiers: 991005595769807891
Murdoch Affiliation: Harry Butler Institute; Ngangk Yira Institute for Change; School of Information Technology
Language: English
Resource Type: Journal article

InCites Highlights

These are selected metrics from InCites Benchmarking & Analytics tool, related to this output

Citation topics: 4 Electrical Engineering, Electronics & Computer Science; 4.61 Artificial Intelligence & Machine Learning; 4.61.145 Classification Algorithms
Web Of Science research areas: Biology; Computer Science, Interdisciplinary Applications; Engineering, Biomedical; Mathematical & Computational Biology
ESI research areas: Computer Science