A deep multi-view imbalanced learning approach for identifying informative COVID-19 tweets from social media
Journal article   Open access   Peer reviewed

Kok Kiang Long, Stephen Wai Hang Kwok, Jayne Kotz and Guanjin Wang
Computers in biology and medicine, Vol.164, 107232
2023
CC BY-NC-ND V4.0 Open Access

Abstract

Keywords: Imbalanced learning; Multiview learning; Stacked architecture; Support vector machines; Tweets data
Social media platforms such as Twitter are a major channel for rapid sharing of COVID-19-related information, which makes them a valuable data source for many downstream applications. Because a massive volume of COVID-19 tweets is generated every day, it is important that machine-learning-supported downstream applications can filter out uninformative tweets and retain only the informative ones for further use. However, existing solutions do not specifically address the negative effect of the imbalanced ratio between informative and uninformative tweets in the training data. In addition, most existing solutions rely on single-view learning and neglect the rich information that different views can contribute to learning. In this study, a novel deep imbalanced multi-view learning approach called D-SVM-2K is proposed to identify informative COVID-19 tweets from social media. The approach builds on the well-known multi-view learning method SVM-2K to incorporate different views generated by different feature extraction techniques. To combat class imbalance and enhance learning ability, D-SVM-2K arranges multiple SVM-2K base classifiers in a stacked deep structure, where each base classifier learns from either the original training dataset or the shifted critical regions identified using the well-known k-nearest neighbors algorithm. D-SVM-2K also performs global and local deep ensemble learning on the data from multiple views. Empirical experiments on a real-world labeled tweet dataset demonstrate the effectiveness of D-SVM-2K in dealing with real-world multi-view class imbalance.
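The abstract mentions critical regions identified with a k-nearest-neighbor search but does not state the exact criterion. The sketch below illustrates one plausible reading under a stated assumption: a training sample is treated as "critical" when at least one of its k nearest neighbors carries a different label. The function names, the toy data, and this rule are hypothetical and are not taken from the paper.

```python
import math

def knn_indices(points, i, k):
    """Return the indices of the k nearest neighbours of points[i] (Euclidean)."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()  # ties are broken by the smaller index
    return [j for _, j in dists[:k]]

def critical_region(points, labels, k=3):
    """Indices of samples that have at least one differently labelled neighbour.

    Assumption: this neighbourhood-disagreement rule stands in for the
    paper's critical-region definition, which the abstract does not give.
    """
    critical = []
    for i in range(len(points)):
        neighbours = knn_indices(points, i, k)
        if any(labels[j] != labels[i] for j in neighbours):
            critical.append(i)
    return critical

# Toy imbalanced data: six "uninformative" (0) and two "informative" (1) samples.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (5.5,), (6.0,)]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

critical = critical_region(points, labels, k=2)
print(critical)
```

Only the samples near the class boundary are selected, so a later classifier trained on this subset would concentrate on the hard-to-separate region rather than on the bulk of the majority class.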
• The proposed approach, D-SVM-2K, addresses class imbalance in multi-view learning at the algorithm level, using a deep stacked architecture and oversampling SVM-2K base classifiers.
• D-SVM-2K achieves enhanced generalization performance by applying the stacked generalization principle and conducting global and local deep ensemble learning on each view of the training dataset.
• D-SVM-2K focuses on learning critical regions within the augmented feature space at each layer, enabling thorough examination and classification of challenging regions by SVM-2K base classifiers.
• D-SVM-2K demonstrates its effectiveness in addressing imbalanced learning in real-world scenarios, offering improved classification performance for multi-view imbalanced learning in most cases.
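The stacked, two-view structure described above can be illustrated with a deliberately simplified sketch. It is not D-SVM-2K: the base learner is a nearest-centroid stand-in rather than SVM-2K, the two views are scored independently and averaged, and no imbalance handling or critical-region selection is included. All names and the toy data are hypothetical. The only idea shown is that each layer's output is appended to both views' features, so the next layer learns in an augmented feature space.

```python
import math

def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit_view(X, y):
    """Stand-in base learner for one view: store the two class centroids."""
    pos = centroid([x for x, t in zip(X, y) if t == 1])
    neg = centroid([x for x, t in zip(X, y) if t == 0])
    return pos, neg

def score_view(model, x):
    """Positive score means x lies closer to the class-1 centroid."""
    pos, neg = model
    return math.dist(x, neg) - math.dist(x, pos)

def fit_stack(Xa, Xb, y, n_layers=2):
    """Train n_layers (>= 1) two-view layers with feature augmentation."""
    layers = []
    for _ in range(n_layers):
        ma, mb = fit_view(Xa, y), fit_view(Xb, y)
        layers.append((ma, mb))
        # Average the two views' scores (a crude stand-in for a joint decision).
        scores = [0.5 * (score_view(ma, a) + score_view(mb, b))
                  for a, b in zip(Xa, Xb)]
        # Append the layer output to both views for the next layer.
        Xa = [list(a) + [s] for a, s in zip(Xa, scores)]
        Xb = [list(b) + [s] for b, s in zip(Xb, scores)]
    return layers

def predict_stack(layers, a, b):
    """Pass one sample (view a, view b) through every layer; return 0 or 1."""
    a, b = list(a), list(b)
    for ma, mb in layers:
        s = 0.5 * (score_view(ma, a) + score_view(mb, b))
        a, b = a + [s], b + [s]
    return 1 if s > 0 else 0

# Toy data: two views of the same six samples, four of class 0 and two of class 1.
Xa = [[0.0], [0.2], [0.1], [0.3], [1.0], [1.2]]
Xb = [[5.0], [5.1], [4.9], [5.2], [7.0], [7.2]]
y = [0, 0, 0, 0, 1, 1]

layers = fit_stack(Xa, Xb, y, n_layers=2)
predictions = [predict_stack(layers, a, b) for a, b in zip(Xa, Xb)]
print(predictions)
```

In a fuller implementation, each layer would presumably be trained on either the full training set or a selected critical region, as the abstract describes; that selection step is omitted here.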

Details

UN Sustainable Development Goals (SDGs)

This output has contributed to the advancement of the following goals:

#3 Good Health and Well-Being

InCites Highlights

These are selected metrics from InCites Benchmarking & Analytics tool, related to this output

Citation topics
4 Electrical Engineering, Electronics & Computer Science
4.61 Artificial Intelligence & Machine Learning
4.61.145 Classification Algorithms
Web Of Science research areas
Biology
Computer Science, Interdisciplinary Applications
Engineering, Biomedical
Mathematical & Computational Biology
ESI research areas
Computer Science