Imbalanced learning; Multiview learning; Stacked architecture; Support vector machines; Tweets data
Social media platforms such as Twitter are a primary channel for rapid sharing of COVID-19-related information over the Internet, which makes them a valuable data source for many downstream applications. Because a massive volume of COVID-19 tweets is generated every day, it is important that machine-learning-supported downstream applications can effectively filter out uninformative tweets and retain only the informative ones for further use. However, existing solutions do not specifically consider the negative effect of the imbalanced ratio between informative and uninformative tweets in the training data. Moreover, most existing solutions rely on single-view learning and neglect the rich information that different views can contribute to learning. In this study, a novel deep imbalanced multi-view learning approach called D-SVM-2K is proposed to identify informative COVID-19 tweets from social media. The approach builds on the well-known multi-view learning method SVM-2K to incorporate different views generated by different feature extraction techniques. To combat the class imbalance problem and enhance its learning ability, D-SVM-2K stacks multiple SVM-2K base classifiers in a deep structure in which each base classifier learns from either the original training dataset or the shifted critical regions identified using the well-known k-nearest neighbors algorithm. D-SVM-2K also realises global and local deep ensemble learning on the multi-view data. Our empirical experiments on a real-world labeled tweet dataset demonstrate the effectiveness of D-SVM-2K in dealing with real-world multi-view class imbalance issues.
• The proposed approach, D-SVM-2K, addresses class imbalance in multi-view learning at the algorithm level, using a deep stacked architecture and oversampling SVM-2K base classifiers.
• D-SVM-2K achieves enhanced generalization performance by applying the stacked generalization principle and conducting global and local deep ensemble learning on each view of the training dataset.
• D-SVM-2K focuses on learning critical regions within the augmented feature space at each layer, enabling thorough examination and classification of challenging regions by SVM-2K base classifiers.
• D-SVM-2K demonstrates its effectiveness in addressing imbalanced learning in real-world scenarios, offering improved classification performance for multi-view imbalanced learning in most cases.
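The abstract says the critical regions are identified with the k-nearest neighbors algorithm but does not state the exact criterion. As a rough illustration only, the sketch below assumes one plausible reading: a training sample is treated as "critical" when at least one of its k nearest neighbours carries the opposite label, meaning it lies near the class boundary. The function names, the criterion, and the toy data are all hypothetical and are not taken from the paper. The SVM-2K base classifiers and the stacking itself are not shown.

```python
import math

def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i itself."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()  # ties are broken by the smaller index
    return [j for _, j in dists[:k]]

def critical_region(points, labels, k=3):
    """Assumed criterion: keep samples with at least one opposite-class neighbour."""
    critical = []
    for i in range(len(points)):
        neighbours = knn_indices(points, i, k)
        if any(labels[j] != labels[i] for j in neighbours):
            critical.append(i)
    return critical

# Toy imbalanced data: five "uninformative" (0) and two "informative" (1) samples.
X = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0),
     (4.0, 0.0), (4.5, 0.0), (5.0, 0.0)]
y = [0, 0, 0, 0, 0, 1, 1]

print(critical_region(X, y, k=2))  # [4, 5, 6]
```

In this toy case the four majority samples far from the minority class are dropped, so a classifier trained on the selected subset would see a far less skewed class ratio. Whether D-SVM-2K selects its regions this way is an assumption here.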
Details
Title
A deep multi-view imbalanced learning approach for identifying informative COVID-19 tweets from social media
Authors/Creators
Kok Kiang Long - Murdoch University
Stephen Wai Hang Kwok - Harry Butler Institute, Murdoch University, Perth, Australia
Jayne Kotz - Murdoch University, Ngangk Yira Institute for Change
Guanjin Wang - Murdoch University, School of Information Technology
Publication Details
Computers in Biology and Medicine, Vol. 164, 107232