An AUC-maximizing classifier for skewed and partially labeled data with an application in clinical prediction modeling
Journal article   Open access   Peer reviewed

Guanjin Wang, Stephen Wai Hang Kwok, Daniel Axford, Mohammed Yousufuddin and Ferdous Sohel
Knowledge-Based Systems, Vol. 278, 110831
2023
CC BY-NC-ND V4.0 Open Access

Abstract

Partially labeled and skewed datasets are common in many applications, including healthcare, because data collection and annotation are costly and time-consuming. However, training machine learning classifiers on such data can degrade their predictive performance. In this paper, we propose a novel classifier that addresses this problem by focusing on the Area Under the Curve (AUC), which is widely recognized as a more robust performance metric for skewed datasets than metrics such as accuracy and error rate. We introduce a new classifier called PSVM-AUC Maximizer (PSVM-AUCMax), which is based on Proximal Support Vector Machines (PSVM) and directly maximizes a new AUC-based metric in its learning objective. PSVM-AUCMax has several merits. First, by directly integrating the maximization of the proposed AUC-based metric, PSVM-AUCMax can be proved to have enhanced generalization capability on partially labeled and skewed datasets. Second, it simplifies model selection because it has fewer hyperparameters to tune. Third, PSVM-AUCMax’s analytical solution retains the same form as that of traditional PSVM, preserving advantages such as fast updating in incremental learning scenarios. The efficacy of PSVM-AUCMax has been demonstrated through extensive experiments on several public datasets and a healthcare case study using data collected at the US Mayo Clinic. In the healthcare case study, we used PSVM-AUCMax to develop a clinical prediction model for forecasting composite outcomes in hospitalized COVID-19 patients, which yielded promising results.
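
For readers unfamiliar with the building blocks named in the abstract, the following is a minimal sketch of a standard linear PSVM with its closed-form solution, together with AUC computed as the Wilcoxon-Mann-Whitney statistic. It is not PSVM-AUCMax: the abstract does not state the new AUC-based metric or the modified objective, so neither is reproduced here. The function names (`fit_psvm`, `decision`, `auc`), the parameter `nu`, and the toy data are assumptions chosen for illustration only.

```python
import numpy as np


def fit_psvm(A, d, nu=1.0):
    """Fit a standard linear PSVM (illustrative sketch, not PSVM-AUCMax).

    A  : (m, n) feature matrix
    d  : (m,) labels in {-1, +1}
    nu : weight on the squared error term (assumed hyperparameter name)

    Solves (I / nu + E^T E) [w; gamma] = E^T d with E = [A, -e],
    which is the usual closed-form PSVM solution.
    """
    m, n = A.shape
    E = np.hstack([A, -np.ones((m, 1))])
    lhs = np.eye(n + 1) / nu + E.T @ E
    rhs = E.T @ d
    sol = np.linalg.solve(lhs, rhs)
    return sol[:n], sol[n]


def decision(A, w, gamma):
    """Real-valued scores; the predicted class is the sign of the score."""
    return A @ w - gamma


def auc(scores, d):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as one half.
    """
    pos = scores[d == 1]
    neg = scores[d == -1]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


# Toy skewed dataset (made up for illustration): 2 positives, 4 negatives.
A = np.array([[2.0, 2.0], [3.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0], [-3.0, -2.0]])
d = np.array([1, 1, -1, -1, -1, -1])

w, gamma = fit_psvm(A, d, nu=1.0)
scores = decision(A, w, gamma)
print("AUC on the toy data:", auc(scores, d))
```

Because the solution is a single linear solve on an (n + 1) by (n + 1) system, the matrices `E.T @ E` and `E.T @ d` can be accumulated as new rows arrive, which is the property behind the fast incremental updating the abstract attributes to traditional PSVM.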

InCites classification

Collaboration type: International collaboration
Citation topics: 4 Electrical Engineering, Electronics & Computer Science; 4.61 Artificial Intelligence & Machine Learning; 4.61.145 Classification Algorithms
Web of Science research area: Computer Science, Artificial Intelligence
ESI research area: Computer Science