Evaluation of imputation strategies for multi-centre studies: Application to a large clinical pathology dataset
Journal article   Open access   Peer reviewed

Lucy Grigoroff, Reika Masuda, John Lindon, Janonna Kadyrov, Jeremy K Nicholson, Elaine Holmes and Julien Wist
PloS one, Vol.20(11), e0335852
2025
PMID: 41264609
Published (Version of Record), CC BY 4.0, Open Access

Abstract

As part of a strategy for accommodating missing data in large heterogeneous datasets, two Random Forest (RF)-based imputation methods, missForest and MICE, were evaluated along with several strategies to help navigate the inherently incomplete structure of the dataset. Background: A total of 3817 complete cases of clinical chemistry variables from a large-scale, multi-site preclinical longitudinal pathology study were used as an evaluation dataset. Three types of ‘missingness’ were artificially introduced in various proportions to compare imputation performance under different strategies, including variable inclusion and stratification. Results: MissForest was found to outperform MICE, being robust and capable of automatic variable selection. Stratification had minimal effect on missForest but severely degraded the performance of MICE. Conclusion: In general, storing and sharing datasets prior to any correction is good practice, so that imputation can be performed on merged data if necessary.
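The evaluation design summarised in the abstract (start from complete cases, remove values artificially, impute, then score against the known truth) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the synthetic data, the function names, the use of missing-completely-at-random (MCAR) masking as one possible missingness type, the NRMSE score, and column-mean imputation, which stands in here as a trivial baseline and is neither missForest nor MICE.

```python
import math
import random


def make_complete_data(n_rows, n_cols, seed=0):
    """Build a synthetic complete dataset with correlated columns (hypothetical data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        base = rng.gauss(0.0, 1.0)
        rows.append([base * (j + 1) + rng.gauss(0.0, 0.5) for j in range(n_cols)])
    return rows


def introduce_mcar(data, proportion, seed=0):
    """Replace a given proportion of cells with None, chosen uniformly at random (MCAR)."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(data)) for j in range(len(data[0]))]
    masked = set(rng.sample(cells, round(proportion * len(cells))))
    holed = [
        [None if (i, j) in masked else value for j, value in enumerate(row)]
        for i, row in enumerate(data)
    ]
    return holed, masked


def impute_column_means(holed):
    """Baseline stand-in: fill each missing cell with the mean of its column's observed values."""
    means = []
    for j in range(len(holed[0])):
        observed = [row[j] for row in holed if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if value is None else value for j, value in enumerate(row)] for row in holed]


def nrmse(truth, imputed, masked):
    """Root mean squared error over the masked cells, normalised by the variance of their true values."""
    cells = sorted(masked)
    true_values = [truth[i][j] for i, j in cells]
    mean_true = sum(true_values) / len(true_values)
    var_true = sum((v - mean_true) ** 2 for v in true_values) / len(true_values)
    mse = sum((truth[i][j] - imputed[i][j]) ** 2 for i, j in cells) / len(cells)
    return math.sqrt(mse / var_true)


# Score the baseline at several missingness proportions.
complete = make_complete_data(n_rows=200, n_cols=4, seed=1)
for proportion in (0.1, 0.2, 0.4):
    holed, masked = introduce_mcar(complete, proportion, seed=2)
    imputed = impute_column_means(holed)
    print(f"proportion={proportion:.1f}  NRMSE={nrmse(complete, imputed, masked):.3f}")
```

Because the true values behind every masked cell are known, any imputation method can be swapped in for `impute_column_means` and compared on the same masks, which is the general idea behind benchmarking on complete cases.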

Details

UN Sustainable Development Goals (SDGs)

This output has contributed to the advancement of the following goals:

#3 Good Health and Well-Being
