Synthetic tabular health data generation: a practical comparison between correlation and model-based statistical approach and the conditional generative adversarial network approach
Journal article · Open access · Peer reviewed

Yunwei Zhang and Samuel Muller
Journal of Statistical Computation and Simulation
2026
Open access, CC BY-NC-ND 4.0

Abstract

Subjects: Computer Science; Computer Science, Interdisciplinary Applications; Mathematics; Physical Sciences; Science & Technology; Statistics & Probability; Technology
A statistical perspective on synthetic tabular health data generation: comparing a statistical approach with a conditional generative adversarial network approach.

Synthetic datasets are vital in many areas of health, including sharing sensitive human data, protecting patients' privacy, and validating prediction model performance when sample sizes are limited. Generating synthetic data for these purposes is not new: statistical data simulation approaches were in use long before generative adversarial networks were developed. Will statistical methods become less relevant in this context? Which of the two approaches is better when learning from health data? With these questions in mind, we aim to review existing approaches for generating synthetic tabular health data, to compare them empirically on real-world datasets, and ultimately to provide practical guidance on the choice of method. Our empirical study shows that both techniques generate synthetic datasets that closely resemble the structure of the real data and that are useful for evaluating the performance of prediction models.
