Generative models for synthetic data generation: application to pharmacokinetic/pharmacodynamic data

Yulun Jiang; Alberto García-Durán; Idris Bachali Losada; Pascal Girard; Nadia Terranova

doi:10.1007/s10928-024-09935-6

J Pharmacokinet Pharmacodyn. 2024 Aug 27. doi: 10.1007/s10928-024-09935-6. Online ahead of print.

Authors

Yulun Jiang¹, Alberto García-Durán², Idris Bachali Losada², Pascal Girard², Nadia Terranova³

Affiliations

¹ School of Computer and Communication Science, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland.
² Merck Quantitative Pharmacology, Ares Trading SA (an affiliate of Merck KGaA, Darmstadt, Germany), Lausanne, Switzerland.
³ Merck Quantitative Pharmacology, Ares Trading SA (an affiliate of Merck KGaA, Darmstadt, Germany), Lausanne, Switzerland. nadia.terranova@merckgroup.com.

PMID: 39192091
DOI: 10.1007/s10928-024-09935-6

Abstract

The generation of synthetic patient data that reflect the statistical properties of real data plays a fundamental role in today's world because of its potential to (i) enable access to proprietary data for statistical and research purposes and (ii) increase the amount of available data (e.g., in low-density regions, i.e., for patients with under-represented characteristics). Generative methods offer a family of solutions for generating synthetic data. The objective of this research is to benchmark numerous state-of-the-art deep-learning generative methods across different scenarios and clinical datasets comprising patient covariates and several pharmacokinetic/pharmacodynamic endpoints. We did this by implementing various probabilistic models aimed at generating synthetic data, such as the Multi-layer Perceptron Conditioning Generative Adversarial Neural Network (MLP cGAN), Time-series Generative Adversarial Networks (TimeGAN), and a more traditional approach, the Probabilistic Autoregressive (PAR) model. We evaluated their performance by calculating discriminative and predictive scores. Furthermore, we compared the distributions of real and synthetic data using Kolmogorov-Smirnov and Chi-square statistical tests, focusing respectively on covariate and output variables of the models. Lastly, we employed a pharmacometrics-related metric to enhance the interpretation of our results for the investigated scenarios. Results indicate that multi-layer perceptron-based conditional generative adversarial networks (MLP cGAN) exhibit the best overall performance for most of the considered metrics. This work highlights the opportunities to employ synthetic data generation in the field of clinical pharmacology for augmentation and sharing of proprietary data across institutions.
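
As an illustration of the distribution comparison mentioned in the abstract, the following is a minimal sketch of the two test statistics, written in pure Python. It is not the authors' implementation: the function names, the variable names, and the toy values are all hypothetical, and only the statistics are computed. In practice, p-values would usually be obtained from a statistics library.

```python
from bisect import bisect_right
from collections import Counter


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    n, m = len(real_sorted), len(synth_sorted)
    gap = 0.0
    # The largest gap is always attained at one of the observed values.
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / n
        cdf_synth = bisect_right(synth_sorted, x) / m
        gap = max(gap, abs(cdf_real - cdf_synth))
    return gap


def chi_square_statistic(real, synthetic):
    """Pearson chi-square statistic for a 2 x K table of category counts."""
    real_counts, synth_counts = Counter(real), Counter(synthetic)
    n_real, n_synth = len(real), len(synthetic)
    total = n_real + n_synth
    stat = 0.0
    for category in set(real_counts) | set(synth_counts):
        column_total = real_counts[category] + synth_counts[category]
        for observed, row_total in (
            (real_counts[category], n_real),
            (synth_counts[category], n_synth),
        ):
            expected = row_total * column_total / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical toy values, made up for illustration only (not study data).
real_weights = [60.0, 72.5, 81.0, 55.3, 90.2]
synth_weights = [61.0, 70.0, 85.0, 58.0, 95.0]
real_groups = ["low", "low", "high", "high"]
synth_groups = ["low", "low", "low", "high"]

print("KS statistic:", ks_statistic(real_weights, synth_weights))
print("Chi-square statistic:", chi_square_statistic(real_groups, synth_groups))
```

A KS statistic near 0 means the two empirical distributions of a continuous variable nearly coincide, while a value near 1 means they barely overlap. The chi-square statistic plays the same role for categorical variables and is 0 when the real and synthetic category proportions are identical.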

Keywords: Deep learning; Generative methods; Neural networks; Synthetic pharmacokinetic/Pharmacodynamic data; Virtual patients.