Generating pregnant patient biological profiles by deconvoluting clinical records with electronic health record foundation models

David Seong; Samson Mataraso; Camilo Espinosa; Eloise Berson; S Momsen Reincke; Lei Xue; Chloe Kashiwagi; Yeasul Kim; Chi-Hung Shu; Philip Chung; Marc Ghanem; Feng Xie; Ronald J Wong; Martin S Angst; Brice Gaudilliere; Gary M Shaw; David K Stevenson; Nima Aghaeepour

doi:10.1093/bib/bbae574


Brief Bioinform. 2024 Sep 23;25(6):bbae574. doi: 10.1093/bib/bbae574.

Authors

David Seong^{1,2,3}, Samson Mataraso^{3,4,5}, Camilo Espinosa^{1,3,4,5}, Eloise Berson^{3,5,6}, S Momsen Reincke^{3,4,5}, Lei Xue^{3,4,5}, Chloe Kashiwagi^{1,3,4}, Yeasul Kim^{3,4,5}, Chi-Hung Shu^{3}, Philip Chung^{3}, Marc Ghanem^{3}, Feng Xie^{3,4,5}, Ronald J Wong^{4}, Martin S Angst^{3}, Brice Gaudilliere^{3}, Gary M Shaw^{4}, David K Stevenson^{4}, Nima Aghaeepour^{1,3,4,5}

Affiliations

¹ Immunology Program, Stanford University School of Medicine, 240 Pasteur Drive, Palo Alto CA, 94304, United States.
² Medical Scientist Training Program, Stanford University School of Medicine, 1265 Welch Road, Stanford CA, 94305, United States.
³ Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States.
⁴ Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States.
⁵ Department of Biomedical Data Science, Stanford University, 1265 Welch Road, Stanford CA, 94305, United States.
⁶ Department of Pathology, Stanford University School of Medicine, 300 Pasteur Drive, Stanford CA, 94305, United States.

Abstract

Translational biology posits a strong bi-directional link between clinical phenotypes and a patient's biological profile. By leveraging this link, we can efficiently deconvolute pre-existing clinical information into biological profiles. However, traditional computational tools are limited in their ability to resolve this link because paired clinical-biological training datasets are relatively small and tabular clinical data are high-dimensional and sparse. Here, we use state-of-the-art foundation models (FMs) for electronic health record (EHR) data to generate proteomics profiles of pregnant patients, thereby deconvoluting pre-existing clinical information into biological profiles without the cost and effort of running large-scale traditional omics studies. We show that FM-derived representations of a patient's EHR data, coupled with a fully connected neural network prediction head, can generate the expression levels of 206 blood proteins. Interestingly, these proteins were enriched for developmental pathways, whereas proteins that could not be generated from EHR data were enriched for metabolic pathways. Finally, we present a proteomic signature of gestational diabetes that includes proteins with both established and novel links to the condition. These results showcase the power of FM-derived EHR representations in efficiently generating biological states of pregnant patients. This capability can revolutionize disease understanding and therapeutic development, offering a cost-effective, time-efficient, and less invasive alternative to traditional methods of generating proteomics data.
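The abstract describes the prediction setup only at a high level: FM-derived EHR representations feed a fully connected neural network head that outputs protein levels, and proteins are then divided into those that can and cannot be generated from EHR data. The sketch below illustrates that general setup under explicit assumptions. The embeddings are synthetic random vectors standing in for the foundation model's output. The head is a single-hidden-layer ReLU network trained by plain gradient descent in NumPy. The per-protein Pearson correlation cutoff of 0.5 used to call a protein "generatable" is a hypothetical criterion that the abstract does not state. The function names (`train_prediction_head`, `predict`, `per_protein_pearson`) are illustrative and are not the authors' code.

```python
import numpy as np


def train_prediction_head(X, Y, hidden=64, lr=0.02, epochs=2000, seed=0):
    """Hypothetical prediction head: a one-hidden-layer fully connected ReLU
    network mapping patient embeddings X (n x d) to protein levels Y (n x p),
    fit by full-batch gradient descent on 0.5 * sum(squared error) / n."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    p = Y.shape[1]
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, hidden))  # He initialization
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, np.sqrt(1.0 / hidden), size=(hidden, p))
    b2 = np.zeros(p)
    for _ in range(epochs):
        # Forward pass
        Z = X @ W1 + b1
        H = np.maximum(Z, 0.0)
        err = H @ W2 + b2 - Y
        # Backward pass: gradients of the loss defined in the docstring
        g_out = err / n
        gW2 = H.T @ g_out
        gb2 = g_out.sum(axis=0)
        gZ = (g_out @ W2.T) * (Z > 0)
        gW1 = X.T @ gZ
        gb1 = gZ.sum(axis=0)
        # Gradient descent update
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return W1, b1, W2, b2


def predict(params, X):
    """Generate protein levels from embeddings with the trained head."""
    W1, b1, W2, b2 = params
    return np.maximum(X @ W1 + b1, 0.0) @ W2 + b2


def per_protein_pearson(Y_true, Y_pred):
    """Pearson correlation between measured and generated levels, one value
    per protein (column); constant predictions get a correlation of 0."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


# Synthetic demo data (assumption): 5 proteins linked to the embeddings, 5 not.
rng = np.random.default_rng(42)
n, d, p_signal, p_noise = 400, 32, 5, 5
X = rng.normal(size=(n, d))  # stand-in for FM-derived EHR embeddings
B = rng.normal(size=(d, p_signal)) / np.sqrt(d)
Y_signal = X @ B + 0.3 * rng.normal(size=(n, p_signal))  # EHR-linked proteins
Y_noise = rng.normal(size=(n, p_noise))  # proteins unrelated to the EHR
Y = np.hstack([Y_signal, Y_noise])

train, test = slice(0, 300), slice(300, n)
params = train_prediction_head(X[train], Y[train])
r = per_protein_pearson(Y[test], predict(params, X[test]))
generatable = r >= 0.5  # hypothetical cutoff, not the paper's criterion
print(np.round(r, 2), generatable)
```

On this toy data, the five proteins constructed as linear functions of the embeddings should receive high held-out correlations and the five pure-noise proteins correlations near zero, mirroring in miniature the split between proteins that can and cannot be generated from EHR data. A real pipeline would typically use a deep learning framework, regularization, and cross-validation; plain NumPy is used here only to keep the example self-contained.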

Keywords: electronic health record; foundation model; machine learning; pregnancy; proteomics.

MeSH terms

Computational Biology / methods
Diabetes, Gestational / metabolism
Electronic Health Records*
Female
Humans
Neural Networks, Computer
Pregnancy
Proteomics* / methods
