Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts

Naeimeh Atabaki-Pasdar; Mattias Ohlsson; Ana Viñuela; Francesca Frau; Hugo Pomares-Millan; Mark Haid; Angus G Jones; E Louise Thomas; Robert W Koivula; Azra Kurbasic; Pascal M Mutie; Hugo Fitipaldi; Juan Fernandez; Adem Y Dawed; Giuseppe N Giordano; Ian M Forgie; Timothy J McDonald; Femke Rutters; Henna Cederberg; Elizaveta Chabanova; Matilda Dale; Federico De Masi; Cecilia Engel Thomas; Kristine H Allin; Tue H Hansen; Alison Heggie; Mun-Gwan Hong; Petra J M Elders; Gwen Kennedy; Tarja Kokkola; Helle Krogh Pedersen; Anubha Mahajan; Donna McEvoy; Francois Pattou; Violeta Raverdy; Ragna S Häussler; Sapna Sharma; Henrik S Thomsen; Jagadish Vangipurapu; Henrik Vestergaard; Leen M 't Hart; Jerzy Adamski; Petra B Musholt; Soren Brage; Søren Brunak; Emmanouil Dermitzakis; Gary Frost; Torben Hansen; Markku Laakso; Oluf Pedersen; Martin Ridderstråle; Hartmut Ruetten; Andrew T Hattersley; Mark Walker; Joline W J Beulens; Andrea Mari; Jochen M Schwenk; Ramneek Gupta; Mark I McCarthy; Ewan R Pearson; Jimmy D Bell; Imre Pavo; Paul W Franks

doi:10.1371/journal.pmed.1003149

PLoS Med. 2020 Jun 19;17(6):e1003149. doi: 10.1371/journal.pmed.1003149. eCollection 2020 Jun.

Authors

Naeimeh Atabaki-Pasdar¹, Mattias Ohlsson^{2,3}, Ana Viñuela^{4,5,6}, Francesca Frau⁷, Hugo Pomares-Millan¹, Mark Haid⁸, Angus G Jones⁹, E Louise Thomas¹⁰, Robert W Koivula^{1,11}, Azra Kurbasic¹, Pascal M Mutie¹, Hugo Fitipaldi¹, Juan Fernandez¹, Adem Y Dawed¹², Giuseppe N Giordano¹, Ian M Forgie¹², Timothy J McDonald^{9,13}, Femke Rutters¹⁴, Henna Cederberg¹⁵, Elizaveta Chabanova¹⁶, Matilda Dale¹⁷, Federico De Masi¹⁸, Cecilia Engel Thomas¹⁷, Kristine H Allin^{19,20}, Tue H Hansen^{19,21}, Alison Heggie²², Mun-Gwan Hong¹⁷, Petra J M Elders²³, Gwen Kennedy²⁴, Tarja Kokkola²⁵, Helle Krogh Pedersen¹⁹, Anubha Mahajan²⁶, Donna McEvoy²², Francois Pattou²⁷, Violeta Raverdy²⁷, Ragna S Häussler¹⁷, Sapna Sharma^{28,29}, Henrik S Thomsen¹⁶, Jagadish Vangipurapu²⁵, Henrik Vestergaard^{19,30}, Leen M 't Hart^{14,31,32}, Jerzy Adamski^{8,33,34}, Petra B Musholt³⁵, Soren Brage³⁶, Søren Brunak^{18,37}, Emmanouil Dermitzakis^{4,5,6}, Gary Frost³⁸, Torben Hansen^{19,39}, Markku Laakso^{25,40}, Oluf Pedersen¹⁹, Martin Ridderstråle⁴¹, Hartmut Ruetten⁷, Andrew T Hattersley⁹, Mark Walker²², Joline W J Beulens^{14,42}, Andrea Mari⁴³, Jochen M Schwenk¹⁷, Ramneek Gupta¹⁸, Mark I McCarthy^{11,26,44,45}, Ewan R Pearson¹², Jimmy D Bell¹⁰, Imre Pavo⁴⁶, Paul W Franks^{1,47}

Affiliations

¹ Genetic and Molecular Epidemiology Unit, Department of Clinical Sciences, Lund University, Malmö, Sweden.
² Computational Biology and Biological Physics Unit, Department of Astronomy and Theoretical Physics, Lund University, Lund, Sweden.
³ Center for Applied Intelligent Systems Research, Halmstad University, Halmstad, Sweden.
⁴ Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland.
⁵ Institute for Genetics and Genomics in Geneva, University of Geneva Medical School, Geneva, Switzerland.
⁶ Swiss Institute of Bioinformatics, Geneva, Switzerland.
⁷ Sanofi-Aventis Deutschland, Frankfurt am Main, Germany.
⁸ Research Unit Molecular Endocrinology and Metabolism, Helmholtz Zentrum München, Neuherberg, Germany.
⁹ Institute of Biomedical and Clinical Science, College of Medicine and Health, University of Exeter, Exeter, United Kingdom.
¹⁰ Research Centre for Optimal Health, School of Life Sciences, University of Westminster, London, United Kingdom.
¹¹ Oxford Centre for Diabetes, Endocrinology and Metabolism, Radcliffe Department of Medicine, University of Oxford, Oxford, United Kingdom.
¹² Division of Population Health and Genomics, School of Medicine, University of Dundee, Ninewells Hospital, Dundee, United Kingdom.
¹³ Blood Sciences, Royal Devon and Exeter NHS Foundation Trust, Exeter, United Kingdom.
¹⁴ Department of Epidemiology and Biostatistics, Amsterdam Public Health Research Institute, Amsterdam UMC, Amsterdam, the Netherlands.
¹⁵ Department of Endocrinology, Abdominal Centre, Helsinki University Hospital, Helsinki, Finland.
¹⁶ Department of Diagnostic Radiology, Copenhagen University Hospital Herlev Gentofte, Herlev, Denmark.
¹⁷ Affinity Proteomics, Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Solna, Sweden.
¹⁸ Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark.
¹⁹ Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
²⁰ Center for Clinical Research and Prevention, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark.
²¹ Department of Cardiology and Endocrinology, Slagelse Hospital, Slagelse, Denmark.
²² Institute of Cellular Medicine, Newcastle University, Newcastle upon Tyne, United Kingdom.
²³ Department of General Practice, Amsterdam Public Health Research Institute, Amsterdam UMC, Amsterdam, the Netherlands.
²⁴ Immunoassay Biomarker Core Laboratory, School of Medicine, University of Dundee, Ninewells Hospital, Dundee, United Kingdom.
²⁵ Internal Medicine, Institute of Clinical Medicine, University of Eastern Finland, Kuopio, Finland.
²⁶ Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.
²⁷ University of Lille, Inserm, UMR 1190, Translational Research in Diabetes, Department of Endocrine Surgery, CHU Lille, Lille, France.
²⁸ German Center for Diabetes Research, Neuherberg, Germany.
²⁹ Unit of Molecular Epidemiology, Institute of Epidemiology II, Helmholtz Zentrum München, Neuherberg, Germany.
³⁰ Steno Diabetes Center Copenhagen, Gentofte, Denmark.
³¹ Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands.
³² Molecular Epidemiology, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands.
³³ Lehrstuhl für Experimentelle Genetik, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Technische Universität München, Freising, Germany.
³⁴ Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.
³⁵ Diabetes Division, Research and Development, Sanofi, Frankfurt, Germany.
³⁶ MRC Epidemiology Unit, University of Cambridge, Cambridge, United Kingdom.
³⁷ Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
³⁸ Section for Nutrition Research, Department of Metabolism, Digestion and Reproduction, Imperial College London, London, United Kingdom.
³⁹ Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark.
⁴⁰ Department of Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland.
⁴¹ Clinical Pharmacology and Translational Medicine, Novo Nordisk, Søborg, Denmark.
⁴² Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, the Netherlands.
⁴³ Institute of Neuroscience, National Research Council, Padua, Italy.
⁴⁴ NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, United Kingdom.
⁴⁵ OMNI Human Genetics, Genentech, South San Francisco, California, United States of America.
⁴⁶ Eli Lilly Regional Operations, Vienna, Austria.
⁴⁷ Department of Nutrition, Harvard School of Public Health, Boston, Massachusetts, United States of America.

Abstract

Background: Non-alcoholic fatty liver disease (NAFLD) is highly prevalent and causes serious health complications in individuals with and without type 2 diabetes (T2D). Early diagnosis of NAFLD is important, as this can help prevent irreversible damage to the liver and, ultimately, hepatocellular carcinomas. We sought to expand etiological understanding and develop a diagnostic tool for NAFLD using machine learning.

Methods and findings: We used baseline data from IMI DIRECT, a multicenter prospective cohort study of 3,029 European-ancestry adults recently diagnosed with T2D (n = 795) or at high risk of developing the disease (n = 2,234). Multi-omics (genetic, transcriptomic, proteomic, and metabolomic) and clinical (liver enzymes and other serological biomarkers, anthropometry, measures of beta-cell function, insulin sensitivity, and lifestyle) data comprised the key input variables. The models were trained on MRI-derived liver fat content, dichotomized as <5% or ≥5%, which was available for 1,514 participants. We applied LASSO (least absolute shrinkage and selection operator) to select features from the different layers of omics data and random forest analysis to develop the models. The prediction models included clinical and omics variables separately or in combination. A model including all omics and clinical variables yielded a cross-validated receiver operating characteristic area under the curve (ROCAUC) of 0.84 (95% CI 0.82, 0.86; p < 0.001), compared with a ROCAUC of 0.82 (95% CI 0.81, 0.83; p < 0.001) for a model including 9 clinically accessible variables. The IMI DIRECT prediction models outperformed existing noninvasive NAFLD prediction tools. One limitation is that these analyses were performed in adults of European ancestry residing in northern Europe, and it is unknown how well these findings will translate to people of other ancestries or to people exposed to environmental risk factors that differ from those of the present cohort. Another key limitation of this study is that the models predicted a binary outcome of liver fat quantity (<5% or ≥5%) rather than a continuous one.
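The outcome definition and evaluation metric described above can be illustrated with a minimal pure-Python sketch. It is not the authors' pipeline: it omits the LASSO feature selection, the random forest models, cross-validation, and confidence intervals, and only shows how a continuous MRI liver-fat percentage might be dichotomized at 5% and how a ROC AUC is computed from model risk scores using the Mann-Whitney formulation. The function names and example values are hypothetical.

```python
from collections.abc import Sequence

# Cut-off used to define the binary outcome: liver fat >= 5% counts as positive.
LIVER_FAT_THRESHOLD = 5.0


def binarize_liver_fat(liver_fat_pct: Sequence[float],
                       threshold: float = LIVER_FAT_THRESHOLD) -> list[int]:
    """Map liver fat (%) to 0 (below threshold) or 1 (at or above threshold)."""
    return [1 if value >= threshold else 0 for value in liver_fat_pct]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    This is the Mann-Whitney formulation; a tied positive/negative pair counts
    as one half. The pairwise loop is O(n_pos * n_neg), fine for a sketch.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical MRI liver fat values (%) and hypothetical model risk scores.
    liver_fat = [2.1, 7.8, 4.9, 12.3, 5.0, 1.2]
    risk_scores = [0.10, 0.80, 0.40, 0.90, 0.35, 0.20]
    labels = binarize_liver_fat(liver_fat)
    print(labels)  # [0, 1, 0, 1, 1, 0]
    print(round(roc_auc(labels, risk_scores), 3))  # 8 of 9 pairs ordered correctly
```

Here 8 of the 9 positive-negative pairs are ranked correctly, giving an AUC of about 0.889. In a cross-validated setting such as the one reported, an AUC like this would be computed on each held-out fold and then summarized.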

Conclusions: In this study, we developed several models with different combinations of clinical and omics data and identified biological features that appear to be associated with liver fat accumulation. In general, the clinical variables showed better prediction ability than the complex omics variables. However, the combination of omics and clinical variables yielded the highest accuracy. We have incorporated the developed clinical models into a web interface (see: https://www.predictliverfat.org/) and made it available to the community.

Trial registration: ClinicalTrials.gov NCT03814915.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Diabetes Complications / etiology
Fatty Liver / etiology*
Female
Humans
Machine Learning*
Male
Middle Aged
Models, Statistical
Prospective Studies
Reproducibility of Results
Risk Assessment

Associated data

ClinicalTrials.gov/NCT03814915
