Molecular group and correlation guided structural learning for multi-phenotype prediction

Brief Bioinform. 2024 Sep 23;25(6):bbae585. doi: 10.1093/bib/bbae585.

Abstract

We propose a supervised learning bioinformatics tool, Biological gRoup guIded muLtivariate muLtiple lIneAr regression with peNalizaTion (Brilliant), designed for feature selection and outcome prediction in genomic data with multiple phenotypic responses. Brilliant incorporates genome and/or phenotype grouping structures, as well as phenotype correlation structures, into feature selection, effect estimation, and outcome prediction under a penalized multi-response linear regression model. Extensive simulations demonstrate its superior performance compared with competing methods. We applied Brilliant to two omics studies. In the first study, we identified novel association signals between multivariate gene expression and high-dimensional DNA methylation profiles, providing biological insight into baseline CpG-to-gene regulation patterns in a Puerto Rican childhood asthma cohort. The second study focused on cell-type deconvolution, predicting cell-type fractions from high-dimensional gene expression profiles. Using Brilliant, we improved the accuracy of cell-type fraction prediction and identified novel cell-type signature genes.
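The abstract does not spell out Brilliant's objective function. Purely as an illustration of the general idea, the sketch below fits a generic penalized multi-response linear model with two ingredients of the kind described: a group-lasso penalty over predefined feature groups (for example, CpG sites annotated to the same gene), and a correlation-guided fusion penalty that pulls together the coefficient vectors of strongly correlated phenotypes. The objective, the functions `signed_laplacian`, `group_prox`, and `fit_multiresponse`, and the parameters `lam`, `gamma`, and `tau` are hypothetical choices for this sketch, not Brilliant's published formulation or software interface.

```python
import numpy as np


def signed_laplacian(R, tau=0.5):
    """Hypothetical correlation-guided fusion matrix built from phenotype correlations R.

    Each pair (k, l) with |R[k, l]| >= tau adds |r| * v v^T, where
    v = e_k - sign(r) * e_l, so that
    trace(B L B^T) = sum over pairs of |r| * ||B[:, k] - sign(r) * B[:, l]||^2.
    """
    q = R.shape[0]
    L = np.zeros((q, q))
    for k in range(q):
        for l in range(k + 1, q):
            r = R[k, l]
            if abs(r) < tau:          # weakly correlated pairs are not fused
                continue
            v = np.zeros(q)
            v[k], v[l] = 1.0, -np.sign(r)
            L += abs(r) * np.outer(v, v)
    return L


def group_prox(B, groups, thresh):
    """Group soft-thresholding: shrink each row block B[g] toward zero as a unit."""
    out = B.copy()
    for g in groups:
        norm = np.linalg.norm(out[g])          # Frobenius norm of the block
        limit = thresh * np.sqrt(len(g))       # group-size weight sqrt(|g|)
        out[g] = 0.0 if norm <= limit else (1.0 - limit / norm) * out[g]
    return out


def fit_multiresponse(X, Y, groups, lam=0.2, gamma=0.1, tau=0.5, n_iter=500):
    """Proximal gradient (ISTA) sketch for the assumed objective
        (1/2n) ||Y - X B||_F^2 + (gamma/2) tr(B L B^T) + lam * sum_g sqrt(|g|) ||B[g]||_F,
    where L comes from the sample correlations of Y's columns and each entry of
    `groups` lists feature (row) indices; rows in no group are left unpenalized.
    """
    n, p = X.shape
    L = signed_laplacian(np.corrcoef(Y, rowvar=False), tau)
    # Lipschitz constant of the smooth part's gradient gives a safe fixed step.
    lip = np.linalg.norm(X, 2) ** 2 / n + gamma * np.linalg.eigvalsh(L).max()
    step = 1.0 / lip
    B = np.zeros((p, Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n + gamma * B @ L
        B = group_prox(B - step * grad, groups, step * lam)
    return B
```

Plain ISTA with a fixed 1/Lipschitz step keeps the sketch short; an accelerated (FISTA) or block-coordinate solver, plus cross-validated choices of `lam`, `gamma`, and `tau`, would be natural refinements.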

Keywords: DNA methylation; association study; cell-type deconvolution; feature selection; genomic grouping structure; multi-type prediction.

MeSH terms

  • Asthma / genetics
  • Computational Biology* / methods
  • CpG Islands
  • DNA Methylation
  • Genomics / methods
  • Humans
  • Linear Models
  • Phenotype*