Calibration of variant effect predictors on genome-wide data masks heterogeneous performance across genes

Am J Hum Genet. 2024 Sep 5;111(9):2031-2043. doi: 10.1016/j.ajhg.2024.07.018. Epub 2024 Aug 21.

Abstract

In silico variant effect predictions are available for nearly all missense variants but have played a minimal role in clinical variant classification because they were deemed to provide only supporting evidence. Recently, the ClinGen Sequence Variant Interpretation (SVI) Working Group updated its recommendations for the use of variant effect predictions. By analyzing control pathogenic and benign variants across all genes, the group computed evidence strength for predictor score intervals, with some intervals generating moderate, strong, or even very strong evidence. However, this genome-wide approach could obscure heterogeneous predictor performance in different genes. We quantified the gene-by-gene performance of two top predictors, REVEL and BayesDel, by analyzing control variants in each predictor score interval in 3,668 disease-relevant genes. Approximately 10% of intervals had sufficient control variants for analysis, and ∼70% of these intervals exceeded the maximum number of incorrect predictions implied by the SVI recommendations. These trending discordant intervals arose because the gene-specific distribution of predictions diverged from the genome-wide distribution, suggesting that gene-specific calibration is needed in many cases. Approximately 22% of ClinVar missense variants of uncertain significance in the genes we analyzed (REVEL = 100,629, BayesDel = 71,928) had predictions in trending discordant intervals. Thus, genome-wide calibrations could result in many variants receiving inappropriate evidence strength. To facilitate a review of the SVI's calibrations, we developed a web application enabling visualization of gene-specific predictions and trending concordant and discordant intervals.
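The per-gene check described above, counting control variants in a score interval and comparing the wrong-direction count against a maximum, can be illustrated with a minimal Python sketch. The abstract does not give the exact procedure, so everything here is an assumption made for illustration: the function names `max_incorrect` and `label_interval`, the `max_error_rate` parameter (a placeholder for whatever error rate an evidence strength implies, not an SVI value), and the `min_controls` cutoff for "sufficient" control variants.

```python
import math


def max_incorrect(n_controls: int, max_error_rate: float) -> int:
    """Largest number of wrong-direction controls tolerated among n_controls.

    max_error_rate is a hypothetical placeholder, not a value taken from
    the SVI recommendations.
    """
    # The small epsilon guards against floating-point round-off at exact multiples.
    return math.floor(n_controls * max_error_rate + 1e-9)


def label_interval(
    n_pathogenic: int,
    n_benign: int,
    direction: str,
    max_error_rate: float,
    min_controls: int = 10,  # assumed cutoff for "sufficient" controls
) -> str:
    """Label one gene-specific score interval.

    direction is the evidence direction the genome-wide calibration assigns
    to the interval: "pathogenic" or "benign".
    """
    if direction not in ("pathogenic", "benign"):
        raise ValueError("direction must be 'pathogenic' or 'benign'")

    n_controls = n_pathogenic + n_benign
    if n_controls < min_controls:
        return "insufficient controls"

    # Controls whose known class contradicts the interval's evidence direction.
    incorrect = n_benign if direction == "pathogenic" else n_pathogenic

    if incorrect > max_incorrect(n_controls, max_error_rate):
        return "trending discordant"
    return "trending concordant"
```

For example, under an assumed 10% tolerated error rate, a pathogenic-evidence interval holding 30 pathogenic and 10 benign controls in one gene would be labeled trending discordant, because 10 incorrect predictions exceed the 4 tolerated among 40 controls.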

Keywords: calibrations; variant effect predictors; variant interpretation.

MeSH terms

  • Calibration
  • Databases, Genetic
  • Genetic Variation
  • Genome, Human
  • Genome-Wide Association Study* / methods
  • Humans
  • Mutation, Missense
  • Software