Purpose: Precision health initiatives and reduced sequencing costs are driving large-scale human genome analyses. Genetic variant curation is a bottleneck in clinical applications. The burden of variant curation can be high for newly discovered variants because they are less likely to have undergone previous clinical annotation. However, the rate of discovery of genetic variants in large clinical populations has not been empirically determined.
Methods: We determined the rate of accrual of unique sequence variants in 90,000 exome sequences. Separate analyses were done for 17,267 autosomal genes and a subset of 74 actionable genes; the effect of relatedness in the cohort was also determined.
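As an illustration of what a rate of accrual of unique variants means, the following minimal sketch computes a cumulative count of distinct variants as samples are added one at a time. It is not the study's pipeline: the function names, the variant key format, and the toy data are all hypothetical, and the curve depends on the order in which samples are added, which this sketch does not control for.

```python
from typing import Hashable, Iterable, List, Set


def accrual_curve(samples: Iterable[Set[Hashable]]) -> List[int]:
    """Return the cumulative number of unique variants after each sample.

    Each sample is a set of variant identifiers (hypothetical format).
    """
    seen: Set[Hashable] = set()
    curve: List[int] = []
    for variants in samples:
        seen |= variants          # add this sample's variants to the database
        curve.append(len(seen))   # record the database size so far
    return curve


def new_variants_per_sample(curve: List[int]) -> List[int]:
    """Return how many previously unseen variants each sample contributed."""
    if not curve:
        return []
    return [curve[0]] + [later - earlier for earlier, later in zip(curve, curve[1:])]


# Toy data (invented for illustration only): three samples, three distinct variants.
toy_exomes = [
    {"chr1:100:A>G", "chr2:200:C>T"},
    {"chr1:100:A>G", "chr3:300:G>A"},
    {"chr2:200:C>T"},
]

curve = accrual_curve(toy_exomes)
print(curve)                           # [2, 3, 3]
print(new_variants_per_sample(curve))  # [2, 1, 0]
```

In this toy run the number of new variants per added sample falls from 2 to 0, which is the kind of diminishing accrual the analysis measures at scale.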
Results: Variant discovery showed a nonlinear growth pattern. The rate of unique variant accrual decreased as the database size increased; by 90,000 exomes, 97% of all projected coding and splicing variants had been observed. Variants in 74 actionable genes showed a similar pattern. Family relatedness slightly reduced the rate of discovery of unique variants.
Conclusion: The heaviest burden of interpretation for genetic variants occurs early and diminishes as the database size increases. Our data provide a framework for scaling pathogenic genetic variant discovery and curation, a critical element of patient care in the era of precision health.
Keywords: exome sequencing; genomic screening; secondary findings; sequence scaling; variant curation.