Background: Human exome sequencing is a recently developed tool to aid in the discovery of novel coding variants. Now broadly applied, exome sequencing data sets provide a novel opportunity to evaluate the allele frequencies of previously published pathogenic rare variants.
Methods and results: We examined the exome data set from the National Heart, Lung and Blood Institute Exome Sequencing Project and compared it with a catalog of 197 previously published rare variants reported as causative of dilated cardiomyopathy (DCM) in familial and sporadic cases. Of these 197, 33 (16.8%) were also present in the Exome Sequencing Project database, raising the question of whether they are uncommon polymorphisms. Supporting functional data have been published for 14 of the 33 (42%), suggesting that these 14 are unlikely to be false positives. The frequencies of these functional variants in the Exome Sequencing Project data set ranged from 0.02% to 1.33% (median 0.04%); when applied as a cutoff to filter variants in a DCM pedigree, these frequencies identified an additional DCM candidate gene. A greater proportion of sporadic DCM cases had variants that were present in the Exome Sequencing Project data set versus novel variants (ie, not in the Exome Sequencing Project; 44% versus 21%; P=0.002), suggesting that some of the variants identified as disease causing in sporadic DCM are either false positives or low-penetrance alleles in human populations.
Conclusions: Rare nonsynonymous variants that are identified in DCM subjects and are also present at very low frequencies in public databases are likely relevant for DCM. Variants with allele frequencies >0.04% are of less certain pathogenicity, especially if identified in sporadic cases, although this cutoff should be viewed as preliminary.
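
As a rough illustration of how the preliminary 0.04% cutoff might be applied when triaging candidate variants, the following is a minimal sketch and not the authors' pipeline. The `Variant` class, the `classify` function, and the category labels are hypothetical names introduced here; only the cutoff value and the reported counts come from the abstract. The categories describe database frequency only and say nothing definitive about pathogenicity.

```python
from dataclasses import dataclass
from typing import Optional

# Preliminary cutoff from the abstract: 0.04%, expressed as a fraction.
AF_CUTOFF = 0.0004


@dataclass
class Variant:
    """Hypothetical record for one candidate variant."""
    name: str
    # Allele frequency in the reference database as a fraction,
    # or None if the variant is absent from the database.
    db_frequency: Optional[float]


def classify(variant: Variant, cutoff: float = AF_CUTOFF) -> str:
    """Place a variant into one of three frequency categories."""
    if variant.db_frequency is None:
        return "novel"               # not in the database
    if variant.db_frequency <= cutoff:
        return "at_or_below_cutoff"  # present, very low frequency
    return "above_cutoff"            # present, less certain pathogenicity


# Proportions reported in the abstract, recomputed from the stated counts.
pct_in_database = round(100 * 33 / 197, 1)  # 33 of 197 published variants
pct_with_functional_data = round(100 * 14 / 33)  # 14 of those 33
```

For example, `classify(Variant("example", 0.0133))` falls in the `above_cutoff` category, matching the upper end of the frequency range reported for the functional variants (1.33%).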