Tag SNP selection for candidate gene association studies using HapMap and gene resequencing data

Eur J Hum Genet. 2007 Oct;15(10):1063-70. doi: 10.1038/sj.ejhg.5201875. Epub 2007 Jun 13.

Abstract

HapMap provides linkage disequilibrium (LD) information on a sample of 3.7 million SNPs that can be used for tag SNP selection in whole-genome association studies. HapMap can also be used for tag SNP selection in candidate genes, although its performance has yet to be evaluated against gene resequencing data, where SNP ascertainment is nearly complete. The Environmental Genome Project (EGP) is the largest gene resequencing effort to date, with over 500 resequenced genes. We used HapMap data to select tag SNPs and calculated the proportion of common SNPs (MAF ≥ 0.05) tagged (ρ² ≥ 0.8) for each of 127 EGP Panel 2 genes for which individual ethnic information was available. Median gene-tagging proportions are 50, 80 and 74% for the African, Asian and European groups, respectively. These low gene-tagging proportions may be problematic for some candidate gene studies. In addition, although HapMap targeted nonsynonymous SNPs (nsSNPs), we estimate that only approximately 30% of nsSNPs in EGP are in high LD with any HapMap SNP. We show that gene-tagging proportions can be improved by adding a relatively small number of tag SNPs selected on the basis of resequencing data. We also demonstrate that ethnic-mixed data can be used to improve HapMap gene-tagging proportions, but are not as efficient as ethnic-specific data. Finally, we generalized the greedy algorithm proposed by Carlson et al. (2004) to select tag SNPs for multiple populations and implemented it in a freely available software package, mPopTag.
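The two computations the abstract describes, greedy tag SNP selection across several populations and the gene-tagging proportion, can be illustrated with a minimal Python sketch. This is not the mPopTag implementation, and the abstract does not specify how the generalization works. The sketch assumes one plausible reading: at each step, pick the SNP that tags the most still-untagged common SNPs summed over all populations. All function names, the data layout (per-population sets of common SNPs and per-population pairwise LD tables keyed by sorted SNP pairs) and the tie-breaking rule are hypothetical. The abstract's ρ² is written `r2` in the code.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical layout: pairwise LD for one population, keyed by a sorted SNP pair.
PairR2 = Dict[Tuple[str, str], float]


def _r2(table: PairR2, a: str, b: str) -> float:
    """Look up LD for an unordered pair; missing pairs are treated as 0."""
    key = (a, b) if a <= b else (b, a)
    return table.get(key, 0.0)


def tagged_by(candidate: str, targets: Iterable[str],
              table: PairR2, threshold: float) -> Set[str]:
    """Targets that `candidate` tags in one population (a SNP tags itself)."""
    return {t for t in targets
            if t == candidate or _r2(table, candidate, t) >= threshold}


def greedy_multipop_tags(common: Dict[str, Set[str]],
                         r2: Dict[str, PairR2],
                         threshold: float = 0.8) -> List[str]:
    """Greedy selection (assumed generalization, not the published one).

    common: population -> SNPs with MAF >= 0.05 in that population
    r2:     population -> pairwise LD table for that population
    """
    untagged = {pop: set(snps) for pop, snps in common.items()}
    candidates = sorted(set().union(*common.values()))
    tags: List[str] = []
    while any(untagged.values()):
        best, best_cover, best_n = None, {}, 0
        for c in candidates:
            cover = {pop: tagged_by(c, untagged[pop], r2[pop], threshold)
                     for pop in untagged}
            n = sum(len(s) for s in cover.values())
            if n > best_n:  # ties go to the first SNP in sorted order
                best, best_cover, best_n = c, cover, n
        tags.append(best)
        for pop, covered in best_cover.items():
            untagged[pop] -= covered
    return tags


def tagging_proportion(tags: Iterable[str], targets: Iterable[str],
                       table: PairR2, threshold: float = 0.8) -> float:
    """Fraction of common SNPs in one population tagged by any selected tag."""
    targets = set(targets)
    if not targets:
        return 0.0
    hit: Set[str] = set()
    for tag in tags:
        hit |= tagged_by(tag, targets, table, threshold)
    return len(hit) / len(targets)
```

Every untagged SNP is itself a candidate and tags itself, so each pass covers at least one SNP and the loop terminates. To evaluate a HapMap-derived tag set against resequencing data in the manner the abstract describes, `tagging_proportion` would be called with the HapMap tags and the resequenced common SNPs of one gene and population.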

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Algorithms
  • Chromosome Mapping
  • Ethnicity / genetics
  • Genomics / methods*
  • Genomics / statistics & numerical data
  • Human Genome Project
  • Humans
  • Linkage Disequilibrium
  • Models, Genetic
  • Polymorphism, Single Nucleotide*
  • Sequence Analysis, DNA
  • Sequence Tagged Sites