SNiPer-HD: improved genotype calling accuracy by an expectation-maximization algorithm for high-density SNP arrays

Bioinformatics. 2007 Jan 1;23(1):57-63. doi: 10.1093/bioinformatics/btl536. Epub 2006 Oct 24.

Abstract

Motivation: The technology to genotype single nucleotide polymorphisms (SNPs) at extremely high densities provides for hypothesis-free genome-wide scans for common polymorphisms associated with complex disease. However, we find that some errors introduced by commonly employed genotyping algorithms may lead to inflation of false associations between markers and phenotype.

Results: We have developed a novel SNP genotype calling program, SNiPer-High Density (SNiPer-HD), for highly accurate genotype calling across hundreds of thousands of SNPs. The program employs an expectation-maximization (EM) algorithm with parameters based on a training sample set. This choice of algorithm allows for highly accurate genotyping of most SNPs. We also introduce a quality control metric for each assayed SNP, which correlates with genotype class separation in the calling algorithm, so that poorly behaving SNPs can be filtered out. SNiPer-HD is superior to the standard dynamic modeling algorithm and is complementary and non-redundant to other algorithms, such as BRLMM. Implementing multiple algorithms together may provide highly accurate genotype calls, without inflation of false positives due to systematically miscalled SNPs. A reliable and accurate set of SNP genotypes for increasingly dense panels will eliminate some false association signals and false negative signals, allowing for rapid identification of disease susceptibility loci for complex traits.
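
Illustration: the abstract does not give the model details, so the following is only a minimal sketch of the general idea, not the SNiPer-HD implementation. It assumes each sample is summarized for one SNP by a single allele contrast value (for example (A - B)/(A + B)), fits a three-component one-dimensional Gaussian mixture (AA, AB, BB) by EM starting from cluster means that would come from a training set, and reports a simple separation score as a stand-in for a per-SNP quality metric. The function names, the contrast representation, the starting values, and the separation formula are all assumptions made for illustration.

```python
import math

LABELS = ("AA", "AB", "BB")  # hypothetical genotype class labels


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def e_step(contrasts, weights, means, sds):
    """Posterior probability of each genotype class for each sample."""
    resp = []
    for x in contrasts:
        dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(dens) or 1e-300  # guard against all-zero densities
        resp.append([d / total for d in dens])
    return resp


def em_genotype_calls(contrasts, init_means, init_sd=0.2, n_iter=50, min_sd=0.02):
    """Fit a 3-class Gaussian mixture by EM and call one genotype per sample.

    init_means are assumed to come from a training sample set.
    """
    means = list(init_means)
    sds = [init_sd] * 3
    weights = [1.0 / 3.0] * 3
    n = len(contrasts)
    for _ in range(n_iter):
        resp = e_step(contrasts, weights, means, sds)
        # M-step: re-estimate each class from its weighted samples.
        for k in range(3):
            nk = sum(r[k] for r in resp)
            if nk < 1e-8:
                continue  # empty class: keep its previous parameters
            means[k] = sum(r[k] * x for r, x in zip(resp, contrasts)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, contrasts)) / nk
            sds[k] = max(math.sqrt(var), min_sd)  # floor avoids collapse to zero width
            weights[k] = nk / n
    # Final E-step so the calls reflect the last parameter update.
    resp = e_step(contrasts, weights, means, sds)
    calls = [LABELS[max(range(3), key=lambda k: r[k])] for r in resp]
    return calls, means, sds


def separation_score(means, sds):
    """Smallest gap between adjacent class means, scaled by their spreads.

    Larger values mean better-separated genotype clusters (assumed metric).
    """
    return min(
        abs(means[i] - means[i + 1]) / (sds[i] + sds[i + 1]) for i in range(2)
    )


# Made-up contrast values for nine samples at one SNP.
contrasts = [0.82, 0.78, 0.80, 0.02, -0.03, 0.01, -0.79, -0.81, -0.80]
calls, means, sds = em_genotype_calls(contrasts, init_means=(0.7, 0.0, -0.7))
score = separation_score(means, sds)
```

In a sketch like this, a SNP whose separation score falls below some chosen threshold would be flagged for filtering; the threshold itself would have to be tuned on real data and is not specified here.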

Availability: SNiPer-HD is available at TGen's website: http://www.tgen.org/neurogenomics/data.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Chromosome Mapping
  • Computational Biology / methods*
  • Databases, Genetic
  • False Positive Reactions
  • Gene Expression Profiling
  • Genotype
  • Humans
  • Models, Genetic
  • Models, Statistical
  • Multigene Family
  • Oligonucleotide Array Sequence Analysis / methods*
  • Polymorphism, Single Nucleotide / genetics*
  • Reproducibility of Results
  • Sequence Analysis, DNA
  • White People / genetics