Reliable identification of large numbers of candidate SNPs from public EST data

K H Buetow; M N Edmonson; A B Cassidy

doi:10.1038/6851


Nat Genet. 1999 Mar;21(3):323-5. doi: 10.1038/6851.

Authors

K H Buetow¹, M N Edmonson, A B Cassidy

Affiliation

¹ Laboratory of Population Genetics, NCI, NIH, Bethesda, Maryland 20892, USA. buetowk@nih.gov

PMID: 10080189

Abstract

High-resolution genetic analysis of the human genome promises to provide insight into susceptibility to common diseases. Such analysis will require a collection of high-throughput, high-density analysis reagents. We have developed a polymorphism detection system, the single nucleotide polymorphism pipeline (SNPpipeline), that uses public-domain sequence data. The analytic core of the SNPpipeline consists of three components: PHRED, PHRAP and DEMIGLACE. PHRED and PHRAP belong to a sequence analysis suite developed to perform the semi-automated analysis required for large-scale genome sequencing (provided courtesy of P. Green). Using these informatics tools to examine redundant raw expressed sequence tag (EST) data, we have identified more than 3,000 candidate single-nucleotide polymorphisms (SNPs). Empirical validation of a set of 192 candidates indicates that 82% detect variation in a sample of ten Centre d'Etude du Polymorphisme Humain (CEPH) individuals. Our results suggest that existing sequence resources may serve as a valuable source for identifying genetic variation.
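The abstract describes the pipeline only at a high level. As a rough illustration of the general idea of using base-quality scores to separate genuine allelic differences from sequencing errors among redundant ESTs, the following minimal Python sketch may help. It assumes reads that are already aligned to a common coordinate system (for example, by an assembler), PHRED-style integer quality scores (Q = -10 log10 P, where P is the probability that a base call is wrong), and two hypothetical thresholds, `min_quality` and `min_reads_per_allele`. It is not the DEMIGLACE algorithm, whose details are not given here, and all function names are hypothetical.

```python
from collections import defaultdict


def phred_error_prob(q: int) -> float:
    """Convert a PHRED quality Q to the base-call error probability 10**(-Q/10)."""
    return 10 ** (-q / 10)


def candidate_snps(reads: list[tuple[str, list[int]]],
                   min_quality: int = 20,
                   min_reads_per_allele: int = 2) -> dict[int, dict[str, int]]:
    """Hypothetical column scan over aligned EST reads (not DEMIGLACE itself).

    reads: (sequence, per-base qualities) pairs on a shared alignment
    coordinate system; any non-ACGT character (e.g. '-') is ignored.
    Returns {column: {base: count}} for columns where at least two
    different bases are each seen in >= min_reads_per_allele reads
    with quality >= min_quality.
    """
    length = max(len(seq) for seq, _ in reads)
    hits: dict[int, dict[str, int]] = {}
    for col in range(length):
        counts: dict[str, int] = defaultdict(int)
        for seq, quals in reads:
            if col >= len(seq):
                continue  # this read does not cover the column
            base = seq[col]
            # Count only confident calls, so isolated low-quality
            # miscalls do not look like a second allele.
            if base in "ACGT" and quals[col] >= min_quality:
                counts[base] += 1
        supported = {b: n for b, n in counts.items() if n >= min_reads_per_allele}
        if len(supported) >= 2:
            hits[col] = supported
    return hits


if __name__ == "__main__":
    reads = [
        ("ACGTAC", [30] * 6),
        ("ACGTAC", [30] * 6),
        ("ACCTAC", [30] * 6),
        ("ACCTAC", [25] * 6),
        ("ACGTTC", [30, 30, 30, 30, 10, 30]),  # low-quality T at column 4
    ]
    print(candidate_snps(reads))
```

In this toy example, column 2 is reported because G and C are each supported by several high-quality reads, while the single T at column 4 is discarded as a likely sequencing error because its quality (10, roughly a 1-in-10 error chance) falls below the assumed threshold. Requiring each allele to appear in more than one independent read is a simple way to exploit EST redundancy; a real pipeline would also need to handle alignment errors, paralogous genes and quality calibration.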

MeSH terms

Algorithms
Databases, Factual*
Expressed Sequence Tags*
Gene Frequency
Genetic Variation
Genetics, Population
Heterozygote
Humans
Internet
Nucleotides / genetics
Polymerase Chain Reaction
Polymorphism, Restriction Fragment Length*
Reproducibility of Results
Software

Substances

Nucleotides