Low-pass genome-wide sequencing and variant inference using identity-by-descent in an isolated human population

Genetics. 2012 Feb;190(2):679-89. doi: 10.1534/genetics.111.134874. Epub 2011 Nov 30.

Abstract

Whole-genome sequencing in an isolated population with few founders directly ascertains variants from the population bottleneck that may be rare elsewhere. In such populations, shared haplotypes allow imputation of variants in unsequenced samples without the complex statistical methods required in studies of outbred cohorts. We focus on an isolated population cohort from the Pacific Island of Kosrae, Micronesia, where we previously collected SNP array and rich phenotype data for the majority of the population. We report identification of long regions with haplotypes co-inherited between pairs of individuals and methodology to leverage such shared genetic content for imputation. Our estimates show that sequencing as few as 40 personal genomes allows inference in up to 60% of the 3000-person cohort at the average locus. We ascertained a pilot data set of whole-genome sequences from seven Kosraean individuals at an average coverage of 5×. This assay identified 5,735,306 unique sites, of which 1,212,831 were previously unknown. Additionally, compared with a geographic neighbor (the published Korean genome SJK), these variants are unusually enriched for alleles that are rare in other populations. We used the presence of shared haplotypes between the seven Kosraean individuals to estimate the expected imputation accuracy of known and novel homozygous variants at 99.6% and 97.3%, respectively. This study presents whole-genome analysis of a homogeneous isolate population with emphasis on optimal rare variant inference.
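The abstract does not spell out the inference procedure, so the following is only a minimal sketch of the general idea behind identity-by-descent (IBD) based inference, not the authors' method. It assumes that IBD segments between pairs of haplotypes have already been called (for example from the SNP array data), that sequenced haplotypes have known alleles at the locus, and that an unsequenced haplotype simply takes the allele of any sequenced haplotype it shares a segment with there. `IBDSegment`, `impute_at_locus`, and `inferable_fraction` are hypothetical names. "Inferable" is assumed here to mean that both haplotypes of an unsequenced person are covered, which may differ from the definition behind the 60% estimate.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation: a haplotype is (individual_id, 0 or 1).
Haplotype = Tuple[str, int]


@dataclass(frozen=True)
class IBDSegment:
    """Hypothetical record: haplotypes hap_a and hap_b are IBD on [start, end] (bp, inclusive)."""
    hap_a: Haplotype
    hap_b: Haplotype
    start: int
    end: int

    def covers(self, pos: int) -> bool:
        return self.start <= pos <= self.end


def impute_at_locus(
    pos: int,
    segments: Iterable[IBDSegment],
    sequenced_alleles: Dict[Haplotype, int],
) -> Dict[Haplotype, int]:
    """Copy alleles (0 = ref, 1 = alt) from sequenced haplotypes to unsequenced
    haplotypes sharing an IBD segment that spans `pos`.

    Assumption: if donors disagree, the target haplotype is left unimputed.
    """
    candidates: Dict[Haplotype, Set[int]] = {}
    for seg in segments:
        if not seg.covers(pos):
            continue
        # Either side of the segment may be the sequenced donor.
        for donor, target in ((seg.hap_a, seg.hap_b), (seg.hap_b, seg.hap_a)):
            if donor in sequenced_alleles and target not in sequenced_alleles:
                candidates.setdefault(target, set()).add(sequenced_alleles[donor])
    # Keep only targets whose donors all agree on a single allele.
    return {hap: next(iter(a)) for hap, a in candidates.items() if len(a) == 1}


def inferable_fraction(
    pos: int,
    cohort: Iterable[str],
    segments: Iterable[IBDSegment],
    sequenced: Set[str],
) -> float:
    """Fraction of unsequenced individuals whose two haplotypes are both IBD
    with some sequenced individual's haplotype at `pos` (assumed definition)."""
    covered: Set[Haplotype] = set()
    for seg in segments:
        if not seg.covers(pos):
            continue
        if seg.hap_a[0] in sequenced:
            covered.add(seg.hap_b)
        if seg.hap_b[0] in sequenced:
            covered.add(seg.hap_a)
    targets = [ind for ind in cohort if ind not in sequenced]
    if not targets:
        return 0.0
    hits = sum(1 for ind in targets if (ind, 0) in covered and (ind, 1) in covered)
    return hits / len(targets)


# Toy usage with made-up segments (not data from the study).
segs = [
    IBDSegment(("S1", 0), ("U1", 0), 1_000, 50_000),
    IBDSegment(("S1", 1), ("U1", 1), 10_000, 40_000),
    IBDSegment(("S2", 0), ("U2", 1), 20_000, 30_000),
]
alleles = {("S1", 0): 1, ("S1", 1): 1, ("S2", 0): 0, ("S2", 1): 1}
print(impute_at_locus(25_000, segs, alleles))
print(inferable_fraction(25_000, ["S1", "S2", "U1", "U2"], segs, {"S1", "S2"}))
```

In this toy example U1 is fully covered (both haplotypes share a segment with S1), while only one haplotype of U2 is covered, so half of the unsequenced individuals are inferable at position 25,000 under the assumed definition. Sequencing more individuals adds more donors and, as the abstract's estimate suggests, raises the covered fraction of the cohort at a typical locus.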

MeSH terms

  • Algorithms
  • Alleles
  • Cohort Studies
  • Founder Effect
  • Gene Frequency
  • Genome, Human*
  • Genome-Wide Association Study*
  • Genotype
  • Humans
  • Pacific Islands
  • Polymorphism, Single Nucleotide*
  • Population Groups / genetics*
  • Reproducibility of Results