Detecting copy number variation with mated short reads

Genome Res. 2010 Nov;20(11):1613-22. doi: 10.1101/gr.106344.110. Epub 2010 Aug 30.

Abstract

The development of high-throughput sequencing (HTS) technologies has opened the door to novel methods for detecting copy number variants (CNVs) in the human genome. While in the past CNVs have been detected based on array CGH data, recent studies have shown that depth-of-coverage information from HTS technologies can also be used for the reliable identification of large copy-variable regions. Such methods, however, are hindered by sequencing biases that lead certain regions of the genome to be over- or undersampled, lowering their resolution and ability to accurately identify the exact breakpoints of the variants. In this work, we develop a method for CNV detection that supplements the depth-of-coverage with paired-end mapping information, where mate pairs mapping discordantly to the reference serve to indicate the presence of variation. Our algorithm, called CNVer, combines this information within a unified computational framework called the donor graph, allowing us to better mitigate the sequencing biases that cause uneven local coverage and accurately predict CNVs. We use CNVer to detect 4879 CNVs in the recently described genome of a Yoruban individual. Most of the calls (77%) coincide with previously known variants within the Database of Genomic Variants, while 81% of deletion copy number variants previously known for this individual coincide with one of our loss calls. Furthermore, we demonstrate that CNVer can reconstruct the absolute copy counts of segments of the donor genome and evaluate the feasibility of using CNVer with low coverage datasets.
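The two signals named in the abstract, depth of coverage and discordantly mapping mate pairs, can be illustrated with a minimal sketch. This is not CNVer's donor graph algorithm, which the abstract does not specify. The function names, the fixed expected insert size, the tolerance, and the window size are all hypothetical choices made for illustration only.

```python
# Illustrative only: NOT the CNVer algorithm. All names and thresholds
# below are assumptions introduced for this sketch.

def discordant_pairs(pairs, expected_insert, tolerance):
    """Flag mate pairs whose mapped span deviates from the expected insert size.

    pairs: list of (left_pos, right_pos) mapping positions on the reference.
    A span much larger than expected suggests the donor lacks sequence
    present in the reference (deletion-like); a much smaller span suggests
    the donor carries extra sequence (insertion-like).
    """
    flagged = []
    for left, right in pairs:
        span = right - left
        if span > expected_insert + tolerance:
            flagged.append((left, right, "deletion-like"))
        elif span < expected_insert - tolerance:
            flagged.append((left, right, "insertion-like"))
    return flagged


def window_depth_ratio(read_starts, genome_length, window):
    """Count reads per fixed-size window and divide by the mean window count.

    Ratios well above 1 hint at copy gain, ratios well below 1 at copy loss.
    No correction for sequencing bias is attempted here.
    """
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for start in read_starts:
        counts[start // window] += 1
    avg = sum(counts) / n_windows
    return [c / avg if avg else 0.0 for c in counts]


if __name__ == "__main__":
    toy_pairs = [(100, 300), (500, 1500), (2000, 2050)]
    print(discordant_pairs(toy_pairs, expected_insert=200, tolerance=50))
    print(window_depth_ratio([0, 10, 20, 110, 210, 220], 300, 100))
```

In this toy form the two signals are computed independently. The abstract's point is that combining them in one framework helps offset the uneven local coverage that limits depth-only methods.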

Publication types

  • Comparative Study
  • Validation Study

MeSH terms

  • Algorithms
  • Base Pairing / physiology*
  • Base Sequence / physiology
  • Chromosome Breakage
  • Chromosome Mapping / methods*
  • DNA Copy Number Variations* / genetics
  • DNA Mutational Analysis / methods*
  • DNA Shuffling
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Reproducibility of Results