PeSV-Fisher: identification of somatic and non-somatic structural variants using next generation sequencing data

PLoS One. 2013 May 21;8(5):e63377. doi: 10.1371/journal.pone.0063377. Print 2013.

Abstract

Next-generation sequencing technologies have expedited the development of efficient computational tools for identifying structural variants (SVs) and for using them to study human diseases. As deeper sequencing data are obtained, the existence of more complex SVs in some genomes becomes more evident, but the detection and definition of most of these complex rearrangements is still in its infancy. Full characterization of SVs is key to discovering their biological implications. Here we present a pipeline (PeSV-Fisher) for the detection of deletions, gains, intra- and inter-chromosomal translocations, and inversions at reasonable computational cost. We further provide comprehensive information on the co-localization of SVs in the genome, a crucial aspect for studying their biological consequences. The algorithm combines methods based on paired-read and read-depth strategies. PeSV-Fisher was designed to facilitate the identification of somatic variation and, as such, can analyse two or more samples simultaneously, producing a list of variants not shared between samples. We tested PeSV-Fisher on available sequencing data and compared its behaviour to that of two frequently deployed tools (BreakDancer and VariationHunter). We also tested the algorithm on our own sequencing data, obtained from a tumour and a normal blood sample of a patient with chronic lymphocytic leukaemia, and validated the results by targeted re-sequencing of different kinds of predictions. This allowed us to determine confidence parameters that influence the reliability of breakpoint predictions.
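
The idea of reporting variants that are not shared between samples can be illustrated with a minimal Python sketch. This is not PeSV-Fisher's code or its actual matching rule: the `SV` record, the `same_event` and `non_shared` helpers, the 500 bp breakpoint tolerance, and the example coordinates are all hypothetical, chosen only to show the general approach of comparing tumour calls against normal calls.

```python
from typing import List, NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal record for one structural variant call."""
    kind: str    # e.g. "deletion", "gain", "inversion"
    chrom: str
    start: int
    end: int


def same_event(a: SV, b: SV, tolerance: int = 500) -> bool:
    """Treat two calls as the same event if type and chromosome match and
    both breakpoints lie within `tolerance` bp (an assumed threshold)."""
    return (
        a.kind == b.kind
        and a.chrom == b.chrom
        and abs(a.start - b.start) <= tolerance
        and abs(a.end - b.end) <= tolerance
    )


def non_shared(sample: List[SV], others: List[SV], tolerance: int = 500) -> List[SV]:
    """Return calls from `sample` that have no matching call in `others`."""
    return [
        sv for sv in sample
        if not any(same_event(sv, other, tolerance) for other in others)
    ]


# Made-up example calls, for illustration only.
tumour = [
    SV("deletion", "chr13", 1000, 5000),
    SV("inversion", "chr2", 20000, 26000),
]
normal = [
    SV("deletion", "chr13", 1100, 5050),  # close to the tumour deletion
]

somatic_candidates = non_shared(tumour, normal)
```

Under these assumptions the deletion is treated as shared (likely germline) and only the inversion remains as a somatic candidate. A real comparison would also need to handle inter-chromosomal events, which have two chromosomes per call, and breakpoint uncertainty that depends on insert size.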

Availability: PeSV-Fisher is available at http://gd.crg.eu/tools.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computational Biology
  • Databases, Genetic*
  • Genome, Human / genetics
  • Genomic Structural Variation / genetics*
  • Humans
  • Leukemia, Lymphocytic, Chronic, B-Cell / genetics
  • Sequence Analysis, DNA / methods*

Grants and funding

This work was supported by AGAUR (Generalitat de Catalunya, 2009 SGR 1502) (X.E.); CIBERESP (Instituto de Salud Carlos III) (G.E.); ESGI (European Commission, 262055_ESGI) (R.R., X.E.), ENGAGE (European Commission, ENGAGE_201413), TECHGENE (European Commission, TECHGENE_223143), and GEUVADIS (European Commission, 261123_GEUVADIS) (X.E.); NOVADIS (Ministerio de Ciencia y Tecnología, SAF2008-00357) (X.E.); the Xunta de Galicia (Spain) through project number 10PXIB918057 (J.M.C.T.); a MAEC-AEC1 Predoctoral Fellowship (Ministerio de Asuntos Exteriores y Cooperación, Spain) (A.M.F.); and a Ramón y Cajal position and grant BFU2007-60930 (Ministerio de Educación y Ciencia) (M.C.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.