Disambiguate: An open-source application for disambiguating two species in next generation sequencing data from grafted samples

F1000Res. 2016 Nov 22:5:2741. doi: 10.12688/f1000research.10082.2. eCollection 2016.

Abstract

Grafting of cell lines and primary tumours is a crucial step in the drug development process between cell line studies and clinical trials. Disambiguate is a program for computationally separating the sequencing reads of two species derived from grafted samples. Disambiguate operates on DNA or RNA-seq alignments to the two species and separates the components with very high sensitivity and specificity, as illustrated in artificially mixed human-mouse samples. This allows maximum recovery of data from target tumours for more accurate variant calling and gene expression quantification. Given that no general-use, open-source algorithm for separating the data of the two species is accessible to the bioinformatics community, Disambiguate presents a novel approach and an improvement to sequence analysis of grafted samples. Both Python and C++ implementations are available, and they are integrated into several open- and closed-source pipelines. Disambiguate is open source and freely available at https://github.com/AstraZeneca-NGS/disambiguate.
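
The idea of separating reads using alignments to two species can be illustrated with a minimal sketch. The abstract does not state the decision rule, so the following is an assumption rather than the tool's actual implementation: each read is assumed to carry one alignment quality score per species (higher is better), a read goes to the species with the better score, and ties are set aside as ambiguous. The function name `assign_reads`, the labels, and the example scores are all hypothetical.

```python
from typing import Dict


def assign_reads(human_scores: Dict[str, int],
                 mouse_scores: Dict[str, int]) -> Dict[str, str]:
    """Assign each read to 'human', 'mouse', or 'ambiguous'.

    Assumption (not taken from the paper): inputs map read names to an
    alignment score where a higher value means a better alignment. A read
    missing from one mapping did not align to that species.
    """
    assignment = {}
    # Consider every read that aligned to at least one species.
    for read in human_scores.keys() | mouse_scores.keys():
        h = human_scores.get(read)
        m = mouse_scores.get(read)
        if m is None or (h is not None and h > m):
            # Aligned only to human, or better to human.
            assignment[read] = "human"
        elif h is None or m > h:
            # Aligned only to mouse, or better to mouse.
            assignment[read] = "mouse"
        else:
            # Equal scores: the species cannot be decided.
            assignment[read] = "ambiguous"
    return assignment


# Hypothetical scores for four reads from a human-mouse grafted sample.
example = assign_reads(
    {"r1": 60, "r2": 30, "r3": 50},
    {"r2": 55, "r3": 50, "r4": 40},
)
```

In this sketch, only the reads labelled with the target species would be passed on to variant calling or expression quantification, and ambiguous reads are kept separate instead of being forced into either set.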

Keywords: NGS; disambiguation; explant; patient derived xenograft; sequencing.

Grants and funding

The author(s) declared that no grants were involved in supporting this work.