Analysis of amplicon-based NGS data from neurological disease gene panels: a new method for allele drop-out management

BMC Bioinformatics. 2016 Nov 8;17(Suppl 12):339. doi: 10.1186/s12859-016-1189-0.

Abstract

Background: Amplicon-based targeted resequencing is a commonly adopted solution for next-generation sequencing applications focused on specific genomic regions. The reliability of such approaches rests on their high specificity and deep coverage, although sequencing artifacts attributable to PCR-like amplification can be encountered. Among these artifacts, allele drop-out, which is the preferential amplification of one allele, causes an artificial increase in homozygosity when heterozygous mutations fall on a primer pairing region. Here, a procedure to manage such artifacts, based on a pipeline composed of two steps of alignment and variant calling, is proposed. This methodology was compared to the Illumina Custom Amplicon workflow, available on Illumina MiSeq, in the analysis of data obtained with four newly designed TruSeq Custom Amplicon gene panels.
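
To make the notion of a variant "falling on a primer pairing region" concrete, the following is a minimal sketch, not the authors' pipeline, of how called variants might be flagged as potentially exposed to allele drop-out. The `Interval` and `Variant` types, the function names, and the coordinates are all hypothetical and chosen only for illustration; positions are assumed to be 0-based with half-open intervals.

```python
from typing import NamedTuple, Sequence


class Interval(NamedTuple):
    """A primer pairing region (hypothetical type; 0-based, half-open)."""
    chrom: str
    start: int
    end: int


class Variant(NamedTuple):
    """A called variant (hypothetical type; 0-based position)."""
    chrom: str
    pos: int


def in_primer_region(variant: Variant, primers: Sequence[Interval]) -> bool:
    """Return True if the variant lies inside any primer pairing region."""
    return any(
        p.chrom == variant.chrom and p.start <= variant.pos < p.end
        for p in primers
    )


def ado_risk_fraction(variants: Sequence[Variant], primers: Sequence[Interval]) -> float:
    """Fraction of variants that overlap a primer pairing region."""
    if not variants:
        return 0.0
    flagged = sum(in_primer_region(v, primers) for v in variants)
    return flagged / len(variants)


# Illustrative coordinates only; these are not data from the study.
primers = [Interval("chr1", 100, 120), Interval("chr1", 280, 300)]
variants = [
    Variant("chr1", 110),  # inside the first primer region
    Variant("chr1", 200),  # inside the amplicon insert, not a primer
    Variant("chr2", 110),  # same position, different chromosome
    Variant("chr1", 299),  # last base of the second primer region
]
fraction = ado_risk_fraction(variants, primers)
```

A linear scan over primers is enough for small gene panels; an interval tree or sorted lookup would be the natural replacement for larger designs.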

Results: Four gene panels, specific for Parkinson disease, for Intracerebral Hemorrhage Diseases (COL4A1 and COL4A2 genes) and for Familial Hemiplegic Migraine (CACNA1A and ATP1A2 genes), were designed. A total of 119 samples were re-sequenced with an Illumina MiSeq sequencer, and the panels were characterized in terms of coverage, number of variants found and potential impact of allele drop-out. Results show that 14% of identified variants are potentially affected by allele drop-out artifacts and that both the Custom Amplicon workflow and the procedure proposed here could correctly identify them. Furthermore, a more complex configuration, in the presence of two mutations, was simulated in silico. In this configuration, our proposed methodology outperforms the Custom Amplicon workflow, correctly identifying both mutations in all the studied configurations.

Conclusions: Allele drop-out plays a crucial role in amplicon-based targeted re-sequencing, and dedicated procedures should be adopted when analyzing amplicon data. Although a consensus has been established on eliminating primer sequences from aligned data (e.g., via primer sequence trimming or soft clipping), more complex configurations need to be managed in order to increase the information retrieved from the available data. Our method shows how to manage one of these complex configurations, in which two mutations occur.
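
As an illustration of the soft-clipping idea mentioned above, here is a simplified sketch under strong assumptions: the read is a forward read aligned without gaps, coordinates are 0-based and half-open, and the primer interval is known. The function name `soft_clip_forward_primer` is hypothetical, and this is not the method used by the authors or by the Custom Amplicon workflow.

```python
from typing import Tuple


def soft_clip_forward_primer(
    read_start: int, read_length: int, primer_start: int, primer_end: int
) -> Tuple[int, str]:
    """Soft-clip the 5' bases of an ungapped forward read that cover its primer.

    Returns the new alignment start and a CIGAR-like string in which the
    primer-derived bases are marked 'S' (soft clipped) and the rest 'M'.
    """
    # Only reads that begin inside the primer interval carry primer bases
    # at their 5' end; other reads are left untouched.
    if primer_start <= read_start < primer_end:
        n_clip = min(primer_end - read_start, read_length)
    else:
        n_clip = 0
    n_match = read_length - n_clip
    cigar = (f"{n_clip}S" if n_clip else "") + (f"{n_match}M" if n_match else "")
    # Soft-clipped bases do not count toward the aligned position.
    return read_start + n_clip, cigar


# A 50-base read starting at the first base of a 20-base primer (100..120).
new_start, cigar = soft_clip_forward_primer(100, 50, 100, 120)
```

Soft clipping keeps the primer bases in the record while excluding them from variant calling, so a primer-derived base cannot mask a real heterozygous variant at that position.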

Keywords: Allele drop-out; Amplicon-based sequencing; Bioinformatic pipeline; Next-generation sequencing; Primer trimming.

Publication types

  • Evaluation Study

MeSH terms

  • Alleles
  • Cerebral Hemorrhage / genetics*
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Migraine with Aura / genetics*
  • Mutation
  • Parkinson Disease / genetics*
  • Sensitivity and Specificity
  • Statistics as Topic