MICon Contamination Detection Workflow for Next-Generation Sequencing Laboratories Using Microhaplotype Loci and Supervised Learning

J Mol Diagn. 2023 Aug;25(8):602-610. doi: 10.1016/j.jmoldx.2023.05.001. Epub 2023 May 25.

Abstract

Innovation in sequencing instrumentation is increasing the per-batch data volumes and decreasing the per-base costs. Multiplexed chemistry protocols after the addition of index tags have further contributed to efficient and cost-effective sequencer utilization. With these pooled processing strategies, however, comes an increased risk of sample contamination. Sample contamination poses a risk of missing critical variants in a patient sample or wrongly reporting variants derived from the contaminant, which are particularly relevant issues in oncology specimen testing in which low variant allele frequencies have clinical relevance. Small custom-targeted next-generation sequencing (NGS) panels yield limited variants and pose challenges in delineating true somatic variants versus contamination calls. A number of popular contamination identification tools have the ability to perform well in whole-genome/exome sequencing data; however, in smaller gene panels, there are fewer variant candidates for the tools to perform accurately. To prevent clinical reporting of potentially contaminated samples in small next-generation sequencing panels, we have developed MICon (Microhaplotype Contamination detection), a novel contamination detection model that uses microhaplotype site variant allele frequencies. In a heterogeneous hold-out test cohort of 210 samples, the model displayed state-of-the-art performance with an area under the receiver-operating characteristic curve of 0.995.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • High-Throughput Nucleotide Sequencing* / methods
  • Humans
  • Laboratories*
  • Supervised Machine Learning
  • Workflow