Sequence-matching adapter trimmers generate consistent quality and assembly metrics for Illumina sequencing of RNA viruses

BMC Res Notes. 2024 Oct 14;17(1):308. doi: 10.1186/s13104-024-06951-0.

Abstract

Trimming adapters and low-quality bases from next-generation sequencing (NGS) data is crucial for optimal analysis. We evaluated six trimming programs, implementing five different algorithms, for their effectiveness in trimming adapters and improving read quality, contig assembly, and single-nucleotide polymorphism (SNP) quality and concordance for poliovirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and norovirus paired-end data sequenced on Illumina iSeq and MiSeq platforms. Trimmomatic and BBDuk effectively removed adapters from all datasets, unlike FastP, AdapterRemoval, SeqPurge, and Skewer. All trimmers improved read quality (Q ≥ 30, 87.8 - 96.1%) compared to raw reads (83.6 - 93.2%). Trimmers implementing the traditional sequence-matching algorithm (Trimmomatic and AdapterRemoval) and the overlapping algorithm (FastP) retained the highest-quality reads. While all trimmers improved the maximum contig length and genome coverage for iSeq and MiSeq viral assemblies, BBDuk-trimmed reads assembled the shortest contigs. SNP concordance was consistently high (> 97.7%, up to 100%) across trimmers. However, BBDuk-trimmed reads had the lowest-quality SNPs. Overall, the two adapter trimmers that used the traditional sequence-matching algorithm performed consistently across the viral datasets analyzed. Our findings guide software selection and inform future development of versatile trimmers for viral genome analysis.
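
As an illustration of two ideas mentioned above, the following minimal Python sketch shows (a) a toy form of sequence-matching 3' adapter trimming and (b) a Q ≥ 30 percentage computed from FASTQ quality strings. It is not the implementation used by Trimmomatic, AdapterRemoval, or any other evaluated tool. The function names, the mismatch rate, the minimum overlap, the example adapter prefix, and the assumption that the Q ≥ 30 metric is counted per base with Phred+33 encoding are all hypothetical choices made for this example.

```python
def trim_adapter(read, adapter, max_mismatch_rate=0.1, min_overlap=3):
    """Toy sequence-matching trimmer (illustrative only).

    Slides the adapter along the read and cuts at the first position where
    the overlapping bases agree within the allowed mismatch rate. A partial
    adapter at the 3' end is accepted if at least `min_overlap` bases overlap.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        window = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter[:overlap]) if a != b)
        if mismatches <= max_mismatch_rate * overlap:
            return read[:start]  # keep only the bases before the adapter
    return read  # no adapter match found; read is returned unchanged


def percent_q30(quality_strings, threshold=30, offset=33):
    """Percentage of bases with Phred quality >= threshold.

    Assumes Phred+33 encoded quality strings and a per-base metric.
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return 100.0 * passing / total if total else 0.0


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # example adapter prefix, chosen for illustration
    insert = "ACGTACGTAC"
    print(trim_adapter(insert + adapter, adapter))   # full adapter present
    print(trim_adapter(insert + "AGATC", adapter))   # partial adapter at 3' end
    print(percent_q30(["II??", "##5I"]))
```

In practice the quality string would be truncated at the same position as the read, and paired-end tools can additionally use the overlap between mates to locate adapters; neither is shown here.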

Keywords: Adapter trimming; De novo assembly; Illumina; Next-generation sequencing; Quality control; RNA viruses.

MeSH terms

  • Algorithms*
  • Genome, Viral / genetics
  • High-Throughput Nucleotide Sequencing* / methods
  • High-Throughput Nucleotide Sequencing* / standards
  • Humans
  • Norovirus / genetics
  • Poliovirus* / genetics
  • Polymorphism, Single Nucleotide* / genetics
  • RNA Viruses / genetics
  • RNA, Viral / genetics
  • SARS-CoV-2* / genetics
  • Software

Substances

  • RNA, Viral