Reference flow: reducing reference bias using multiple population genomes

Nae-Chyun Chen; Brad Solomon; Taher Mun; Sheila Iyer; Ben Langmead

doi:10.1186/s13059-020-02229-3

Genome Biol. 2021 Jan 4;22(1):8. doi: 10.1186/s13059-020-02229-3.

Authors

Nae-Chyun Chen¹, Brad Solomon¹, Taher Mun¹, Sheila Iyer¹, Ben Langmead²

Affiliations

¹ Department of Computer Science, Johns Hopkins University, Baltimore, USA.
² Department of Computer Science, Johns Hopkins University, Baltimore, USA. langmea@cs.jhu.edu.

Abstract

Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, but failure to account for genetic variation leads to reference bias and confounds downstream results. Other approaches replace the linear reference with structures such as graphs that can include genetic variation, but these incur major computational overhead. We propose the reference flow alignment method, which uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance but with 14% of the memory footprint and 5.5 times the speed.

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Chromosomes, Human, Pair 21
Genome, Human*
Humans
Metagenomics*
Sequence Alignment
Sequence Analysis, DNA
Whole Genome Sequencing
