gcaPDA: a haplotype-resolved diploid assembler

BMC Bioinformatics. 2022 Feb 14;23(1):68. doi: 10.1186/s12859-022-04591-4.

Abstract

Background: Generating chromosome-scale, haplotype-resolved assemblies is important for functional studies. However, current de novo assemblers are either haploid assemblers that discard allelic information or diploid assemblers that can only handle genomes of low complexity.

Results: Here, using robust programs, we build a diploid genome assembly pipeline called gcaPDA (gamete cells assisted Phased Diploid Assembler), which exploits haploid gamete cells to assist in resolving haplotypes. We demonstrate the effectiveness of gcaPDA on simulated HiFi reads of the maize genome, which is highly heterozygous and repetitive, and on real data from rice.

Conclusions: Because it can cope with complex genomes and has fewer restrictions on its application than most diploid assemblers, gcaPDA is likely to find broad application in studies of eukaryotic genomes.

Keywords: Diploid; Gamete cells; Haplotype-resolved de novo assembler; Highly heterozygous genomes.

MeSH terms

  • Alleles
  • Chromosomes*
  • Diploidy*
  • Haploidy
  • Haplotypes
  • High-Throughput Nucleotide Sequencing
  • Sequence Analysis, DNA