HiTea: a computational pipeline to identify non-reference transposable element insertions in Hi-C data

Dhawal Jain; Chong Chu; Burak Han Alver; Soohyun Lee; Eunjung Alice Lee; Peter J Park

doi:10.1093/bioinformatics/btaa923

Bioinformatics. 2021 May 23;37(8):1045-1051. doi: 10.1093/bioinformatics/btaa923.

Authors

Dhawal Jain¹, Chong Chu¹, Burak Han Alver¹, Soohyun Lee¹, Eunjung Alice Lee²,³, Peter J Park¹

Affiliations

¹ Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA.
² Division of Genetics and Genomics, Boston Children's Hospital and Harvard Medical School, Boston, MA 02115, USA.
³ Broad Institute of MIT and Harvard University, Cambridge, MA 02142, USA.

Abstract

Hi-C is a common technique for assessing 3D chromatin conformation. Recent studies have shown that long-range interaction information in Hi-C data can be used to generate chromosome-length genome assemblies and identify large-scale structural variations. Here, we demonstrate the use of Hi-C data in detecting mobile transposable element (TE) insertions genome-wide. Our pipeline, Hi-C-based TE analyzer (HiTea), capitalizes on clipped Hi-C reads and is aided by the high proportion of discordant read pairs in Hi-C data to detect insertions of three major families of active human TEs. Despite the uneven genome coverage in Hi-C data, HiTea is competitive with existing callers based on whole-genome sequencing (WGS) data and can supplement the WGS-based characterization of the TE-insertion landscape. We employ the pipeline to identify TE insertions from human cell-line Hi-C samples.

Availability and implementation: HiTea is available at https://github.com/parklab/HiTea and as a Docker image.

Supplementary information: Supplementary data are available at Bioinformatics online.
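
The abstract names two read-level signals, clipped reads and discordant read pairs. The following Python sketch illustrates what those terms mean for an aligned read; it is not HiTea's implementation. The function names and the thresholds `min_clip` and `max_insert` are hypothetical and chosen only for illustration. It assumes each alignment is described by a standard SAM CIGAR string.

```python
import re

# CIGAR operations as defined in the SAM format (length followed by an op code).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_ends(cigar, min_clip=20):
    """Return (left, right) soft-clip lengths; a side shorter than min_clip is 0.

    min_clip is a hypothetical threshold, not a HiTea parameter.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Hard clips (H) can flank soft clips at the read ends; ignore them.
    core = [(n, op) for n, op in ops if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return (left if left >= min_clip else 0,
            right if right >= min_clip else 0)


def is_discordant(chrom1, pos1, chrom2, pos2, max_insert=1000):
    """Flag a pair whose mates map to different chromosomes or far apart.

    max_insert is a hypothetical distance cutoff used only in this sketch.
    """
    return chrom1 != chrom2 or abs(pos1 - pos2) > max_insert


if __name__ == "__main__":
    print(clipped_ends("30S70M"))                        # clipped on the left
    print(is_discordant("chr1", 100, "chr1", 50000))     # mates far apart
```

In Hi-C data many pairs are discordant by design because ligation joins distant loci, so a check like `is_discordant` alone says little. The abstract describes clipped reads as the primary signal, with discordant pairs in a supporting role.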

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Chromatin*
Chromosomes
DNA Transposable Elements* / genetics
Humans
Molecular Conformation
Whole Genome Sequencing

Substances

Chromatin
DNA Transposable Elements
