A novel human protein-coding locus identified using a targeted RNA enrichment technique

BMC Biol. 2024 Nov 26;22(1):273. doi: 10.1186/s12915-024-02069-8.

Abstract

Background: Accurate and comprehensive genomic annotation, including the full list of protein-coding genes, is vital for understanding the molecular mechanisms of human biology. We have previously shown that the genome contains a multitude of as-yet-hidden functional exons and transcripts, some of which might represent novel mRNAs. These results agree with those from other groups and strongly argue that, two decades after the completion of the first draft of the human genome sequence, the current annotation of human genes and transcripts remains far from complete.

Results: Using a targeted RNA enrichment technique, we showed that one of the novel functional exons that we previously discovered, currently annotated as part of a long non-coding RNA, is in fact part of a novel protein-coding gene, InSETG-4, which encodes a novel human protein with no known homologs or motifs. We found that InSETG-4 is induced by various DNA-damaging agents across multiple cell types and therefore might represent a novel component of the DNA damage response. InSETG-4 is of low abundance in bulk cell populations, and amplification-based single-molecule fluorescence in situ hybridization (asmFISH) analysis showed that its expression is restricted to a small fraction of cells.

Conclusions: This study argues that as-yet-undiscovered human protein-coding genes exist and provides an example of how targeted RNA enrichment techniques can help fill this major gap in our knowledge of the information encoded in the human genome.

Keywords: DNA damage response; Genomic “dark matter”; Mass spectrometry; Nanopore sequencing; Novel gene; Novel protein; Rapid amplification of cDNA ends; Single-cell analysis; Single-molecule fluorescence in situ hybridization; Targeted RNA enrichment.

MeSH terms

  • Exons / genetics
  • Genome, Human
  • Humans
  • In Situ Hybridization, Fluorescence* / methods
  • Open Reading Frames / genetics
  • RNA / genetics
  • RNA, Long Noncoding / genetics

Substances

  • RNA
  • RNA, Long Noncoding