The complete sequence of a human Y chromosome

Nature. 2023 Sep;621(7978):344-354. doi: 10.1038/s41586-023-06457-y. Epub 2023 Aug 23.

Abstract

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.

MeSH terms

  • Base Sequence
  • Chromosomes, Human, Y* / genetics
  • DNA, Satellite / genetics
  • Genetic Variation / genetics
  • Genetics, Population
  • Genomics* / methods
  • Genomics* / standards
  • Heterochromatin / genetics
  • Humans
  • Multigene Family / genetics
  • Reference Standards
  • Segmental Duplications, Genomic / genetics
  • Sequence Analysis, DNA* / standards
  • Tandem Repeat Sequences / genetics
  • Telomere / genetics

Substances

  • DNA, Satellite
  • Heterochromatin
  • TSPY1 protein, human
  • RBMY1A1 protein, human
  • DAZ1 protein, human

Grants and funding