Genomic sequence and transcriptional profile of the boundary between pericentromeric satellites and genes on human chromosome arm 10p

Genome Res. 2003 Feb;13(2):159-72. doi: 10.1101/gr.644503.

Abstract

Contiguous finished sequence from highly duplicated pericentromeric regions of human chromosomes is needed if we are to understand the role of pericentromeric instability in disease, and in gene and karyotype evolution. Here, we have constructed a BAC contig spanning the transition from pericentromeric satellites to genes on the short arm of human chromosome 10, and used this to generate 1.4 Mb of finished genomic sequence. Combining RT-PCR, in silico gene prediction, and paralogy analysis, we can identify two domains within the sequence. The proximal 600 kb consists of satellite-rich, pericentromerically duplicated DNA that is transcript poor, containing only three unspliced transcripts. In contrast, the distal 850 kb contains four known genes (ZNF248, ZNF25, ZNF33A, and ZNF37A) and up to 32 additional transcripts of unknown function. This distal region also contains seven out of the eight intrachromosomal duplications within the sequence, including the p-arm copy of the approximately 250-kb duplication that gave rise to ZNF33A and ZNF33B. By sequencing orthologs of the duplicated ZNF33 genes, we have established that ZNF33A has diverged significantly at residues critical for DNA binding but ZNF33B has not, indicating that ZNF33B has remained constrained by selection for ancestral gene function. These results provide further evidence of gene formation within intrachromosomal duplications, but indicate that recent interchromosomal duplications at this centromere have involved transcriptionally inert, satellite-rich DNA, which is likely to be heterochromatic. This suggests that any novel gene structures formed by these interchromosomal events would require relocation to a more open chromatin environment to be expressed.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence / genetics
  • Animals
  • Callithrix / genetics
  • Centromere / genetics*
  • Chromosomes, Human, Pair 10 / genetics*
  • Chromosomes, Human, Pair 7 / genetics
  • Contig Mapping / methods
  • DNA, Satellite / genetics*
  • Dolphins
  • Evolution, Molecular
  • Gene Duplication
  • Gene Expression Profiling / methods*
  • Genes / genetics*
  • Humans
  • Lorisidae
  • Molecular Sequence Data
  • Pseudogenes / genetics
  • Repressor Proteins / genetics
  • Species Specificity
  • Swine
  • Telomere / genetics
  • Zinc Fingers / genetics

Substances

  • CIC protein, human
  • DNA, Satellite
  • Repressor Proteins

Associated data

  • GENBANK/AJ245587
  • GENBANK/AJ245588
  • GENBANK/AJ250940
  • GENBANK/AJ250941
  • GENBANK/AJ250942
  • GENBANK/AJ250943
  • GENBANK/AJ250944
  • GENBANK/AJ250945
  • GENBANK/AJ250946
  • GENBANK/AJ250947
  • GENBANK/AJ250948
  • GENBANK/AJ250949
  • GENBANK/AJ250950
  • GENBANK/AJ251655
  • GENBANK/AJ275023
  • GENBANK/AJ275024
  • GENBANK/AJ275025
  • GENBANK/AJ275026
  • GENBANK/AJ275027
  • GENBANK/AJ275028
  • GENBANK/AJ275029
  • GENBANK/AJ275030
  • GENBANK/AJ275031
  • GENBANK/AJ275032
  • GENBANK/AJ275033
  • GENBANK/AJ275034
  • GENBANK/AJ275035
  • GENBANK/AJ275036
  • GENBANK/AJ491691
  • GENBANK/AJ491692
  • GENBANK/AJ491693
  • GENBANK/AJ491694
  • GENBANK/AJ491695
  • GENBANK/AJ491696
  • GENBANK/AJ491697
  • GENBANK/AJ492195
  • GENBANK/AJ492196
  • GENBANK/AL117337
  • GENBANK/AL117339
  • GENBANK/AL121927
  • GENBANK/AL132657
  • GENBANK/AL132658
  • GENBANK/AL132659
  • GENBANK/AL133216
  • GENBANK/AL133217
  • GENBANK/AL133350
  • GENBANK/AL135791
  • GENBANK/AL161931
  • GENBANK/AL391686