A pangenome reference of 36 Chinese populations

Yang Gao; Xiaofei Yang; Hao Chen; Xinjiang Tan; Zhaoqing Yang; Lian Deng; Baonan Wang; Shuang Kong; Songyang Li; Yuhang Cui; Chang Lei; Yimin Wang; Yuwen Pan; Sen Ma; Hao Sun; Xiaohan Zhao; Yingbing Shi; Ziyi Yang; Dongdong Wu; Shaoyuan Wu; Xingming Zhao; Binyin Shi; Li Jin; Zhibin Hu; Chinese Pangenome Consortium (CPC); Yan Lu; Jiayou Chu; Kai Ye; Shuhua Xu

doi:10.1038/s41586-023-06173-7

Nature. 2023 Jul;619(7968):112-121. doi: 10.1038/s41586-023-06173-7. Epub 2023 Jun 14.

Authors

Yang Gao^#^{1,2,3,4}, Xiaofei Yang^#^{5,6,7}, Hao Chen^#³, Xinjiang Tan^#³, Zhaoqing Yang^#⁸, Lian Deng^#¹, Baonan Wang², Shuang Kong², Songyang Li², Yuhang Cui², Chang Lei¹, Yimin Wang³, Yuwen Pan³, Sen Ma³, Hao Sun⁸, Xiaohan Zhao², Yingbing Shi¹, Ziyi Yang¹, Dongdong Wu⁹, Shaoyuan Wu¹⁰, Xingming Zhao¹¹, Binyin Shi¹², Li Jin^{1,2}, Zhibin Hu^{13,14}; Chinese Pangenome Consortium (CPC); Yan Lu¹⁵, Jiayou Chu¹⁶, Kai Ye^{17,18,19}, Shuhua Xu^{20,21,22,23,24,25}

Collaborators

Chinese Pangenome Consortium (CPC):
Chuangxue Mao, Shaohua Fan, Qiang Gao, Juncheng Dai, Fengxiao Bu, Guanglin He, Yang Wu, Huijun Yuan, Jinchen Li, Chao Chen, Jian Yang, Chaochun Wei, Xin Jin, Xia Shen

Affiliations

¹ State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China.
² Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China.
³ Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China.
⁴ School of Life Science and Technology, ShanghaiTech University, Shanghai, China.
⁵ School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
⁶ MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
⁷ Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China.
⁸ Department of Medical Genetics, Institute of Medical Biology, Chinese Academy of Medical Sciences, Kunming, China.
⁹ State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China.
¹⁰ Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, International Joint Center of Genomics of Jiangsu Province, School of Life Sciences, Jiangsu Normal University, Xuzhou, China.
¹¹ Institute of Science and Technology for Brain-Inspired Intelligence, Ministry of Education Key (MOE) Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China.
¹² Department of Endocrinology, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China.
¹³ State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China.
¹⁴ Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China.
¹⁵ State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China. lueyan@fudan.edu.cn.
¹⁶ Department of Medical Genetics, Institute of Medical Biology, Chinese Academy of Medical Sciences, Kunming, China. chujy@imbcams.com.cn.
¹⁷ MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China. kaiye@xjtu.edu.cn.
¹⁸ School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China. kaiye@xjtu.edu.cn.
¹⁹ School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China. kaiye@xjtu.edu.cn.
²⁰ State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China. xushua@fudan.edu.cn.
²¹ Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China. xushua@fudan.edu.cn.
²² School of Life Science and Technology, ShanghaiTech University, Shanghai, China. xushua@fudan.edu.cn.
²³ Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, International Joint Center of Genomics of Jiangsu Province, School of Life Sciences, Jiangsu Normal University, Xuzhou, China. xushua@fudan.edu.cn.
²⁴ Department of Liver Surgery and Transplantation, Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China. xushua@fudan.edu.cn.
²⁵ Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China. xushua@fudan.edu.cn.

^# Contributed equally.

Abstract

Human genomics is witnessing an ongoing paradigm shift from a single reference sequence to a pangenome form, but populations of Asian ancestry are underrepresented. Here we present data from the first phase of the Chinese Pangenome Consortium, including a collection of 116 high-quality and haplotype-phased de novo assemblies based on 58 core samples representing 36 minority Chinese ethnic groups. With an average 30.65× high-fidelity long-read sequence coverage, an average contiguity N50 of more than 35.63 megabases and an average total size of 3.01 gigabases, the CPC core assemblies add 189 million base pairs of euchromatic polymorphic sequences and 1,367 protein-coding gene duplications to GRCh38. We identified 15.9 million small variants and 78,072 structural variants, of which 5.9 million small variants and 34,223 structural variants were not reported in a recently released pangenome reference¹. The Chinese Pangenome Consortium data demonstrate a remarkable increase in the discovery of novel and missing sequences when individuals are included from underrepresented minority ethnic groups. The missing reference sequences were enriched with archaic-derived alleles and genes that confer essential functions related to keratinization, response to ultraviolet radiation, DNA repair, immunological responses and lifespan, implying great potential for shedding new light on human evolution and recovering missing heritability in complex disease mapping.

MeSH terms

Alleles
DNA Repair / genetics
East Asian People* / classification
East Asian People* / genetics
Ethnic and Racial Minorities
Ethnicity* / genetics
Euchromatin / genetics
Genetic Variation*
Genome, Human* / genetics
Haplotypes / genetics
Human Genetics* / standards
Humans
Immunity / genetics
Keratins / genetics
Keratins / metabolism
Longevity / genetics
Minority Groups*
Reference Standards
Sequence Analysis, DNA
Ultraviolet Rays

Substances

Euchromatin
Keratins