Assessing genome-wide copy number variation in the Han Chinese population

Jianqi Lu; Haiyi Lou; Ruiqing Fu; Dongsheng Lu; Feng Zhang; Zhendong Wu; Xi Zhang; Changhua Li; Baijun Fang; Fangfang Pu; Jingning Wei; Qian Wei; Chao Zhang; Xiaoji Wang; Yan Lu; Shi Yan; Yajun Yang; Li Jin; Shuhua Xu

doi:10.1136/jmedgenet-2017-104613

J Med Genet. 2017 Oct;54(10):685-692. doi: 10.1136/jmedgenet-2017-104613. Epub 2017 Jul 13.

Authors

Jianqi Lu¹, Haiyi Lou², Ruiqing Fu²,³, Dongsheng Lu²,³, Feng Zhang¹,⁴, Zhendong Wu²,⁵, Xi Zhang²,⁵, Changhua Li¹, Baijun Fang¹, Fangfang Pu¹, Jingning Wei¹, Qian Wei¹, Chao Zhang²,³, Xiaoji Wang²,³, Yan Lu², Shi Yan¹, Yajun Yang¹, Li Jin¹,⁴, Shuhua Xu²,³,⁴,⁵

Affiliations

¹ State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, China.
² Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai, China.
³ University of Chinese Academy of Sciences, Beijing, China.
⁴ Collaborative Innovation Center of Genetics and Development, Fudan University, Shanghai, China.
⁵ School of Life Science and Technology, ShanghaiTech University, Shanghai, China.

PMID: 28705883
DOI: 10.1136/jmedgenet-2017-104613

Abstract

Background: Copy number variation (CNV) is a valuable source of genetic diversity in the human genome and a well-recognised cause of various genetic diseases. However, CNVs have been considerably under-represented in population-based studies, particularly in the Han Chinese, the largest ethnic group in the world.

Objectives: To build a representative CNV map for the Han Chinese population.

Methods: We conducted a genome-wide CNV study of 451 male Han Chinese samples from 11 geographical regions encompassing 28 dialect groups, a less biased panel than the currently available data. We detected CNVs using a 4.2M NimbleGen comparative genomic hybridisation array, and used whole-genome deep sequencing of 51 samples to optimise the filtering conditions in CNV discovery.

Results: A comprehensive Han Chinese CNV map was built from a set of high-quality variants (positive predictive value >0.8, with sizes ranging from 369 bp to 4.16 Mb and a median of 5907 bp). The map consists of 4012 CNV regions (CNVRs), more than half of which are novel relative to the 30 East Asian CNV Project and the 1000 Genomes Project Phase 3. We further identified 81 CNVRs specific to regional groups, which is indicative of subpopulation structure within the Han Chinese population.

Conclusions: Our data are complementary to public data sources, and the CNV map may facilitate the identification of pathogenic CNVs and further biomedical research involving the Han Chinese population.

Keywords: CGH; Copy number variation; Dialect groups; Next generation sequencing.

MeSH terms

Asian People / genetics*
China
DNA Copy Number Variations*
Ethnicity / genetics*
Genetic Variation*
Genome, Human*
Humans
Male