Background: Copy number variation (CNV) is a valuable source of genetic diversity in the human genome and a well-recognised cause of various genetic diseases. However, CNVs have been considerably under-represented in population-based studies, particularly for the Han Chinese, the largest ethnic group in the world.
Objectives: To build a representative CNV map for the Han Chinese population.
Methods: We conducted a genome-wide CNV study of 451 male Han Chinese samples from 11 geographical regions encompassing 28 dialect groups, a less biased panel than the currently available data. We detected CNVs using a 4.2M NimbleGen comparative genomic hybridisation array, and used whole-genome deep sequencing of 51 samples to optimise the filtering conditions for CNV discovery.
Results: A comprehensive Han Chinese CNV map was built from a set of high-quality variants (positive predictive value >0.8; sizes ranging from 369 bp to 4.16 Mb, with a median of 5907 bp). The map consists of 4012 CNV regions (CNVRs), more than half of which are novel relative to the 30 East Asian CNV Project and the 1000 Genomes Project Phase 3. We further identified 81 CNVRs specific to regional groups, indicating subpopulation structure within the Han Chinese population.
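The two quantities reported above, the positive predictive value of a call set and the grouping of CNV calls into CNVRs, can be illustrated with a minimal sketch. It is not the study's pipeline: the abstract does not state how calls were validated or merged, so the sketch assumes that PPV is computed as validated calls over all calls, and that CNVs sharing at least one base on the same chromosome are merged into one region. The function names and the half-open coordinate convention are hypothetical choices made for illustration only.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; an assumed convention.
Interval = Tuple[str, int, int]


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Return PPV = TP / (TP + FP) for a set of CNV calls."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no calls to evaluate")
    return true_positives / total


def merge_into_cnvrs(cnvs: List[Interval]) -> List[Interval]:
    """Merge CNV calls that overlap on the same chromosome into regions.

    Assumption: any overlap of one base or more joins two calls. Other
    definitions (for example a reciprocal-overlap threshold) are also in use.
    """
    regions: List[Interval] = []
    # Sorting by (chromosome, start, end) lets one pass extend the last region.
    for chrom, start, end in sorted(cnvs):
        if regions and regions[-1][0] == chrom and start < regions[-1][2]:
            prev_chrom, prev_start, prev_end = regions[-1]
            regions[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            regions.append((chrom, start, end))
    return regions
```

Under these assumptions, a call set with 85 validated and 15 unvalidated calls would have a PPV of 0.85 and pass a >0.8 threshold, and two calls covering chr1:100-500 and chr1:400-900 would be reported as a single CNVR spanning chr1:100-900.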
Conclusions: Our data are complementary to public data sources, and the CNV map may facilitate the identification of pathogenic CNVs and further biomedical research involving the Han Chinese population.
Keywords: CGH; Copy number variation; Dialect groups; Next generation sequencing.
© Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.