A high-resolution haplotype-resolved Reference panel constructed from the China Kadoorie Biobank Study

Nucleic Acids Res. 2023 Nov 27;51(21):11770-11782. doi: 10.1093/nar/gkad779.

Abstract

Precision medicine depends on high-accuracy individual-level genotype data. However, whole-genome sequencing (WGS) is still not feasible for very large studies because of budget constraints. It is therefore particularly important to construct a highly accurate haplotype reference panel for genotype imputation. In this study, we used 10 000 samples with medium-depth WGS to construct a reference panel that we named the CKB reference panel. Imputation of microarray datasets showed that the CKB panel outperformed the panels it was compared with in terms of both the number of well-imputed variants and imputation accuracy. In addition, we have completed the imputation of 100 706 microarrays with the CKB panel, and the resulting imputed dataset is, to date, the largest whole-genome dataset of the Chinese population. Furthermore, in a GWAS of a real phenotype, height, the number of tested SNPs tripled and the number of significant SNPs doubled after imputation. Finally, we developed an online server that offers a free genotype imputation service based on the CKB reference panel (https://db.cngb.org/imputation/). We believe that the CKB panel is of great value for imputing microarray or low-coverage genotype data from the Chinese population, and potentially from mixed populations. The imputed data for the 100 706 microarrays are a large and valuable resource for population genetic studies of complex traits and diseases.
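
The abstract compares panels by imputation accuracy and by the number of well-imputed variants, but does not state how either quantity is measured. As a rough illustration only, the Python sketch below assumes one common convention: accuracy for a variant is the squared Pearson correlation between sequenced genotypes (0, 1 or 2 copies of the alternate allele) and imputed dosages, and a variant counts as well imputed when that value reaches a cutoff. The function names and the 0.8 cutoff are hypothetical and are not taken from the study.

```python
import math


def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2)
    and imputed dosages (floats in [0, 2]) for a single variant."""
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_dosages):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        # Correlation is undefined when either vector has no variation.
        return float("nan")
    return cov * cov / (var_t * var_d)


def count_well_imputed(r2_values, threshold=0.8):
    """Count variants whose r2 is defined and at least `threshold`.
    The 0.8 default is an assumed cutoff, not one from the study."""
    return sum(1 for r2 in r2_values
               if not math.isnan(r2) and r2 >= threshold)


# Toy data: three variants measured in four samples (values are made up).
truth = [[0, 1, 2, 1], [0, 0, 1, 2], [0, 0, 0, 0]]
imputed = [[0.1, 0.9, 1.8, 1.2], [1.0, 0.2, 0.9, 1.1], [0.0, 0.1, 0.0, 0.2]]
r2_per_variant = [dosage_r2(t, d) for t, d in zip(truth, imputed)]
well_imputed = count_well_imputed(r2_per_variant)
```

Under this assumed convention, comparing two reference panels would mean running the same calculation on each panel's imputed dosages for the same held-out samples and comparing the per-variant values and the resulting counts.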

MeSH terms

  • Biological Specimen Banks*
  • China
  • Genome*
  • Genome-Wide Association Study
  • Genotype
  • Haplotypes
  • Humans
  • Polymorphism, Single Nucleotide