[Establishment and validation of prediction models for human leukocyte antigen haplotypes and human leukocyte antigen genotypes]

Zhonghua Yi Xue Za Zhi. 2024 Mar 19;104(11):834-842. doi: 10.3760/cma.j.cn112137-20231130-01246.
[Article in Chinese]

Abstract

Objective: To establish prediction models for human leukocyte antigen (HLA) haplotypes and HLA genotypes, and to verify their prediction accuracy.

Methods: The prediction models were built on the principles of HLA haplotype inheritance and linkage disequilibrium (LD), and on invention patents and software copyrights obtained previously. Each model consists of an algorithm and reference databases: an HLA A-C-B-DRB1-DQB1 high-resolution haplotype database, B-C and DRB1-DQB1 LD databases, a G group allele table, and an NMDP Code allele table. The algorithm proceeds through preprocessing of the input data, comparison with the reference data, filtering of results, probability calculation and ranking, confidence degree estimation, and output of the prediction results. Data with known A-C-B-DRB1-DQB1 haplotypes and known high-resolution A, B, DRB1, C, and DQB1 genotypes were used as input, predictions were compared with the known results to verify accuracy, and the relationship between accuracy and the probability distribution and confidence degree of the predictions was analyzed.

Results: Prediction models for HLA haplotypes and genotypes were established. The algorithm covers three tasks: predicting A-C-B-DRB1-DQB1 haplotypes from HLA-A, B, DRB1, C, and DQB1 genotypes; predicting C and DQB1 high-resolution results from A, B, and DRB1 high-resolution results; and predicting A, B, DRB1, C, and DQB1 high-resolution results from A, B, and DRB1 intermediate- or low-resolution results. Validation of the model "predicting A-C-B-DRB1-DQB1 haplotypes based on HLA-A, B, DRB1, C, DQB1 genotypes": in one set of 787 records, 740 predictions were correct, 34 were incorrect, and 13 records returned no prediction, for an accuracy of 94.0% (740/787); in a second set of 847 records, the accuracy was 100% (847/847). The 2 411 and 2 594 haplotype combinations predicted from the 787 and 847 records were grouped by confidence degree: the accuracy was 100% in both sets (48/48 and 114/114) at a confidence degree of 1, and 96.2% (303/315) and 97.8% (409/418), respectively, at a confidence degree of 2. Validation of the model "predicting A, B, DRB1 and C, DQB1 high-resolution genotypes based on HLA-A, B, DRB1 high-, intermediate-, or low-resolution genotypes": when C and DQB1 high-resolution genotypes were predicted from the A, B, and DRB1 high-resolution genotypes of all 1 634 records, the predictions included the correct result in 89.3% (1 459/1 634) of cases. Among these, the correct result ranked in the top 2 by predicted probability (GPP) in 79.2% (1 156/1 459) and in the top 10 in 95.0% (1 386/1 459). For predicted combinations with GPP≥90% and GPP of 50%-90%, the prediction accuracy was 81.3% (209/257) and 72.8% (447/614), respectively. In a further validation with data from the China Marrow Donor Program, the accuracy of predicting C and DQB1 high-resolution genotypes from A, B, and DRB1 high-resolution genotypes was 87.0% (20/23). The accuracy of predicting A, B, DRB1, C, and DQB1 high-resolution genotypes was 70.0% (7/10) from A, B, and DRB1 intermediate-resolution genotypes and 52.5% (21/40) from low-resolution genotypes. When predicting whether a donor and patient are likely to be HLA 10/10 matched, combinations that ranked in the top 2 by GPP and had GPP≥50% were correct in 85.7% (6/7) of cases.

Conclusions: When A, B, DRB1, C, and DQB1 genotypes are used to predict A-C-B-DRB1-DQB1 haplotype combinations, results with a confidence degree of 1 or 2 are reliable. When C and DQB1 genotypes are predicted from A, B, and DRB1 genotypes, the top 10 results ranked by GPP can be used as a reference, and results in the top 2 with GPP≥50% should be given priority.
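
The comparison, filtering, and probability-ranking steps named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which the abstract does not specify in detail. It assumes that a haplotype pair is weighted by the product of its reference frequencies (doubled for two different haplotypes) and then normalized over all pairs that explain the genotype. The `REFERENCE` table, its frequency values, and the function names are hypothetical and serve only as illustration.

```python
from itertools import combinations_with_replacement

# Hypothetical reference haplotype frequencies (illustrative values, not real data).
# Each haplotype lists its alleles in A, C, B, DRB1, DQB1 order.
REFERENCE = {
    ("A*02:01", "C*07:02", "B*07:02", "DRB1*15:01", "DQB1*06:02"): 0.030,
    ("A*11:01", "C*03:04", "B*13:01", "DRB1*12:02", "DQB1*03:01"): 0.020,
    ("A*02:01", "C*03:04", "B*13:01", "DRB1*12:02", "DQB1*03:01"): 0.004,
    ("A*11:01", "C*07:02", "B*07:02", "DRB1*15:01", "DQB1*06:02"): 0.001,
}


def explains(h1, h2, genotype):
    """True if the two haplotypes together reproduce the genotype at every locus."""
    return all(sorted((a1, a2)) == sorted(g) for a1, a2, g in zip(h1, h2, genotype))


def rank_haplotype_pairs(genotype, reference=REFERENCE):
    """Return (pair, normalized probability) tuples, most likely pair first."""
    scored = []
    for h1, h2 in combinations_with_replacement(reference, 2):
        if explains(h1, h2, genotype):  # filtering step: keep consistent pairs only
            f1, f2 = reference[h1], reference[h2]
            # Assumed weighting: f1*f2 for identical haplotypes, 2*f1*f2 otherwise.
            weight = f1 * f2 if h1 == h2 else 2 * f1 * f2
            scored.append(((h1, h2), weight))
    total = sum(w for _, w in scored)
    ranked = [(pair, w / total) for pair, w in scored]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# One genotype per locus, in the same A, C, B, DRB1, DQB1 order.
genotype = [
    ("A*02:01", "A*11:01"),
    ("C*03:04", "C*07:02"),
    ("B*07:02", "B*13:01"),
    ("DRB1*12:02", "DRB1*15:01"),
    ("DQB1*03:01", "DQB1*06:02"),
]
ranked = rank_haplotype_pairs(genotype)
for (h1, h2), prob in ranked:
    print(f"{prob:.4f}  {'-'.join(h1)}  +  {'-'.join(h2)}")
```

With this toy table, two haplotype pairs explain the genotype, and the pair built from the two more frequent haplotypes takes nearly all of the normalized probability. A genotype that no pair explains yields an empty list, which corresponds to the "no predicted result" outcome reported in the validation.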

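Objective: To establish prediction models for human leukocyte antigen (HLA) haplotypes and HLA locus genotypes, and to verify the correctness of the models. Methods: Based on the rules of HLA haplotype inheritance and linkage disequilibrium, and building on invention patents and software copyrights already obtained, a prediction algorithm was established. Its main steps are preprocessing of the data to be predicted, comparison with reference data, filtering of prediction results, probability calculation and ranking, confidence degree assessment, and output of prediction results. The reference databases comprise an HLA A-C-B-DRB1-DQB1 high-resolution haplotype database, B-C and DRB1-DQB1 linkage disequilibrium databases, and G group and NMDP Code allele lookup tables. Data with known A-C-B-DRB1-DQB1 haplotypes and known A, B, DRB1, C, DQB1 high-resolution genotypes were selected for prediction; the predictions were compared with the known results to verify correctness, and the relationship between correctness and the probability distribution and confidence degree of the predictions was analyzed. Results: Prediction models for HLA haplotypes and HLA locus genotypes were established, with a complete prediction algorithm built according to the technical route of this study. The algorithm covers prediction of A-C-B-DRB1-DQB1 haplotypes from HLA-A, B, DRB1, C, DQB1 genotypes; prediction of C and DQB1 high-resolution results from HLA-A, B, DRB1 high-resolution results; and prediction of A, B, DRB1 and C, DQB1 high-resolution results from HLA-A, B, DRB1 intermediate- or low-resolution results. Validation of the model "predicting A-C-B-DRB1-DQB1 haplotypes from HLA-A, B, DRB1, C, DQB1 genotypes": of 787 validation records, 740 were predicted correctly, 34 were predicted incorrectly, and 13 returned no prediction, giving a correct rate of 94.0% (740/787); for 847 records the correct rate was 100% (847/847). The 2 411 and 2 594 haplotype combinations produced from the 787 and 847 records were grouped by confidence degree: at a confidence degree of 1 the correct rate was 100% in both sets (48/48, 114/114), and at a confidence degree of 2 it was 96.2% (303/315) and 97.8% (409/418), respectively. Validation of the model predicting A, B, DRB1 and C, DQB1 high-resolution results from HLA-A, B, DRB1 high-, intermediate-, or low-resolution results: using the A, B, DRB1 high-resolution results of the above 1 634 records in total to predict C and DQB1 high-resolution results, and comparing with the known typing results, the predictions contained the correct result in 89.3% (1 459/1 634) of cases; among these, the correct result fell within the top 2 by predicted probability (GPP) in 79.2% (1 156/1 459) and within the top 10 in 95.0% (1 386/1 459). Further analysis by the GPP of the predicted combinations showed correct rates of 81.3% (209/257) for GPP≥90% and 72.8% (447/614) for GPP of 50%-90%. In a further validation with China Marrow Donor Program data, the correct rate was 87.0% (20/23) for predicting C and DQB1 high-resolution results from A, B, DRB1 high-resolution results; 70.0% (7/10) for predicting A, B, DRB1, C, DQB1 high-resolution results from A, B, DRB1 intermediate-resolution results; and 52.5% (21/40) for predicting A, B, DRB1, C, DQB1 high-resolution results from A, B, DRB1 low-resolution results. When predicting whether a donor and patient could be HLA 10/10 matched, combinations ranked in the top 2 by GPP with GPP≥50% were correct in 85.7% (6/7) of cases. Conclusions: For prediction of haplotypes from HLA-A, B, DRB1, C, DQB1 genotypes, results with a confidence degree of 1 or 2 can be used as a reference; for prediction of C and DQB1 genotypes from A, B, DRB1 genotypes, results ranked in the top 10 by GPP can be used as a reference, with priority given to results ranked in the top 2 by GPP with GPP≥50%.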
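
As a quick consistency check, the reported percentages can be recomputed from the counts given in the abstract. The short Python sketch below does only that arithmetic; the dictionary labels are descriptive shorthand introduced here, while the counts and percentages are copied from the abstract.

```python
# (correct, total, reported percentage) as stated in the abstract.
reported = {
    "haplotypes, 787 records": (740, 787, 94.0),
    "haplotypes, 847 records": (847, 847, 100.0),
    "confidence degree 2, 787-record set": (303, 315, 96.2),
    "confidence degree 2, 847-record set": (409, 418, 97.8),
    "C/DQB1 prediction contains correct result": (1459, 1634, 89.3),
    "correct result in top 2 by GPP": (1156, 1459, 79.2),
    "correct result in top 10 by GPP": (1386, 1459, 95.0),
    "GPP >= 90%": (209, 257, 81.3),
    "GPP 50%-90%": (447, 614, 72.8),
    "CMDP, high-resolution input": (20, 23, 87.0),
    "CMDP, intermediate-resolution input": (7, 10, 70.0),
    "CMDP, low-resolution input": (21, 40, 52.5),
    "10/10 match, top 2 with GPP >= 50%": (6, 7, 85.7),
}

for label, (correct, total, pct) in reported.items():
    recomputed = 100 * correct / total
    print(f"{label}: {correct}/{total} = {recomputed:.2f}% (reported {pct}%)")

# Largest difference between a recomputed and a reported percentage.
max_gap = max(abs(100 * c / t - p) for c, t, p in reported.values())
```

Every recomputed value agrees with the reported figure to within rounding at one decimal place, and the two haplotype validation sets (787 + 847) add up to the 1 634 records used for the C and DQB1 prediction.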

Publication types

  • English Abstract

MeSH terms

  • Alleles
  • Gene Frequency
  • Genotype
  • HLA-A Antigens / genetics
  • HLA-B Antigens* / genetics
  • HLA-C Antigens* / genetics
  • HLA-DQ beta-Chains / genetics
  • HLA-DRB1 Chains / genetics
  • Haplotypes
  • Histocompatibility Antigens Class I / genetics
  • Humans

Substances

  • HLA-B Antigens
  • HLA-C Antigens
  • HLA-DQ beta-Chains
  • HLA-DRB1 Chains
  • Histocompatibility Antigens Class I
  • HLA-A Antigens