HLA haplotype frequency estimation for heterogeneous populations using a graph-based imputation algorithm

Sapir Israeli, Loren Gragert, Martin Maiers, Yoram Louzoun

Research output: Contribution to journal › Article › peer-review

6 Scopus citations


HLA haplotype frequencies are estimated from ambiguous unphased HLA genotyping data using Expectation-Maximization (EM) algorithms. Current population genetics methods require independent EM frequency estimates for each population, and assume that each population is in Hardy-Weinberg Equilibrium (HWE). The HWE assumption of EM has thus far resulted in the exclusion of individuals from mixed or unknown ethnic backgrounds from reference datasets. Multi-region populations are currently poorly served by stem cell donor registry HLA imputation and matching implementations due to the inability of such algorithms to incorporate admixture into their population genetics models. To address this unmet need, we have expanded the imputation component of our GRaph IMputation and Matching (GRIMM) framework, where imputation becomes the expectation step in an iterative EM algorithm. Our novel multi-region EM implementation considers region as a Bayesian prior, enabling integration of HLA information from multiple single-region population groups, and for the first time including individuals with ambiguous or mixed ethnic backgrounds. We show that our multi-region EM produces much higher likelihood values and better haplotype recovery as measured by Kullback-Leibler divergence than all evaluated EM implementations when tested on real datasets of US donor registry HLA typings as well as simulated multi-region datasets of ambiguous HLA typings.
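To make the idea concrete, the following is a minimal toy sketch of an EM loop in which each individual's region information enters as a prior and the expectation step weights every (region, haplotype pair) explanation of an ambiguous typing under HWE within that region. It is an illustration under stated assumptions, not the authors' GRIMM implementation: the function names `em_haplotype_freqs` and `kl_divergence`, the input layout, the region labels `R1`/`R2`, and the single-letter haplotypes are all hypothetical, and the graph-based imputation is replaced here by a plain enumeration of candidate pairs.

```python
import math
from collections import defaultdict


def em_haplotype_freqs(individuals, regions, n_iter=50):
    """Toy multi-region EM (illustrative assumption, not the GRIMM code).

    individuals: list of (region_prior, candidate_pairs), where
      region_prior maps region -> prior probability, and
      candidate_pairs lists each unordered haplotype pair (h1, h2)
      consistent with the ambiguous typing exactly once.
    Returns {region: {haplotype: frequency}}.
    """
    haps = sorted({h for _, pairs in individuals for pair in pairs for h in pair})
    # Start from uniform frequencies in every region.
    freqs = {r: {h: 1.0 / len(haps) for h in haps} for r in regions}

    for _ in range(n_iter):
        counts = {r: defaultdict(float) for r in regions}
        # E-step: posterior weight of each (region, pair) explanation.
        for prior, pairs in individuals:
            weights = {}
            for r in regions:
                for h1, h2 in pairs:
                    # HWE genotype probability: f1*f2, doubled for heterozygotes.
                    hwe = (1.0 if h1 == h2 else 2.0) * freqs[r][h1] * freqs[r][h2]
                    weights[(r, h1, h2)] = prior.get(r, 0.0) * hwe
            total = sum(weights.values())
            if total == 0.0:
                continue  # no explanation has support under current estimates
            for (r, h1, h2), w in weights.items():
                counts[r][h1] += w / total
                counts[r][h2] += w / total
        # M-step: renormalize expected haplotype counts per region.
        for r in regions:
            n = sum(counts[r].values())
            if n > 0.0:
                freqs[r] = {h: counts[r][h] / n for h in haps}
    return freqs


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over haplotypes; q is floored at eps to avoid log(0)."""
    return sum(pv * math.log(pv / max(q.get(h, 0.0), eps))
               for h, pv in p.items() if pv > 0.0)


# Hypothetical toy data: region labels and haplotype names are made up.
donors = [
    ({"R1": 1.0}, [("A", "D")]),
    ({"R1": 1.0}, [("A", "D")]),
    ({"R1": 1.0}, [("A", "D"), ("B", "C")]),   # phase-ambiguous typing
    ({"R2": 1.0}, [("B", "B")]),
    ({"R1": 0.5, "R2": 0.5}, [("A", "A")]),    # mixed or unknown region
]
est = em_haplotype_freqs(donors, ["R1", "R2"])
```

In this toy run the ambiguous donor is resolved toward the pair supported by the unambiguous donors, and the donor with a 50/50 region prior is assigned mostly to the region whose frequencies explain the typing best. `kl_divergence(reference, est["R1"])` can then compare an estimate against a known reference distribution, as one would on simulated data.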

Original language: English
Pages (from-to): 746-757
Number of pages: 12
Journal: Human Immunology
Issue number: 10
State: Published - Oct 2021

Bibliographical note

Publisher Copyright:
© 2021 American Society for Histocompatibility and Immunogenetics


  • HLA
  • Haplotype frequencies
  • Multi-region expectation-maximization algorithm

