The human leukocyte antigen (HLA) system is the most polymorphic region of the human genome. Anthropologists use HLA to trace the migration and evolution of populations. However, recent admixture between populations can mask the ancestral haplotype frequency distribution. We present a statistical method that resolves population admixture from high-resolution HLA haplotype frequencies using a non-negative matrix factorization formalism, and we validate it using haplotype frequencies from 56 world populations. The result is a minimal set of source components (SCs) that explains roughly 90% of the total variance in the studied admixtures. These SCs agree with the geographical distribution, phylogenies, and recent admixture events of the studied groups. With the growing number of multi-ethnic individuals, and of individuals who do not report race/ethnicity information, HLA matching for stem-cell and solid organ transplants is becoming more challenging. The presented algorithm provides a framework for breaking highly admixed populations down into SCs, which can then be used to better match the rapidly growing population of multi-ethnic individuals worldwide.
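To make the factorization idea concrete, here is a minimal sketch of the general technique, not the authors' implementation. It assumes that admixed populations are rows of a non-negative matrix `X` (populations × haplotypes) of haplotype frequencies, factors `X ≈ W @ H` with the standard Lee-Seung multiplicative updates for the Frobenius loss, and reads rows of `H` as source-component haplotype profiles and rows of `W` as mixing proportions. The function names (`nmf`, `explained_variance`, `minimal_components`), the column-centred definition of "total variance", and the 0.9 threshold (taken from the roughly 90% figure above) are all illustrative assumptions. The toy data are synthetic, not registry data.

```python
import numpy as np


def nmf(X, k, n_iter=2000, eps=1e-12, seed=0):
    """Hypothetical helper: approximate non-negative X (populations x haplotypes)
    as W @ H with W, H >= 0 via Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each row of H sums to 1 (a haplotype frequency profile per
    # source component); W absorbs the scale, so W @ H is unchanged.
    s = np.maximum(H.sum(axis=1), eps)
    return W * s, H / s[:, None]


def explained_variance(X, W, H):
    """Fraction of the column-centred variance of X reproduced by W @ H
    (an assumed definition; the paper may measure variance differently)."""
    resid = np.sum((X - W @ H) ** 2)
    total = np.sum((X - X.mean(axis=0)) ** 2)
    return 1.0 - resid / total


def minimal_components(X, threshold=0.9, k_max=6, seed=0):
    """Hypothetical helper: smallest k whose NMF explains >= threshold of the
    variance; falls back to k_max if no smaller k reaches the threshold."""
    for k in range(1, k_max + 1):
        W, H = nmf(X, k, seed=seed)
        ev = explained_variance(X, W, H)
        if ev >= threshold:
            break
    return k, W, H, ev


# Synthetic toy data (not real registry data): 3 hypothetical source
# populations over 20 haplotypes, mixed into 12 admixed populations.
rng = np.random.default_rng(42)
sources = rng.dirichlet(np.full(20, 0.5), size=3)   # 3 x 20, rows sum to 1
proportions = rng.dirichlet(np.ones(3), size=12)    # 12 x 3, rows sum to 1
X = proportions @ sources                           # 12 x 20, rows sum to 1

k, W, H, ev = minimal_components(X)
print(k, round(ev, 3))
```

Because the toy matrix is built from three sources, the search should stop at three or fewer components. Rows of `W` then approximate each population's admixture proportions, up to the permutation and scaling ambiguity inherent to NMF.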
Funding information:
We would like to thank the following registries for allowing their data to be used for this study: National Marrow Donor Program/Be The Match, USA; Ezer Mizion Bone Marrow Donor Registry, Israel; OneMatch Stem Cell and Marrow Network, Canada; Australian Bone Marrow Donor Registry, Australia; Matchis: the Dutch Centre for Stem Cell Donors, The Netherlands; Norwegian Bone Marrow Donor Registry, Norway; New Zealand Bone Marrow Donor Registry, New Zealand; Tobias Registry of Swedish Bone Marrow Donors, Sweden; Thai National Stem Cell Donor Registry, Thailand; Welsh Bone Marrow Donor Registry, Wales, UK.
© 2019, Springer-Verlag GmbH Germany, part of Springer Nature.
Keywords:
- Genetic admixture
- Non-negative matrix factorization
- Stem-cell donor registry
- Unsupervised learning