AIRR-C IG Reference Sets: curated sets of immunoglobulin heavy and light chain germline genes

Andrew M. Collins, Mats Ohlin, Martin Corcoran, James M. Heather, Duncan Ralph, Mansun Law, Jesus Martínez-Barnetche, Jian Ye, Eve Richardson, William S. Gibson, Oscar L. Rodriguez, Ayelet Peres, Gur Yaari, Corey T. Watson, William D. Lees

Research output: Contribution to journal › Article › peer-review



Introduction: Analysis of an individual’s immunoglobulin (IG) gene repertoire requires the use of high-quality germline gene reference sets. When sets only contain alleles supported by strong evidence, AIRR sequencing (AIRR-seq) data analysis is more accurate, and studies of the evolution of IG genes, their allelic variants and the expressed immune repertoire are therefore facilitated. Methods: The Adaptive Immune Receptor Repertoire Community (AIRR-C) IG Reference Sets have been developed by including only human IG heavy and light chain alleles that have been confirmed by evidence from multiple high-quality sources. To further improve AIRR-seq analysis, some alleles have been extended to deal with short 3’ or 5’ truncations that can lead them to be overlooked by alignment utilities. To avoid other challenges for analysis programs, exact paralogs (e.g. IGHV1-69*01 and IGHV1-69D*01) are only represented once in each set, though alternative sequence names are noted in accompanying metadata. Results and discussion: The Reference Sets include fewer than half the previously recognised IG alleles (e.g. just 198 IGHV sequences), and also include a number of novel alleles: 8 IGHV alleles, 2 IGKV alleles and 5 IGLV alleles. Despite the Sets' smaller sizes, erroneous calls were eliminated and excellent coverage was achieved when a set of repertoires comprising over 4 million V(D)J rearrangements from 99 individuals was analyzed using the Sets. The version-tracked AIRR-C IG Reference Sets are freely available at the OGRDB website and will be regularly updated to include newly observed and previously reported sequences that can be confirmed by new high-quality data.
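
The handling of exact paralogs described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the record fields (`label`, `sequence`, `alternative_names`) and the short toy sequences are all assumptions made for illustration, and they do not reflect the actual OGRDB or AIRR-C file formats. The sketch only shows the general idea of keeping one entry per distinct sequence while recording the other names as metadata.

```python
def collapse_exact_paralogs(alleles):
    """Keep one record per distinct sequence.

    `alleles` is an iterable of (name, sequence) pairs. The first name seen
    for a sequence becomes the record label; later names with the identical
    sequence are stored as alternative names (hypothetical field names).
    """
    names_by_sequence = {}
    for name, sequence in alleles:
        key = sequence.upper()  # compare sequences case-insensitively
        names_by_sequence.setdefault(key, []).append(name)

    # dicts preserve insertion order, so records follow first appearance
    return [
        {"label": names[0], "sequence": seq, "alternative_names": names[1:]}
        for seq, names in names_by_sequence.items()
    ]


# Toy placeholder sequences, NOT real germline sequences.
toy_alleles = [
    ("IGHV1-69*01", "caggtgcagctg"),
    ("IGHV1-69D*01", "caggtgcagctg"),  # identical to IGHV1-69*01 in this toy data
    ("IGHV1-69*02", "caggtccagctg"),
]

reference_set = collapse_exact_paralogs(toy_alleles)
for record in reference_set:
    print(record["label"], record["alternative_names"])
```

Representing each sequence once avoids ambiguous ties between identical references during alignment, while the alternative names keep both gene designations discoverable.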

Original language: English
Article number: 1330153
Journal: Frontiers in Immunology
State: Published - 2023

Bibliographical note

Publisher Copyright:
Copyright © 2024 Collins, Ohlin, Corcoran, Heather, Ralph, Law, Martínez-Barnetche, Ye, Richardson, Gibson, Rodriguez, Peres, Yaari, Watson and Lees.


The author(s) declare financial support was received for the research, authorship, and/or publication of this article. JY was supported by the National Center for Biotechnology Information of the National Library of Medicine (NLM), National Institutes of Health. MO was supported in part by the Swedish Research Council (grant number 2019-01042). ER was supported by NIH contract 75N93019C00001 (NIAID) and grant U24CA248138 (NCI). CTW, WSG, and OLR were funded in part by relevant grants from the National Institute of Allergy and Infectious Diseases (R21AI142590 and R24AI138963). WL was supported in part by the European Union’s Horizon 2020 research and innovation program (grant number 825821).

Funders                                                  Funder number
National Institutes of Health                            75N93019C00001
National Institute of Allergy and Infectious Diseases    R21AI142590, R24AI138963, U24CA248138
U.S. National Library of Medicine
National Computational Infrastructure
Horizon 2020 Framework Programme                         825821


    • IGHD
    • IGHJ
    • IGHV genes
    • heavy chain
    • immunoglobulin
    • light chain

