Identification of subject-specific immunoglobulin alleles from expressed repertoire sequencing data

Daniel Gadala-Maria, Moriah Gidoni, Susanna Marquez, Jason A. Vander Heiden, Justin T. Kos, Corey T. Watson, Kevin C. O'Connor, Gur Yaari, Steven H. Kleinstein

Research output: Contribution to journal › Article › peer-review

42 Scopus citations

Abstract

The adaptive immune receptor repertoire (AIRR) contains information on an individual's immune past, present and potential in the form of the evolving sequences that encode the B cell receptor (BCR) repertoire. AIRR sequencing (AIRR-seq) studies rely on databases of known BCR germline variable (V), diversity (D), and joining (J) genes to detect somatic mutations in AIRR-seq data via comparison to the best-aligning database alleles. However, it has been shown that these databases are far from complete, leading to systematic misidentification of mutated positions in subsets of sample sequences. We previously presented TIgGER, a computational method to identify subject-specific V gene genotypes, including the presence of novel V gene alleles, directly from AIRR-seq data. However, the original algorithm was unable to detect alleles that differed by more than 5 single nucleotide polymorphisms (SNPs) from a database allele. Here we present and apply an improved version of the TIgGER algorithm that can detect alleles differing by any number of SNPs from the nearest database allele, and can construct subject-specific genotypes with minimal prior information. TIgGER predictions are validated both computationally (using a leave-one-out strategy) and experimentally (using genomic sequencing), resulting in the addition of three new immunoglobulin heavy chain V (IGHV) gene alleles to the IMGT repertoire. Finally, we develop a Bayesian strategy to provide a confidence estimate associated with genotype calls. Altogether, these methods allow for much higher accuracy in germline allele assignment, an essential step in AIRR-seq studies.
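The failure mode described above can be illustrated with a toy example: because mutations are called against the best-aligning database allele, an allele missing from the database shows up as the same "mutation" in every read derived from it. The Python sketch below assumes ungapped, equal-length, pre-aligned sequences; the allele names (IGHVX*01, IGHVX*02), the 16-nt sequences, the 0.9 threshold, and all helper functions are hypothetical. It is not the TIgGER algorithm, only a demonstration of why undocumented alleles bias mutation calls.

```python
from collections import Counter
from typing import Dict, List, Tuple


def mismatch_positions(seq: str, germline: str) -> List[int]:
    # Compare position by position over the shared length;
    # 'N' is treated as an unknown base, not as a mismatch.
    n = min(len(seq), len(germline))
    return [i for i in range(n)
            if seq[i] != germline[i] and "N" not in (seq[i], germline[i])]


def assign_allele(seq: str, database: Dict[str, str]) -> Tuple[str, List[int]]:
    # Pick the allele with the fewest mismatches (ties broken by name) and
    # report the mismatches against it as putative somatic mutations.
    best = min(database,
               key=lambda name: (len(mismatch_positions(seq, database[name])), name))
    return best, mismatch_positions(seq, database[best])


def recurrent_positions(reads: List[str], database: Dict[str, str],
                        min_fraction: float = 0.9) -> Dict[str, List[int]]:
    # For each assigned allele, flag positions "mutated" in at least
    # min_fraction of its reads; true somatic mutations rarely recur this often.
    per_allele: Dict[str, List[List[int]]] = {}
    for read in reads:
        allele, muts = assign_allele(read, database)
        per_allele.setdefault(allele, []).append(muts)
    flagged: Dict[str, List[int]] = {}
    for allele, mut_lists in per_allele.items():
        counts = Counter(pos for muts in mut_lists for pos in muts)
        flagged[allele] = sorted(p for p, c in counts.items()
                                 if c / len(mut_lists) >= min_fraction)
    return flagged


# Hypothetical database with two known alleles of a made-up gene.
database = {
    "IGHVX*01": "ACGTACGTACGTACGT",
    "IGHVX*02": "ACGTTCGTACGTACGT",
}
# Reads from an undocumented allele that differs from *01 at position 9 (C->G),
# two of them carrying one additional somatic mutation each.
reads = [
    "ACGTACGTAGGTACGT",  # no somatic mutation
    "ACCTACGTAGGTACGT",  # extra mutation at position 2
    "ACGTACGTAGGTACTT",  # extra mutation at position 14
]

print(assign_allele(reads[1], database))     # ('IGHVX*01', [2, 9])
print(recurrent_positions(reads, database))  # {'IGHVX*01': [9]}
```

Every read is assigned to IGHVX*01 and reports position 9 as mutated, so the undocumented polymorphism inflates mutation counts; a position that is "mutated" in nearly all reads assigned to an allele is the kind of signal that motivates inferring a subject-specific genotype first.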

Original language: English
Article number: 129
Journal: Frontiers in Immunology
Volume: 10
Issue number: FEB
DOIs
State: Published - 2019

Bibliographical note

Publisher Copyright:
© 2019 Gadala-Maria, Gidoni, Marquez.

Funding

This work was supported by the United States–Israel Binational Science Foundation (grant number 2017253) to GY, SK, and MG, and grants from the National Institutes of Health (R01AI104739 to SK), the National Institute of Allergy and Infectious Diseases of the National Institutes of Health through award number R01AI114780 to KO, and grants R24AI138963 and R21AI142590 to CW.

Funders (funder number)
National Institutes of Health
National Institute of Allergy and Infectious Diseases: R21AI142590, R24AI138963, R01AI114780, R01AI104739
United States-Israel Binational Science Foundation: 2017253

    Keywords

    • AIRR-seq
    • Allele
    • Antibodies
    • BCR
    • Somatic hypermutation
