Automated analysis of high-throughput B-cell sequencing data reveals a high frequency of novel immunoglobulin V gene segment alleles

Daniel Gadala-Maria, Gur Yaari, Mohamed Uduman, Steven H. Kleinstein

Research output: Contribution to journal › Article › peer-review

147 Scopus citations

Abstract

Individual variation in germline and expressed B-cell immunoglobulin (Ig) repertoires has been associated with aging, disease susceptibility, and differential response to infection and vaccination. Repertoire properties can now be studied at large scale through next-generation sequencing of rearranged Ig genes. Accurate analysis of these repertoire-sequencing (Rep-Seq) data requires identifying the germline variable (V), diversity (D), and joining (J) gene segments used by each Ig sequence. Current V(D)J assignment methods work by aligning sequences to a database of known germline V(D)J segment alleles. However, existing databases are likely to be incomplete, and novel polymorphisms are hard to differentiate from the frequent occurrence of somatic hypermutations in Ig sequences. Here we develop a Tool for Ig Genotype Elucidation via Rep-Seq (TIgGER). TIgGER analyzes mutation patterns in Rep-Seq data to identify novel V segment alleles, and also constructs a personalized germline database containing the specific set of alleles carried by a subject. This information is then used to improve the initial V segment assignments from existing tools, like IMGT/HighV-QUEST. The application of TIgGER to Rep-Seq data from seven subjects identified 11 novel V segment alleles, including at least one in every subject examined. These novel alleles constituted 13% of the total number of unique alleles in these subjects, and impacted 3% of V(D)J segment assignments. These results reinforce the highly polymorphic nature of human Ig V genes, and suggest that many novel alleles remain to be discovered. The integration of TIgGER into Rep-Seq processing pipelines will increase the accuracy of V segment assignments, thus improving B-cell repertoire analyses.
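
The abstract does not spell out how mutation patterns separate a germline polymorphism from somatic hypermutation, so the following is only a toy illustration of the general idea, not TIgGER's actual procedure or API. It assumes that a true germline variant should appear even in reads that carry very few differences from the assigned allele, whereas somatic mutations should be scattered and rare in such reads. The function name `candidate_polymorphisms`, its parameters, and all thresholds are hypothetical choices made for this sketch.

```python
from collections import Counter


def candidate_polymorphisms(germline, reads, max_mutations=2,
                            min_fraction=0.5, min_reads=3):
    """Flag positions that differ from `germline` in most lightly mutated reads.

    Hypothetical heuristic for illustration only; thresholds are arbitrary.
    Returns {position: (variant_base, fraction_of_low_mutation_reads)}.
    """
    # Keep only reads of matching length with few mismatches overall.
    low = []
    for read in reads:
        if len(read) != len(germline):
            continue
        mismatches = sum(1 for g, r in zip(germline, read) if g != r)
        if mismatches <= max_mutations:
            low.append(read)

    # Too few lightly mutated reads to say anything.
    if len(low) < min_reads:
        return {}

    candidates = {}
    for i, g in enumerate(germline):
        # Count non-germline bases observed at this position.
        counts = Counter(read[i] for read in low if read[i] != g)
        if not counts:
            continue
        base, n = counts.most_common(1)[0]
        fraction = n / len(low)
        if fraction >= min_fraction:
            candidates[i] = (base, fraction)
    return candidates


germline = "ACGTACGT"
reads = [
    "ACTTACGT",  # differs at position 2 only
    "ACTTACGT",  # differs at position 2 only
    "ACTTACGA",  # differs at positions 2 and 7
    "ACGTACGT",  # identical to the germline allele
    "TCTTAGGA",  # four differences, excluded as heavily mutated
]
print(candidate_polymorphisms(germline, reads))
```

In this made-up example, position 2 is flagged because three of the four lightly mutated reads share the same non-germline base, while the single change at position 7 is treated as a probable somatic mutation.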

Original language: English
Pages (from-to): E862-E870
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume: 112
Issue number: 8
DOIs
State: Published - 24 Feb 2015

Funding

Funder: Funder number
National Institutes of Health: T15LM07056, R01AI104739
U.S. National Library of Medicine: T15LM007056

    Keywords

    • Adaptive immunity
    • B-cell repertoire
    • Next-generation sequencing
    • Somatic hypermutation
    • Variable gene segment
