TY - JOUR
T1 - An unbiased comparison of immunoglobulin sequence aligners
AU - Konstantinovsky, Thomas
AU - Peres, Ayelet
AU - Polak, Pazit
AU - Yaari, Gur
N1 - Publisher Copyright:
© 2024 The Author(s).
PY - 2024/9/23
Y1 - 2024/9/23
AB - Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) is critical for our understanding of the adaptive immune system's dynamics in health and disease. Reliable analysis of AIRR-seq data depends on accurate rearranged immunoglobulin (Ig) sequence alignment. Various Ig sequence aligners exist, but there is no unified benchmarking standard representing the complexities of AIRR-seq data, obscuring objective comparisons of aligners across tasks. Here, we introduce GenAIRR, a modular simulation framework for generating Ig sequences alongside their ground truths. GenAIRR realistically simulates the intricacies of V(D)J recombination, somatic hypermutation, and an array of sequence corruptions. We comprehensively assessed prominent Ig sequence aligners across various metrics, unveiling unique performance characteristics for each aligner. The GenAIRR-produced datasets, combined with the proposed rigorous evaluation criteria, establish a solid basis for unbiased benchmarking of immunogenetics computational tools. This lays the groundwork for further improving the crucial task of Ig sequence alignment, ultimately enhancing our understanding of adaptive immunity.
KW - AIRR-seq
KW - V(D)J recombination
KW - benchmarking
KW - immunoglobulin
KW - sequence alignment
KW - somatic hypermutation
UR - https://www.scopus.com/pages/publications/85208460224
U2 - 10.1093/bib/bbae556
DO - 10.1093/bib/bbae556
M3 - Article
C2 - 39489605
AN - SCOPUS:85208460224
SN - 1467-5463
VL - 25
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 6
M1 - bbae556
ER -