TY - JOUR
T1 - A somatic hypermutation–based machine learning model stratifies individuals with Crohn’s disease and controls
AU - Safra, Modi
AU - Werner, Lael
AU - Peres, Ayelet
AU - Polak, Pazit
AU - Salamon, Naomi
AU - Schvimer, Michael
AU - Weiss, Batia
AU - Barshack, Iris
AU - Shouval, Dror S.
AU - Yaari, Gur
N1 - Publisher Copyright: © 2023 Safra et al.
PY - 2023/1
Y1 - 2023/1
N2 - Crohn’s disease (CD) is a chronic relapsing–remitting inflammatory disorder of the gastrointestinal tract that is characterized by altered innate and adaptive immune function. Although massively parallel sequencing studies of the T cell receptor repertoire identified oligoclonal expansion of unique clones, much less is known about the B cell receptor (BCR) repertoire in CD. Here, we present a novel BCR repertoire sequencing data set from ileal biopsies from pediatric patients with CD and controls, and identify CD-specific somatic hypermutation (SHM) patterns, revealed by a machine learning (ML) algorithm trained on BCR repertoire sequences. Moreover, ML classification of a different data set from blood samples of adults with CD versus controls identified that V gene usage, clusters, or mutation frequencies yielded excellent results in classifying the disease (F1 > 90%). In summary, we show that an ML algorithm enables the classification of CD based on unique BCR repertoire features with high accuracy.
UR - http://www.scopus.com/inward/record.url?scp=85147317418&partnerID=8YFLogxK
U2 - 10.1101/gr.276683.122
DO - 10.1101/gr.276683.122
M3 - Article
C2 - 36526432
SN - 1088-9051
VL - 33
SP - 71
EP - 79
JO - Genome Research
JF - Genome Research
IS - 1
ER -