A Language Model identifies population-level features of the T cell Receptor via self-supervised learning

Romi Goldner Kabeli, Sol Efroni

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

T cells are at the core of human health. Their unique ability to produce an overwhelming repertoire of receptors, which they use to interact with threats and to assist with healthy functions, has made them an interesting subject for data science. One such function, involved in multiple conditions including cancer, response to viral threats, autoimmune disease and more, is the ability to produce what are termed Public clones. Public clones are T cells that are shared between individuals. Some of these clones are even shared between a high percentage of all observed samples. Yet the reason for this sharing, as well as the DNA sequence that might characterize these clones, is still unknown. Here, using a BERT-based language model, we show that a latent space built by self-supervised learning provides distinct areas for Public and Private sequences. We further show that these embeddings could be used successfully for binary classification to tell Public and Private sequences apart.

Original language: English
Title of host publication: Proceedings - 2022 4th International Conference on Transdisciplinary AI, TransAI 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 75-78
Number of pages: 4
ISBN (Electronic): 9781665471848
State: Published - 2022
Event: 4th International Conference on Transdisciplinary AI, TransAI 2022 - Laguna Hills, United States
Duration: 20 Sep 2022 to 22 Sep 2022

Publication series

Name: Proceedings - 2022 4th International Conference on Transdisciplinary AI, TransAI 2022

Conference

Conference: 4th International Conference on Transdisciplinary AI, TransAI 2022
Country/Territory: United States
City: Laguna Hills
Period: 20/09/22 to 22/09/22

Bibliographical note

Funding Information:
This work has been supported by the following funding mechanisms: ISF grant number 582/19, ISF-NSFC grant 3382/20, ICRF grant 829965, and BSF grant 2019090.

Publisher Copyright:
© 2022 IEEE.

Keywords

  • DNA sequencing
  • Language models
  • T cells
  • immunological repertoire
