Abstract
T cells are at the core of human health. Their unique ability to produce a vast repertoire of receptors, which they use to interact with threats and to assist with healthy functions, has made them an interesting subject for data science. One such function, involved in multiple conditions including cancer, response to viral threats, and autoimmune disease, is the ability to produce what are termed Public clones: T cells that are shared between individuals. Some of these clones are even shared across a high percentage of all observed samples. Yet the reason for this sharing, as well as the DNA sequence that might characterize these clones, is still unknown. Here, using a BERT-based language model, we show that a latent space built by self-supervised learning provides distinct areas for Public and Private sequences. We further show that these embeddings can be used successfully for binary classification to tell Public and Private sequences apart.
Original language | English |
---|---|
Title of host publication | Proceedings - 2022 4th International Conference on Transdisciplinary AI, TransAI 2022 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 75-78 |
Number of pages | 4 |
ISBN (Electronic) | 9781665471848 |
State | Published - 2022 |
Event | 4th International Conference on Transdisciplinary AI, TransAI 2022 - Laguna Hills, United States; Duration: 20 Sep 2022 → 22 Sep 2022 |
Publication series
Name | Proceedings - 2022 4th International Conference on Transdisciplinary AI, TransAI 2022 |
---|
Conference
Conference | 4th International Conference on Transdisciplinary AI, TransAI 2022 |
---|---|
Country/Territory | United States |
City | Laguna Hills |
Period | 20/09/22 → 22/09/22 |
Bibliographical note
Funding Information: This work has been supported by the following funding mechanisms: ISF grant number 582/19, ISF-NSFC grant 3382/20, ICRF grant 829965, and BSF grant 2019090.
Publisher Copyright:
© 2022 IEEE.
Keywords
- DNA sequencing
- Language models
- T cells
- Immunological repertoire