Abstract
Background: In the global effort to discover biomarkers for cancer prognosis, prediction tools have become essential resources. TCR (T cell receptor) repertoires contain important features that differentiate healthy controls from cancer patients or differentiate outcomes for patients being treated with different drugs. Consequently, tools that can quickly and easily generate and identify important features from TCR repertoire data and build accurate classifiers to predict future outcomes are essential. Results: This paper introduces GENTLE (GENerator of T cell receptor repertoire features for machine LEarning): an open-source, user-friendly web-application tool that allows TCR repertoire researchers to discover important features; to create classifier models and evaluate them with metrics; and to quickly generate visualizations for data interpretation. We performed a case study with repertoires of TRegs (regulatory T cells) and TConvs (conventional T cells) from healthy controls versus patients with breast cancer. We showed that diversity features were able to distinguish between the groups. Moreover, the classifiers built with these features could correctly classify samples (‘Healthy’ or ‘Breast Cancer’) from the TRegs repertoire when trained with the TConvs repertoire, and from the TConvs repertoire when trained with the TRegs repertoire. Conclusion: The paper walks through installing and using GENTLE and presents a case study and results to demonstrate the application’s utility. GENTLE is geared towards any researcher who works with TCR repertoire data and aims to discover predictive features from these data and build accurate classifiers. GENTLE is available at https://github.com/dhiego22/gentle and https://share.streamlit.io/dhiego22/gentle/main/gentle.py.
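As an illustration of what a repertoire "diversity feature" can look like, the sketch below computes two widely used indices, Shannon entropy and the Gini-Simpson index, from clonotype counts. The abstract does not state which diversity measures GENTLE computes or how it computes them, so the function names, the toy CDR3 strings, and the choice of these two indices are all assumptions made for illustration, not GENTLE's API.

```python
import math
from collections import Counter


def shannon_diversity(clone_counts):
    """Shannon entropy (natural log) of a list of clonotype counts."""
    total = sum(clone_counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in clone_counts if c > 0)


def simpson_diversity(clone_counts):
    """Gini-Simpson index: 1 minus the sum of squared clonotype frequencies."""
    total = sum(clone_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in clone_counts)


# Hypothetical toy repertoires: one (made-up) CDR3 string per sequenced read.
even_repertoire = ["CASSL", "CASSP", "CASSQ", "CASSR"]
skewed_repertoire = ["CASSL", "CASSL", "CASSL", "CASSP"]

# Collapse reads into per-clonotype counts.
even_counts = list(Counter(even_repertoire).values())
skewed_counts = list(Counter(skewed_repertoire).values())

# One feature vector per sample, of the kind a classifier could be trained on.
even_features = [shannon_diversity(even_counts), simpson_diversity(even_counts)]
skewed_features = [shannon_diversity(skewed_counts), simpson_diversity(skewed_counts)]
```

A repertoire dominated by one clonotype scores lower on both indices than an evenly distributed one, which is the kind of signal a classifier can use to separate sample groups.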
Original language | English |
---|---|
Article number | 32 |
Journal | BMC Bioinformatics |
Volume | 24 |
Issue number | 1 |
DOIs | |
State | Published - 30 Jan 2023 |
Bibliographical note
Publisher Copyright: © 2023, The Author(s).
Funding
The authors thank Isa Goldberg, Or Malca, Beatriz Stransky, Thiago Felipe, Emmanuel Barbosa, Leonardo Capitani, and the students and professors of the Federal University of Rio Grande do Norte and Bar-Ilan University for the encouragement, support, discussions and insights generated during the development of this work. This work is supported in part by funds from the Brazilian funding agency CAPES, the National Coordination of High Education Personnel Formation Programs (grant numbers 88887.161820/2017-0, 88887.469283/2019-00 and 88887.600071/2021-0). The APC was funded by the Federal University of Rio Grande do Norte. This research was supported by NPAD/UFRN.
Funders | Funder number |
---|---|
Federal University of Rio Grande do Norte and Bar-Ilan University | |
NPAD | |
National Coordination of High Education Personnel Formation Programs | 88887.600071/2021-0, 88887.469283/2019-00, 88887.161820/2017-0 |
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior | |
Universidade Federal do Rio Grande do Norte | |
Keywords
- Feature selection
- Machine learning tools
- T cell receptor repertoire