Abstract
Nucleic-acid G-quadruplexes (G4s) play vital roles in many cellular processes. Due to their importance, researchers have developed experimental assays to measure nucleic-acid G4s in high throughput. The generated high-throughput datasets gave rise to unique opportunities to develop machine-learning-based methods, and in particular deep neural networks, to predict G4s in any given nucleic-acid sequence and any species. In this paper, we review the success stories of deep-neural-network applications for G4 prediction. We first cover the experimental technologies that generated the most comprehensive nucleic-acid G4 high-throughput datasets in recent years. We then review classic rule-based methods for G4 prediction. We proceed by reviewing the major machine-learning and deep-neural-network applications to nucleic-acid G4 datasets and report a novel comparison between them. Next, we present the interpretability techniques used on the trained neural networks to learn key molecular principles underlying nucleic-acid G4 folding. As a new result, we calculate the overlap between measured DNA and RNA G4s and compare the performance of DNA- and RNA-G4 predictors on RNA- and DNA-G4 datasets, respectively, to demonstrate the potential of transfer learning from DNA G4s to RNA G4s. Last, we conclude with open questions in the field of nucleic-acid G4 prediction and computational modeling.
Field | Value |
---|---|
Original language | English |
Article number | bbad252 |
Journal | Briefings in Bioinformatics |
Volume | 24 |
Issue number | 4 |
State | Published - 20 Jul 2023 |
Bibliographical note
Publisher Copyright: © The Author(s) 2023.
Funding
This work was supported in part by funds from the Israel Science Foundation (grant no. 358/21).
Funders | Funder number |
---|---|
Israel Science Foundation | 358/21 |
Keywords
- G-quadruplex
- deep learning
- deep neural network
- interpretability
- transfer learning