Nucleic-acid G-quadruplexes (G4s) play vital roles in many cellular processes. Due to their importance, researchers have developed experimental assays to measure nucleic-acid G4s in high throughput. The generated high-throughput datasets gave rise to unique opportunities to develop machine-learning-based methods, and in particular deep neural networks, to predict G4s in any given nucleic-acid sequence and any species. In this paper, we review the success stories of deep-neural-network applications for G4 prediction. We first cover the experimental technologies that generated the most comprehensive nucleic-acid G4 high-throughput datasets in recent years. We then review classic rule-based methods for G4 prediction. We proceed by reviewing the major machine-learning and deep-neural-network applications to nucleic-acid G4 datasets and report a novel comparison between them. Next, we present the interpretability techniques used on the trained neural networks to learn key molecular principles underlying nucleic-acid G4 folding. As a new result, we calculate the overlap between measured DNA and RNA G4s and compare the performance of DNA- and RNA-G4 predictors on RNA- and DNA-G4 datasets, respectively, to demonstrate the potential of transfer learning from DNA G4s to RNA G4s. Last, we conclude with open questions in the field of nucleic-acid G4 prediction and computational modeling.
Bibliographical notePublisher Copyright:
© The Author(s) 2023.
- deep learning
- deep neural network
- transfer learning