TY - JOUR
T1 - An overview on nucleic-acid G-quadruplex prediction
T2 - from rule-based methods to deep neural networks
AU - Elimelech-Zohar, Karin
AU - Orenstein, Yaron
N1 - Publisher Copyright: © The Author(s) 2023.
PY - 2023/7/1
Y1 - 2023/7/1
N2 - Nucleic-acid G-quadruplexes (G4s) play vital roles in many cellular processes. Due to their importance, researchers have developed experimental assays to measure nucleic-acid G4s in high throughput. The generated high-throughput datasets gave rise to unique opportunities to develop machine-learning-based methods, and in particular deep neural networks, to predict G4s in any given nucleic-acid sequence and any species. In this paper, we review the success stories of deep-neural-network applications for G4 prediction. We first cover the experimental technologies that generated the most comprehensive nucleic-acid G4 high-throughput datasets in recent years. We then review classic rule-based methods for G4 prediction. We proceed by reviewing the major machine-learning and deep-neural-network applications to nucleic-acid G4 datasets and report a novel comparison between them. Next, we present the interpretability techniques used on the trained neural networks to learn key molecular principles underlying nucleic-acid G4 folding. As a new result, we calculate the overlap between measured DNA and RNA G4s and compare the performance of DNA- and RNA-G4 predictors on RNA- and DNA-G4 datasets, respectively, to demonstrate the potential of transfer learning from DNA G4s to RNA G4s. Last, we conclude with open questions in the field of nucleic-acid G4 prediction and computational modeling.
KW - G-quadruplex
KW - deep learning
KW - deep neural network
KW - interpretability
KW - transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85165521508&partnerID=8YFLogxK
U2 - 10.1093/bib/bbad252
DO - 10.1093/bib/bbad252
M3 - Systematic review
C2 - 37438149
AN - SCOPUS:85165521508
SN - 1467-5463
VL - 24
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 4
M1 - bbad252
ER -