G4detector: Convolutional Neural Network to Predict DNA G-Quadruplexes

Mira Barshai, Alice Aubert, Yaron Orenstein

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

G-quadruplexes (G4s) are nucleic acid secondary structures that form within guanine-rich DNA or RNA sequences. G4 formation can affect chromatin architecture and gene regulation, and has been associated with genomic instability, genetic diseases, and cancer progression. The experimental data produced by the G4-seq protocol provide unprecedented detail on G4 formation in the genome. Still, running the experimental protocol on a whole genome is an expensive and time-consuming process. Thus, it is highly desirable to have a computational method to predict G4 formation in new DNA sequences or whole genomes. Here, we present G4detector, a new method based on a convolutional neural network to predict G4s from DNA sequences. In addition to the sequence information, we improved prediction accuracy by adding RNA secondary structure information. To train and test G4detector, we compiled novel high-throughput benchmarks from the genomes of multiple species measured by the G4-seq protocol. We show that G4detector outperforms extant methods for the same task on all benchmark datasets, can detect G4s genome-wide with high accuracy, and, when trained on human measurements, can extrapolate to various non-human species. The code and benchmarks are publicly available at github.com/OrensteinLab/G4detector.
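The sketch below illustrates the generic first stage of a sequence CNN, to make the idea of "a convolutional neural network to predict G4s from DNA sequences" more concrete. It one-hot encodes DNA, scans it with a convolutional filter, and applies max pooling. It is a toy under stated assumptions and does not reproduce G4detector's architecture. The following are all hypothetical:

* the ACGT channel order,
* the single hand-set "GGG" filter,
* the output bias,
* the function names `one_hot`, `conv1d_max` and `g_tract_score`.

A trained model would instead learn many filters from G4-seq data. The RNA secondary structure input mentioned above is not modelled here.

```python
import math
from typing import List

ALPHABET = "ACGT"  # channel order assumed for this sketch only


def one_hot(seq: str) -> List[List[float]]:
    """Encode DNA as a len(seq) x 4 matrix; bases outside ACGT (e.g. N) become all-zero rows."""
    return [[1.0 if base == b else 0.0 for b in ALPHABET] for base in seq.upper()]


def conv1d_max(x: List[List[float]], kernel: List[List[float]], bias: float = 0.0) -> float:
    """Slide one k x 4 filter along x, apply ReLU, and return the global max (max pooling)."""
    k = len(kernel)
    best = 0.0  # ReLU outputs are never negative, so 0.0 is a safe starting value
    for i in range(len(x) - k + 1):
        activation = bias + sum(
            x[i + j][c] * kernel[j][c] for j in range(k) for c in range(len(ALPHABET))
        )
        best = max(best, max(0.0, activation))
    return best


def g_tract_score(seq: str, output_bias: float = -2.0) -> float:
    """Hypothetical toy model: one hand-set G-run filter, max pooling, logistic output in (0, 1)."""
    g_run_filter = one_hot("GGG")  # responds most strongly to three consecutive Gs
    feature = conv1d_max(one_hot(seq), g_run_filter)
    return 1.0 / (1.0 + math.exp(-(feature + output_bias)))


if __name__ == "__main__":
    for s in ("ATGGGTA", "ATATATA"):
        print(s, round(g_tract_score(s), 3))
```

For example, `g_tract_score("ATGGGTA")` is higher than `g_tract_score("ATATATA")` because only the first sequence contains a run of three Gs. Max pooling makes the score depend on the best-matching window, not on where in the sequence that window occurs.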

Original language: English
Pages (from-to): 1946-1955
Number of pages: 10
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 19
Issue number: 4
State: Published - 2022
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2004-2012 IEEE.

Keywords

  • Bioinformatics
  • G-quadruplexes
  • convolutional neural networks
  • deep learning
