Predicting G-quadruplexes from DNA sequences using multi-kernel convolutional neural networks

Mira Barshai, Yaron Orenstein

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Scopus citations

Abstract

G-quadruplexes are nucleic acid secondary structures that form within guanine-rich DNA or RNA sequences. G-quadruplex formation can affect chromatin architecture and gene regulation and has been associated with genomic instability, genetic diseases and cancer progression. G-quadruplex formation in a DNA template can be assessed using polymerase stop assays, which measure polymerase stalling at G-quadruplex sites. An experimental technique, called G4-seq, was developed by combining features of the polymerase stop assay with Illumina next-generation sequencing. The experimental data produced by this technique provide unprecedented detail on where, and at what intensity, G-quadruplexes form in the human genome. Still, running the experimental protocol on a whole genome is an expensive and time-consuming process. Thus, it is highly desirable to have a computational method to predict G-quadruplex formation in new DNA sequences or whole genomes. Here, we present a new method, called G4detector, to predict G-quadruplexes from DNA sequences based on multi-kernel convolutional neural networks. To test G4detector, we compiled novel high-throughput in vitro and in vivo benchmarks. On these data, we show that G4detector outperforms extant methods for the same task on all benchmark datasets. We visualize the most important features of G4detector models and discover that G-quadruplex formation is highly dependent on G-tract length, G-tract spacing and the nucleotide composition between tracts. The code and benchmarks are publicly available at github.com/OrensteinLab/G4detector.
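As a rough illustration of what "multi-kernel convolution" over a DNA sequence can mean, the sketch below one-hot encodes a sequence, scans it with filters of several widths, and concatenates the max-pooled responses into one feature vector. It is not the G4detector architecture: the kernel widths, filter count, random weights, and all function names here are assumptions made for illustration only, and the actual model is in the authors' repository.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) matrix; unknown bases (e.g. N) stay all-zero."""
    x = np.zeros((len(seq), len(ALPHABET)), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = ALPHABET.find(base)
        if j >= 0:
            x[i, j] = 1.0
    return x

def conv1d_relu_maxpool(x, kernels):
    """Valid 1D convolution along the sequence, ReLU, then global max pooling.

    x: (L, 4) one-hot matrix; kernels: (n_filters, width, 4).
    Returns one pooled activation per filter, shape (n_filters,).
    """
    width = kernels.shape[1]
    # All length-`width` windows of the sequence: (L - width + 1, width, 4)
    windows = np.lib.stride_tricks.sliding_window_view(x, (width, x.shape[1]))[:, 0]
    scores = np.einsum("pwc,fwc->pf", windows, kernels)
    return np.maximum(scores, 0.0).max(axis=0)

def multi_kernel_features(seq, kernel_sets):
    """Concatenate pooled responses from filter banks of different widths."""
    x = one_hot(seq)
    return np.concatenate([conv1d_relu_maxpool(x, k) for k in kernel_sets])

# Hypothetical settings: three kernel widths, four untrained random filters each.
rng = np.random.default_rng(0)
kernel_widths = (3, 5, 7)
n_filters = 4
kernel_sets = [rng.normal(size=(n_filters, w, 4)).astype(np.float32)
               for w in kernel_widths]

features = multi_kernel_features("GGGTTAGGGTTAGGGTTAGGG", kernel_sets)
print(features.shape)
```

In a trained network the filter weights would be learned and the concatenated vector would feed dense layers that output a G-quadruplex score. Using several widths lets different filters respond to short motifs such as a single G-tract and to longer patterns that span the spacing between tracts.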

Original language: English
Title of host publication: ACM-BCB 2019 - Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
Publisher: Association for Computing Machinery, Inc
Pages: 357-365
Number of pages: 9
ISBN (Electronic): 9781450366663
State: Published - 4 Sep 2019
Externally published: Yes
Event: 10th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2019 - Niagara Falls, United States
Duration: 7 Sep 2019 to 10 Sep 2019

Publication series

Name: ACM-BCB 2019 - Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics

Conference

Conference: 10th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2019
Country/Territory: United States
City: Niagara Falls
Period: 7/09/19 to 10/09/19

Bibliographical note

Publisher Copyright:
© 2019 Copyright held by the owner/author(s).

Keywords

  • Convolutional neural networks
  • G-quadruplex
