Abstract
G-quadruplexes are nucleic acid secondary structures that form within guanine-rich DNA or RNA sequences. G-quadruplex formation can affect chromatin architecture and gene regulation, and has been associated with genomic instability, genetic diseases and cancer progression. G-quadruplex formation in a DNA template can be assessed using polymerase stop assays, which measure polymerase stalling at G-quadruplex sites. An experimental technique called G4-seq was developed by combining features of the polymerase stop assay with Illumina next-generation sequencing. The data produced by this technique provide unprecedented detail on where, and with what intensity, G-quadruplexes form in the human genome. Still, running the experimental protocol on a whole genome is expensive and time-consuming. A computational method that predicts G-quadruplex formation in new DNA sequences or whole genomes is therefore highly desirable. Here, we present a new method, called G4detector, that predicts G-quadruplexes from DNA sequences using multi-kernel convolutional neural networks. To test G4detector, we compiled novel high-throughput in vitro and in vivo benchmarks. On these data, we show that G4detector outperforms extant methods for the same task on all benchmark datasets. We visualize the most important features of G4detector models and discover that G-quadruplex formation is highly dependent on G-tract length, the spacing between G-tracts, and the nucleotide composition between them. The code and benchmarks are publicly available at github.com/OrensteinLab/G4detector.
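To make the idea of a multi-kernel convolutional model concrete, the following is a minimal pure-Python sketch, assuming a one-hot encoded DNA input, a few convolutional filters of different widths, ReLU with global max pooling, and a single logistic output. It is an illustration of the general technique only, not the G4detector code from the repository above. The functions `one_hot`, `g_run_kernel`, `conv_max` and `predict`, the filter widths (3 and 5), and all weights and biases are hypothetical and set by hand so that the filters respond to runs of G.

```python
import math
from typing import List, Sequence, Tuple

ALPHABET = "ACGT"
Matrix = List[List[float]]


def one_hot(seq: str) -> Matrix:
    # One row per base, one column per nucleotide in ACGT order;
    # ambiguous symbols such as N become all-zero rows.
    return [[1.0 if base == b else 0.0 for b in ALPHABET] for base in seq.upper()]


def g_run_kernel(width: int) -> Matrix:
    # Hypothetical hand-set filter: +1 for G and -1 for A/C/T at every position,
    # so a window reaches the maximum score `width` only when it is a run of G's.
    return [[1.0 if b == "G" else -1.0 for b in ALPHABET] for _ in range(width)]


def conv_max(x: Matrix, kernel: Matrix, bias: float) -> float:
    # Valid 1D convolution along the sequence, ReLU, then global max pooling.
    width = len(kernel)
    best = 0.0  # starting from 0.0 applies the ReLU implicitly
    for start in range(len(x) - width + 1):
        score = bias + sum(
            kernel[i][j] * x[start + i][j]
            for i in range(width)
            for j in range(len(ALPHABET))
        )
        best = max(best, score)
    return best


def predict(seq: str, filters: Sequence[Tuple[Matrix, float]],
            weights: Sequence[float], bias: float) -> float:
    # One pooled feature per filter (filters may have different widths),
    # combined by a single logistic output unit into a score in (0, 1).
    x = one_hot(seq)
    features = [conv_max(x, kernel, b) for kernel, b in filters]
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical filters: width 3 fires on GGG, width 5 fires only on GGGGG.
filters = [(g_run_kernel(3), -2.0), (g_run_kernel(5), -4.0)]
print(predict("GGGAGGGAGGGAGGG", filters, [1.0, 1.0], -0.5))
print(predict("ATATATATAT", filters, [1.0, 1.0], -0.5))
```

In a trained model of this kind, the filter weights, biases and output weights would be learned from labeled data such as G4-seq measurements rather than set by hand. Using several filter widths is what lets different filters capture G-tracts and the loops between them at different lengths.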
Original language | English |
---|---|
Title of host publication | ACM-BCB 2019 - Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics |
Publisher | Association for Computing Machinery, Inc |
Pages | 357-365 |
Number of pages | 9 |
ISBN (Electronic) | 9781450366663 |
DOIs | |
State | Published - 4 Sep 2019 |
Externally published | Yes |
Event | 10th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2019 - Niagara Falls, United States. Duration: 7 Sep 2019 → 10 Sep 2019 |
Publication series
Name | ACM-BCB 2019 - Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics |
---|
Conference
Conference | 10th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2019 |
---|---|
Country/Territory | United States |
City | Niagara Falls |
Period | 7/09/19 → 10/09/19 |
Bibliographical note
Publisher Copyright: © 2019 Copyright held by the owner/author(s).
Keywords
- Convolutional neural networks
- G-quadruplex