Rich Parameterization Improves RNA Structure Prediction

Shay Zakov, Yoav Goldberg, Michael Elhadad, Michal Ziv-Ukelson

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

14 Scopus citations

Abstract

Motivation. Current approaches to RNA structure prediction range from physics-based methods, which rely on thousands of experimentally measured thermodynamic parameters, to machine-learning (ML) techniques. While methods for parameter estimation are successfully shifting toward ML-based approaches, model parameterizations have so far remained fairly constant, and all models to date have relatively few parameters. We propose a move to much richer parameterizations.

Contribution. We study how much prediction quality can improve when folding prediction models use more information. To this end, we propose novel models that refine previous ones by examining more types of structural elements, and larger sequential contexts for these elements. We argue that, given suitable learning techniques, a large enough set of training examples, and freedom from features whose weights must be determined experimentally, one can define much richer feature representations than previously explored while still allowing efficient inference. Our proposed fine-grained models are made practical by the availability of large training sets, advances in machine learning, and recent accelerations of RNA folding algorithms.

Results. To test this hypothesis, we conducted a set of experiments that assess the prediction quality of the proposed models. These experiments reproduce the settings of a recent, thorough study that compared the prediction quality of several state-of-the-art RNA folding prediction algorithms. We show that more detailed models indeed improve prediction quality, while the running time of the folding algorithm remains fast.
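To make "richer parameterization" concrete, here is a minimal sketch. It assumes a linear model in which a structure's score is the sum of learned weights over the features the structure contains. The feature scheme is hypothetical: each base pair yields a feature for its pair type alone and another for the pair type together with its flanking nucleotides. Widening the context multiplies the number of distinct features, and hence the number of free parameters. The function names, feature strings, and weights are illustrative assumptions, not the paper's actual feature set or the ContextFold implementation. The sketch only scores a given structure. Finding the highest-scoring structure would require a dynamic-programming folding algorithm, which is not shown.

```python
from collections import Counter


def pairs_from_dot_bracket(structure):
    """Return the sorted list of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unmatched '(' in structure")
    return sorted(pairs)


def pair_features(seq, i, j, context=1):
    """Hypothetical features for one base pair: the pair type alone, and the
    pair type with up to `context` flanking nucleotides on each outer side."""
    left = seq[max(0, i - context):i]
    right = seq[j + 1:j + 1 + context]
    return [f"pair:{seq[i]}{seq[j]}", f"pair_ctx:{left}[{seq[i]}{seq[j]}]{right}"]


def extract_features(seq, structure, context=1):
    """Count every feature occurrence across all base pairs of the structure."""
    counts = Counter()
    for i, j in pairs_from_dot_bracket(structure):
        counts.update(pair_features(seq, i, j, context))
    return counts


def score(weights, seq, structure, context=1):
    """Linear score: sum of weight * count; features without a weight contribute 0."""
    feats = extract_features(seq, structure, context)
    return sum(weights.get(f, 0.0) * c for f, c in feats.items())


if __name__ == "__main__":
    seq, structure = "GGGAAAUCC", "(((...)))"
    weights = {"pair:GC": 3.0, "pair:GU": 1.0, "pair_ctx:G[GC]C": 0.5}
    print(score(weights, seq, structure))  # prints 7.5
```

Here the two G-C pairs contribute 3.0 each, the G-U pair contributes 1.0, and the one context feature that has a weight adds 0.5.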
An additional important outcome of these experiments is a new RNA folding prediction model (with a freely available implementation) whose prediction quality is significantly higher than that of previous models. This final model has about 70,000 free parameters, several orders of magnitude more than previous models. Trained and tested on the same comprehensive data sets, our model achieves an F1-measure of 84% over correctly predicted base pairs (i.e., a 16% error rate), compared to the previously best reported score of 70% (i.e., a 30% error rate). That is, the new model reduces the error rate by about 50%.

Availability. Additional supporting material, trained models, and source code are available through our website at http://www.cs.bgu.ac.il/~negevcb/contextfold
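The accuracy figures quoted in the Results can be checked with a short calculation. The sketch below uses the standard definition of the F1-measure over base pairs. A predicted pair counts as correct only if it appears in the reference structure, and F1 = 2PR / (P + R), where P is precision and R is recall. The sketch scores a single structure. The abstract does not say whether scores are averaged per sequence or pooled over the data set, so that aggregation step is left out. The relative error reduction works out to (30 - 16) / 30 ≈ 0.467, which the abstract rounds to "about 50%". The function names are illustrative.

```python
def base_pair_f1(predicted, reference):
    """F1-measure over base pairs for one structure.

    Pairs are (i, j) tuples; a predicted pair is correct only if it is also
    in the reference. Returns 0.0 when no predicted pair is correct.
    """
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(reference)
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(old_score, new_score):
    """Fraction of the old error rate (1 - score) eliminated by the new score."""
    old_error, new_error = 1.0 - old_score, 1.0 - new_score
    return (old_error - new_error) / old_error


if __name__ == "__main__":
    reference = [(0, 8), (1, 7), (2, 6)]
    predicted = [(0, 8), (1, 7), (3, 5)]  # two of three pairs correct
    print(round(base_pair_f1(predicted, reference), 4))    # prints 0.6667
    print(round(relative_error_reduction(0.70, 0.84), 3))  # prints 0.467
```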

Original language: English
Title of host publication: Research in Computational Molecular Biology - 15th Annual International Conference, RECOMB 2011, Proceedings
Editors: Vineet Bafna, S. Cenk Sahinalp
Publisher: Springer Verlag
Pages: 546-562
Number of pages: 17
ISBN (Print): 9783642200359
State: Published, 2011
Externally published: Yes
Event: 15th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2011, Vancouver, BC, Canada
Duration: 28 Mar 2011 to 31 Mar 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6577 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2011
Country/Territory: Canada
City: Vancouver, BC
Period: 28/03/11 to 31/03/11

Bibliographical note

Funding Information:
Acknowledgments. The authors are grateful to Mirela Andronescu for her kind help in providing information and pointing us to relevant data. We thank the anonymous referees for their helpful comments. This research was partially supported by ISF grant 478/10 and by the Frankel Center for Computer Science at Ben Gurion University of the Negev.

Publisher Copyright:
© 2011, Springer-Verlag Berlin Heidelberg.
