Correlation approach to identify coding regions in DNA sequences

S. M. Ossadnik, S. V. Buldyrev, A. L. Goldberger, S. Havlin, R. N. Mantegna, C. K. Peng, M. Simons, H. E. Stanley

Research output: Contribution to journal › Article › peer-review

166 Scopus citations

Abstract

Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
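The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: estimate a local correlation exponent in a sliding window and treat windows with an exponent near 0.5 (short-range correlations only) as candidate coding regions. It assumes a purine/pyrimidine "DNA walk" and a simple root-mean-square fluctuation function; the function names, window size, step, and length scales are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random


def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C, T), -1 for any other base (assumed convention)."""
    walk, total = [0], 0
    for base in seq.upper():
        total += 1 if base in "CT" else -1
        walk.append(total)
    return walk


def fluctuation(walk, l):
    """Root-mean-square fluctuation F(l) of walk displacements over distance l."""
    d = [walk[i + l] - walk[i] for i in range(len(walk) - l)]
    mean = sum(d) / len(d)
    return math.sqrt(sum((x - mean) ** 2 for x in d) / len(d))


def scaling_exponent(seq, lengths=(4, 8, 16, 32, 64)):
    """Least-squares slope of log F(l) versus log l; about 0.5 for an uncorrelated sequence."""
    walk = dna_walk(seq)
    pts = []
    for l in lengths:
        if l < len(walk):
            f = fluctuation(walk, l)
            if f > 0:  # skip degenerate scales where log F is undefined
                pts.append((math.log(l), math.log(f)))
    if len(pts) < 2:
        return float("nan")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den


def local_exponents(seq, window=800, step=200):
    """Return (start, exponent) pairs for windows slid along the sequence (hypothetical sizes)."""
    return [
        (start, scaling_exponent(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    random.seed(0)
    demo = "".join(random.choice("ACGT") for _ in range(4000))  # synthetic, uncorrelated
    for start, alpha in local_exponents(demo):
        print(start, round(alpha, 2))
```

On a real genomic sequence, windows whose exponent stays close to 0.5 would be the ones flagged for closer inspection; any threshold would have to be calibrated against annotated data.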

Original language: English
Pages (from-to): 64-70
Number of pages: 7
Journal: Biophysical Journal
Volume: 67
Issue number: 1
State: Published - Jul 1994
Externally published: Yes

Bibliographical note

Funding Information:
Partial support was provided to C.-K. Peng by the National Institutes of Health/National Institute of Mental Health; to A. L. Goldberger by the G. Harold and Leila Y. Mathers Charitable Foundation, the National Heart, Lung and Blood Institute, and the National Aeronautics and Space Administration; to M. Simons by the American Heart Association; and to S. V. Buldyrev, S. Havlin, R. N. Mantegna, S. M. Ossadnik, and H. E. Stanley by the National Science Foundation and the Office of Naval Research.

Funders
National Science Foundation
National Institutes of Health
Office of Naval Research
National Institute of Mental Health
National Aeronautics and Space Administration
American Heart Association
G. Harold and Leila Y. Mathers Charitable Foundation
