Abstract
We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range - indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33 301 coding and 29 453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
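The abstract names detrended fluctuation analysis without spelling out its steps. The following is a minimal sketch of the general technique, not the authors' implementation: it assumes a purine/pyrimidine mapping of bases to +1/-1 steps, linear detrending within non-overlapping boxes, and hypothetical function names (`dna_to_steps`, `dfa_fluctuation`, `dfa_exponent`). The exponent is the slope of log F(n) against log n; a value near 0.5 is expected for an uncorrelated sequence, and larger values suggest long-range correlation.

```python
import math
import random


def dna_to_steps(seq):
    """Map purines (A, G) to +1 and pyrimidines (C, T) to -1.

    This mapping rule is an assumption made for illustration.
    """
    mapping = {"A": 1, "G": 1, "C": -1, "T": -1}
    return [mapping[base] for base in seq.upper()]


def _box_residual_sq(y):
    """Sum of squared residuals of y about its least-squares straight line."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    slope = sxy / sxx
    return sum((v - y_mean - slope * (i - x_mean)) ** 2 for i, v in enumerate(y))


def dfa_fluctuation(steps, n):
    """Root-mean-square detrended fluctuation F(n) for box size n (n >= 3)."""
    # Integrate the steps into a "walk" profile.
    profile = []
    total = 0
    for s in steps:
        total += s
        profile.append(total)
    n_boxes = len(profile) // n
    if n < 3 or n_boxes == 0:
        raise ValueError("box size must be >= 3 and no longer than the sequence")
    # Remove the local linear trend in each non-overlapping box.
    sq = sum(_box_residual_sq(profile[b * n:(b + 1) * n]) for b in range(n_boxes))
    return math.sqrt(sq / (n_boxes * n))


def dfa_exponent(steps, box_sizes):
    """Least-squares slope of log F(n) versus log n."""
    xs = [math.log(n) for n in box_sizes]
    ys = [math.log(dfa_fluctuation(steps, n)) for n in box_sizes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    random.seed(0)
    # A synthetic, uncorrelated sequence stands in for real GenBank data.
    seq = "".join(random.choice("ACGT") for _ in range(20000))
    alpha = dfa_exponent(dna_to_steps(seq), [8, 16, 32, 64, 128, 256])
    print(f"estimated exponent: {alpha:.2f}")
```

The box sizes here are arbitrary; in practice they would span the range of scales over which correlations are being tested, and a sequence with a perfectly linear walk (zero fluctuation) would need special handling before taking logarithms.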
Original language | English |
---|---|
Pages (from-to) | 180-192 |
Number of pages | 13 |
Journal | Physica A: Statistical Mechanics and its Applications |
Volume | 221 |
Issue number | 1-3 |
DOIs | |
State | Published - 15 Nov 1995 |
Bibliographical note
Funding Information: tive research efforts. We also wish to thank C. Cantor, C. DeLisi, M. Frank-Kamenetskii, A.Yu. Grosberg, G. Huber, I. Labat, L. Liebovitch, G.S. Michaels, P. Munson, R. Nossal, R. Nussinov, R.D. Rosenberg, J.J. Schwartz, M. Schwartz, E.I. Shakhnovich, M.E Shlesinger, N. Shworak, and E.N. Trifonov for valuable discussions. Partial support was provided by an NIH/NIMH Postdoctoral NRSA Fellowship (to C-KP), the National Science Foundation, the National Institutes of Health (Human Genome Project), the G. Harold and Leila Y. Mathers Charitable Foundation, the National Heart, Lung and Blood Institute, the National Aeronautics and Space Administration, the Israel-USA Binational Science Foundation, and the Israel Academy of Sciences.
Funding
Funders | Funder number |
---|---|
Israel-USA Binational Science Foundation | |
NIH/NIMH | |
National Science Foundation | |
National Institutes of Health | |
National Aeronautics and Space Administration | |
G. Harold and Leila Y. Mathers Charitable Foundation | |
Israel Academy of Sciences and Humanities | |