Abstract
We develop a quantitative method for analyzing repetitions of identical short oligomers in coding and noncoding DNA sequences. We analyze the sequences currently available in GenBank separately for the primate, mammal, vertebrate, rodent, invertebrate and plant taxonomic partitions. We find that some oligomers “cluster” more than they would if randomly distributed, while other oligomers “repel” each other. To quantify the degree of clustering, we define clustering measures. We find that (i) clustering differs significantly between coding and noncoding DNA; (ii) in most cases, monomers, dimers and tetramers cluster in noncoding DNA but appear to repel each other in coding DNA; (iii) the degree of clustering is more conserved across sources (primates, invertebrates, and plants) in coding DNA than in noncoding DNA; (iv) in contrast to other oligomers, trimers always prefer to cluster; and (v) the clustering of each particular oligomer is conserved within the same organism.
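The abstract does not state how the clustering measures are defined. As a rough illustration of what "clustering more than a random distribution" can mean, the sketch below compares the number of back-to-back copies of an oligomer with the number expected if its occurrences were placed independently. The function names `occurrences` and `tandem_ratio`, and the ratio itself, are assumptions made for this example and are not the measures used in the paper.

```python
def occurrences(seq: str, oligo: str) -> list[int]:
    """Start positions of every (possibly overlapping) occurrence of oligo."""
    k = len(oligo)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == oligo]


def tandem_ratio(seq: str, oligo: str) -> float:
    """Observed / expected number of back-to-back copies of oligo.

    Hypothetical measure, not the paper's definition: values above 1
    suggest clustering, values below 1 suggest repulsion.
    """
    k = len(oligo)
    pos = occurrences(seq, oligo)
    n_sites = len(seq) - k + 1
    if not pos or n_sites <= 0:
        return float("nan")
    starts = set(pos)
    # Count occurrences that are immediately followed by another copy.
    observed = sum(1 for p in pos if p + k in starts)
    # If occurrences were placed independently, each one would be followed
    # by another copy with probability of roughly len(pos) / n_sites.
    expected = len(pos) ** 2 / n_sites
    return observed / expected


clustered = "ACACACACGTTGGTTG"   # four adjacent copies of "AC"
dispersed = "ACGTTACGGTACTTGAC"  # four copies of "AC", none adjacent
print(tandem_ratio(clustered, "AC"))
print(tandem_ratio(dispersed, "AC"))
```

In this toy input the first sequence gives a ratio above 1 and the second gives 0. A real analysis would need long sequences and a null model that accounts for base composition, which this sketch ignores.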
Original language | English |
---|---|
Pages (from-to) | 79-87 |
Number of pages | 9 |
Journal | Journal of Biomolecular Structure and Dynamics |
Volume | 17 |
Issue number | 1 |
State | Published - Aug 1999 |
Bibliographical note
Funding Information: We wish to thank N. Goodman, A. Shehter, F. W. Starr and especially C. Smith for helpful discussions. We also thank referees for a number of helpful suggestions. We acknowledge NIH for financial support. N. V. D. is supported by NIH NRSA molecular biophysics predoctoral traineeship (GM0829I-09).
Funding
Funders | Funder number |
---|---|
National Institutes of Health | GM0829I-09 |