We introduce and develop new techniques to quantify DNA patchiness, and to quantify characteristics of its mosaic structure. These techniques, which involve calculating two functions, α(l) and β(l), measure correlations at length scale l and detect distinct characteristic patch sizes embedded in scale-invariant patch size distributions. Using these new methods, we address a number of issues relating to the mosaic structure of genomic DNA. We find several distinct characteristic patch sizes in certain genomic sequences, and compare, contrast, and quantify the correlation properties of different sequences, including a number of yeast, human, and prokaryotic sequences. We exclude the possibility that the correlation properties and the known mosaic structure of DNA can be explained either by simple Markov processes or by tandem repeats of dinucleotides. We find that the distinct patch sizes in all 16 yeast chromosomes are similar. Furthermore, we test the hypothesis that, for yeast, patchiness is caused by the alternation of coding and noncoding regions, and the hypothesis that in human sequences patchiness is related to repetitive sequences. We find that, by themselves, neither the alternation of coding and noncoding regions, nor repetitive sequences, can fully explain the long-range correlation properties of DNA.
|Number of pages||10|
|Issue number||2 I|
|State||Published - Feb 1997|
Bibliographical note
Funding Information:
We wish to thank A. L. Goldberger, I. Große, P. Ivanov, C.-K. Peng, R. Mantegna, and M. Simons for significant help at the initial stages of this work. We also wish very much to thank those who have made public the newly sequenced yeast chromosomes not yet incorporated into the GenBank database (Bussey et al., 1995; Feldmann et al., 1995; Galibert et al., unpublished observations; Dietrich et al., unpublished observations). We also thank the anonymous referees. We thank the National Institutes of Health for support.