TY - JOUR
T1 - Shuffling biological sequences
AU - Kandel, D.
AU - Matias, Y.
AU - Unger, R.
AU - Winkler, P.
PY - 1996/12/5
Y1 - 1996/12/5
AB - This paper considers the following sequence shuffling problem: Given a biological sequence (either DNA or protein) s, generate a random instance among all the permutations of s that exhibit the same frequencies of k-lets (e.g. dinucleotides, doublets of amino acids, triplets, etc.). Since certain biases in the usage of k-lets are fundamental to biological sequences, effective generation of such sequences is essential for the evaluation of the results of many sequence analysis tools. This paper introduces two sequence shuffling algorithms: A simple swapping-based algorithm is shown to generate a near-random instance and appears to work well, although its efficiency is unproven; a generation algorithm based on Euler tours is proven to produce a precisely uniform instance, and hence solve the sequence shuffling problem, in time not much more than linear in the sequence length.
UR - http://www.scopus.com/inward/record.url?scp=0004608699&partnerID=8YFLogxK
U2 - 10.1016/S0166-218X(97)81456-4
DO - 10.1016/S0166-218X(97)81456-4
M3 - Article
AN - SCOPUS:0004608699
SN - 0166-218X
VL - 71
SP - 171
EP - 185
JO - Discrete Applied Mathematics
JF - Discrete Applied Mathematics
IS - 1-3
ER -