TY - JOUR
T1 - String matching with up to k swaps and mismatches
AU - Lipsky, Ohad
AU - Porat, Benny
AU - Porat, Ely
AU - Riva Shalom, B.
AU - Tzur, Asaf
PY - 2010/9
Y1 - 2010/9
N2 - Finding the similarity between two sequences is a major problem in computer science. It is motivated by many issues from computational biology as well as from information retrieval and image processing. These fields take into account possible corruptions of the data caused by genome rearrangements, typing mistakes, and more. Therefore, many applications require not exact resemblance of the sequences but rather approximate matching. We consider mismatches and swaps as natural errors, of which only a small number is allowed. The edit distance problem with swap and mismatch operations was solved in O(nm log m) time. Yet the problem of string matching with at most k swap and mismatch errors remained open. In this paper, we present an algorithm that finds all locations where the pattern has at most k mismatch and swap errors in O(nk log m) time.
AB - Finding the similarity between two sequences is a major problem in computer science. It is motivated by many issues from computational biology as well as from information retrieval and image processing. These fields take into account possible corruptions of the data caused by genome rearrangements, typing mistakes, and more. Therefore, many applications require not exact resemblance of the sequences but rather approximate matching. We consider mismatches and swaps as natural errors, of which only a small number is allowed. The edit distance problem with swap and mismatch operations was solved in O(nm log m) time. Yet the problem of string matching with at most k swap and mismatch errors remained open. In this paper, we present an algorithm that finds all locations where the pattern has at most k mismatch and swap errors in O(nk log m) time.
KW - Pattern matching
KW - Swap
UR - http://www.scopus.com/inward/record.url?scp=77955473506&partnerID=8YFLogxK
U2 - 10.1016/j.ic.2010.04.001
DO - 10.1016/j.ic.2010.04.001
M3 - Article
AN - SCOPUS:77955473506
SN - 0890-5401
VL - 208
SP - 1020
EP - 1030
JO - Information and Computation
JF - Information and Computation
IS - 9
ER -