TY - GEN

T1 - Exact and approximate pattern matching in the streaming model

AU - Porat, Benny

AU - Porat, Ely

PY - 2009

Y1 - 2009

N2 - We present a fully online randomized algorithm for the classical pattern matching problem that uses only O(log m) space, breaking the O(m) barrier that held for this problem for a long time. Our method can be used as a tool in many practical applications, including Internet traffic monitoring and firewalls. In our online model we first receive the pattern P of size m and preprocess it. After the preprocessing phase, the characters of the text T of size n arrive one at a time in an online fashion. For each index of the text we report whether the pattern matches the text at that index. Clearly, for index i, an answer can be given only once all characters from index i through index i + m - 1 have arrived. Our goal is to provide such answers while using minimal space and spending as little time as possible on each character (with both time and space in O(poly log(n))). We present an algorithm in which both false positive and false negative answers are allowed, each with probability at most 1/n^3. Thus, by a union bound over the at most n positions, the correct answer for all positions is returned with probability at least 1 - 1/n^2. The time our algorithm spends on each input character is bounded by O(log m), and the space complexity is O(log m) words. We also present a solution in the same model for the pattern matching with k mismatches problem. In this problem, a match allows up to k symbol mismatches between the pattern and the subtext beginning at index i. We provide an algorithm in which the time spent on each character is bounded by O(k^2 poly(log m)), and the space complexity is O(k^3 poly(log m)) words.

UR - http://www.scopus.com/inward/record.url?scp=77952401274&partnerID=8YFLogxK

U2 - 10.1109/FOCS.2009.11

DO - 10.1109/FOCS.2009.11

M3 - Conference contribution

AN - SCOPUS:77952401274

SN - 9780769538501

T3 - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS

SP - 315

EP - 323

BT - Proceedings - 50th Annual Symposium on Foundations of Computer Science, FOCS 2009

T2 - 50th Annual Symposium on Foundations of Computer Science, FOCS 2009

Y2 - 25 October 2009 through 27 October 2009

ER -