TY - GEN
T1 - Set intersection and sequence matching
AU - Shiftan, Ariel
AU - Porat, Ely
PY - 2009
Y1 - 2009
N2 - In the classical pattern matching problem, one is given a text and a pattern, both of which are sequences of letters, and must find all occurrences of the pattern in the text. We study two modifications of the classical problem, in which each letter of the text and the pattern is a set (the Set Intersection Matching problem) or a sequence (the Sequence Matching problem). Two "letters" match if the intersection of the two corresponding sets is non-empty, or if the two sequences have a common element at the same index. We present the first known non-trivial, efficient algorithms for these problems when the maximum set/sequence size is small. The first algorithm is randomized and takes $\Theta(2^d n \ln n \log m)$ time, where $d$ is the maximum set/sequence size; with slight modifications, it also handles the case where up to $k$ mismatches are allowed. The second is deterministic and takes $\Theta(4^d n \log m)$ time. The third, also deterministic, counts the number of matches at each index of the text in total running time $\Theta\left(\sum_{i=1}^{d} \binom{|\Sigma|}{i} n \log m\right)$.
UR - http://www.scopus.com/inward/record.url?scp=70350637475&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-03784-9_28
DO - 10.1007/978-3-642-03784-9_28
M3 - Conference contribution
AN - SCOPUS:70350637475
SN - 3642037836
SN - 9783642037832
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 285
EP - 294
BT - String Processing and Information Retrieval - 16th International Symposium, SPIRE 2009, Proceedings
T2 - 16th International Symposium on String Processing and Information Retrieval, SPIRE 2009
Y2 - 25 August 2009 through 27 August 2009
ER -