Abstract
We present problems in different application areas (tandem repeats in computational biology, poetry and music analysis, and author validation) that require a more sophisticated pattern matching model than those hitherto considered. We introduce a new matching criterion, generalized function matching, that captures the notion suggested by these problems. The generalized function matching problem takes as input a text T of length n over the alphabet Σ_T ∪ {φ} and a pattern P = P[0]P[1]⋯P[m-1] of length m over the alphabet Σ_P ∪ {φ}. We seek all text locations i such that a prefix of the substring starting at i equals f(P[0])f(P[1])⋯f(P[m-1]) for some function f : Σ_P → Σ_T^*. We give a polynomial-time algorithm for the generalized function matching problem over bounded alphabets. This problem exhibits an important new phenomenon in pattern matching: a significant complexity gap between the bounded-alphabet and infinite-alphabet cases. We prove that the generalized function matching problem over infinite alphabets is NP-hard. To our knowledge, this is the first case in the literature where a pattern matching problem over a bounded alphabet can be solved in polynomial time while its infinite-alphabet version is NP-hard.
Keywords: Pattern matching, function matching, parameterized matching, NP-hard.
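To make the matching criterion concrete, here is a minimal brute-force sketch in Python. It is not the polynomial-time bounded-alphabet algorithm from the paper. Instead, it backtracks over every possible image for each pattern symbol, so its running time is exponential in the worst case. The sketch makes several assumptions that go beyond the abstract. It ignores the φ symbol entirely. It requires every image f(a) to be a nonempty string, which rules out the trivial match in which every symbol maps to the empty string. The helper names `match_at` and `function_match_locations` are hypothetical. Because f only has to be a function, two distinct pattern symbols may share the same image.

```python
from typing import Dict, List, Optional, Sequence


def match_at(text: str, i: int, pattern: Sequence[str]) -> Optional[Dict[str, str]]:
    """Return some f with f(P[0])...f(P[m-1]) equal to a prefix of text[i:], or None.

    Hypothetical brute-force sketch (exponential worst case). Assumptions:
    every image f(a) is a NONEMPTY string, and the don't-care symbol phi is not handled.
    """
    f: Dict[str, str] = {}

    def extend(j: int, pos: int) -> bool:
        # All pattern symbols consumed: text after pos is unconstrained (prefix match).
        if j == len(pattern):
            return True
        sym = pattern[j]
        if sym in f:
            # Symbol already mapped: its fixed image must occur at pos.
            image = f[sym]
            return text.startswith(image, pos) and extend(j + 1, pos + len(image))
        # Symbol not yet mapped: try every nonempty substring starting at pos.
        for end in range(pos + 1, len(text) + 1):
            f[sym] = text[pos:end]
            if extend(j + 1, end):
                return True
        f.pop(sym, None)  # undo this level's choice before backtracking
        return False

    return dict(f) if extend(0, i) else None


def function_match_locations(text: str, pattern: Sequence[str]) -> List[int]:
    """All text locations i at which the (nonempty) pattern function-matches."""
    return [i for i in range(len(text)) if match_at(text, i, pattern) is not None]


if __name__ == "__main__":
    print(function_match_locations("abcabcx", "AA"))  # [0], with f(A) = "abc"
    print(match_at("abcab", 0, "ABA"))                # {'A': 'a', 'B': 'bc'}
    print(match_at("aa", 0, "AB"))                    # {'A': 'a', 'B': 'a'}
```

The last call shows that, unlike a one-to-one symbol renaming, this model lets distinct pattern symbols map to the same text string.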
Original language | English |
---|---|
Pages (from-to) | 41-52 |
Number of pages | 12 |
Journal | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume | 3341 |
State | Published - 2004 |
Bibliographical note
Funding information: ★ Partly supported by NSF grant CCR-01-04494 and ISF grant 282/01. ★★ Partly supported by ISF grant 282/01.
Funding
Funders | Funder number |
---|---|
National Science Foundation | CCR-01-04494 |
Israel Science Foundation | 282/01 |