Property matching and weighted matching

Amihood Amir, Eran Chencinski, Costas Iliopoulos, Tsvi Kopelowitz, Hui Zhang

Research output: Contribution to journal › Article › peer-review

35 Scopus citations

Abstract

In many pattern matching applications the text has some properties attached to its various parts. Pattern Matching with Properties (Property Matching, for short) involves a string matching between the pattern and the text, together with the requirement that the matched text part satisfies some property. Immediate examples come from molecular biology, where it has long been common practice to consider special areas of the genome by their structure. Sequential matching in a text with properties is straightforward. However, indexing a text with properties becomes difficult if we want the query time to be output dependent. We present an algorithm for indexing a text with properties in $O(n\log|\Sigma| + n\log\log n)$ preprocessing time and $O(|P|\log|\Sigma| + tocc_\pi)$ time per query, where $n$ is the length of the text, $P$ is the sought pattern, $\Sigma$ is the alphabet, and $tocc_\pi$ is the number of occurrences of the pattern that satisfy some property $\pi$.

As a practical use of Property Matching, we show how to solve Weighted Matching problems using techniques from Property Matching. Weighted sequences have recently been introduced as a tool to handle a set of sequences that are not identical but have many local similarities. The weighted sequence is a "statistical image" of this set, in which we are given the probability of every symbol's occurrence at every text location. Weighted matching problems are pattern matching problems where the given text is weighted. We present a reduction from Weighted Matching to Property Matching that allows off-the-shelf solutions to numerous weighted matching problems, including indexing, swapped matching, parameterized matching, approximate matching, and many more. Assuming that one seeks the occurrences of pattern $P$ with probability $\epsilon$ in a weighted text $T$ of length $n$, we reduce the problem to a property matching problem of pattern $P$ in a derived text of length $O\left(n\left(\frac{1}{\epsilon}\right)^2\log\frac{1}{\epsilon}\right)$.

Original language: English
Pages (from-to): 298-310
Number of pages: 13
Journal: Theoretical Computer Science
Volume: 395
Issue number: 2-3
State: Published - 1 May 2008

Bibliographical note

Funding Information:
The first author was partly supported by NSF grant CCR-01-04494 and ISF grant 35/05.

Keywords

  • Pattern matching
  • Position-weight-matrices
  • Weighted indexing
  • Weighted swap matching
