Approximate pattern matching with the L_1, L_2 and L_∞ metrics

Ohad Lipsky, Ely Porat

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Given an alphabet Σ = {1, 2, …, |Σ|}, a text string T ∈ Σ^n and a pattern string P ∈ Σ^m, for each i = 1, 2, …, n-m+1 define L_p(i) as the p-norm distance when the pattern is aligned below the text and starts at position i of the text. The problem of pattern matching with L_p distance is to compute L_p(i) for every i = 1, 2, …, n-m+1. We discuss the problem for p = 1, 2, ∞. First, in the case of L_1 matching (pattern matching with an L_1 distance) we show a reduction of the string matching with mismatches problem to the L_1 matching problem and we present an algorithm that approximates the L_1 matching up to a factor of 1+ε, which has an O((1/ε^2) n log m log |Σ|) run time. Then, the L_2 matching problem (pattern matching with an L_2 distance) is solved with a simple O(n log m) time algorithm. Finally, we provide an algorithm that approximates the L_∞ matching up to a factor of 1+ε with a run time of O((1/ε) n log m log |Σ|). We also generalize the problem of string matching with mismatches to have weighted mismatches and present an O(n log^4 m) algorithm that approximates the results of this problem up to a factor of O(log m) in the case that the weight function is a metric.
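To make the definition concrete, the following Python sketch computes the distance at every alignment in two ways: a naive O(nm) reference for p = 1, 2, ∞, and the squared L_2 distance through the standard expansion sum (t-p)^2 = sum t^2 - 2 sum t·p + sum p^2, where the cross term is a correlation evaluated with an FFT. The abstract does not describe the authors' L_2 algorithm, so this expansion is an assumption about one common approach and not a statement of their method. The function names (lp_distances, cross_correlation, l2_squared_fft) are hypothetical, and positions are 0-based here, whereas the abstract counts from 1.

```python
import cmath
import math


def lp_distances(text, pattern, p):
    """Naive O(n*m) reference: the p-norm distance at every alignment."""
    n, m = len(text), len(pattern)
    out = []
    for i in range(n - m + 1):
        diffs = [abs(text[i + j] - pattern[j]) for j in range(m)]
        if p == math.inf:
            out.append(max(diffs))
        else:
            out.append(sum(d ** p for d in diffs) ** (1.0 / p))
    return out


def _fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = _fft(a[0::2], invert)
    odd = _fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def cross_correlation(text, pattern):
    """c[i] = sum_j text[i+j] * pattern[j] for every alignment i, via FFT."""
    n, m = len(text), len(pattern)
    size = 1
    while size < n + m:
        size *= 2
    fa = _fft([complex(x) for x in text] + [0j] * (size - n))
    # Reversing the pattern turns the correlation into a convolution.
    fb = _fft([complex(x) for x in reversed(pattern)] + [0j] * (size - m))
    prod = _fft([x * y for x, y in zip(fa, fb)], invert=True)
    # Alignment i sits at convolution index i + m - 1; divide by size
    # because the inverse transform above is unnormalized.
    return [round(prod[i + m - 1].real / size) for i in range(n - m + 1)]


def l2_squared_fft(text, pattern):
    """Squared L_2 distance at every alignment, for integer inputs."""
    n, m = len(text), len(pattern)
    prefix = [0]  # prefix sums of text[k]^2
    for x in text:
        prefix.append(prefix[-1] + x * x)
    pattern_sq = sum(y * y for y in pattern)
    cross = cross_correlation(text, pattern)
    return [prefix[i + m] - prefix[i] - 2 * cross[i] + pattern_sq
            for i in range(n - m + 1)]


T = [1, 5, 2, 4, 3, 1]
P = [2, 4, 1]
print(lp_distances(T, P, 1))         # [3.0, 8.0, 2.0, 3.0]
print(lp_distances(T, P, math.inf))  # [1, 3, 2, 2]
print(l2_squared_fft(T, P))          # [3, 22, 4, 5]
```

A single FFT over the whole text, as written, costs O(n log n). The usual way to reach O(n log m) is to cut the text into overlapping blocks of length 2m and correlate each block separately; that refinement is omitted to keep the sketch short.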

Original language: English
Pages (from-to): 335-348
Number of pages: 14
Journal: Algorithmica
Volume: 60
Issue number: 2
State: Published - Jun 2011

Bibliographical note

Funding Information:
Research supported in part by US-Israel Binational Science Foundation.

Funding

Funder: United States-Israel Binational Science Foundation

    Keywords

    • Approximate string matching
    • Combinatorial algorithms on words
    • Design and analysis of algorithms
    • Hamming distance
