Pattern matching with don't cares and few errors

Raphaël Clifford, Klim Efremenko, Ely Porat, Amir Rothschild

Research output: Contribution to journal › Article › peer-review

39 Scopus citations

Abstract

We present solutions for the k-mismatch pattern matching problem with don't cares. Given a text t of length n and a pattern p of length m with don't care symbols and a bound k, our algorithms find all the places that the pattern matches the text with at most k mismatches. We first give a Θ(n(k + log m log k) log n) time randomised algorithm which finds the correct answer with high probability. We then present a new deterministic Θ(nk² log² m) time solution that uses tools originally developed for group testing. Taking our derandomisation approach further we develop an approach based on k-selectors that runs in Θ(nk polylog m) time. Further, in each case the location of the mismatches at each alignment is also given at no extra cost.
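To make the problem statement concrete, the following is a minimal brute-force sketch of k-mismatch matching with don't cares. It is not any of the algorithms from the paper: it runs in O(nm) time and serves only as a reference definition of the expected output. The function name `k_mismatch_naive` and the choice of `?` as the don't care symbol are assumptions made for illustration; the abstract does not fix a notation.

```python
WILDCARD = "?"  # assumed don't care symbol (hypothetical; not specified in the source)


def k_mismatch_naive(text, pattern, k, wildcard=WILDCARD):
    """Brute-force reference for k-mismatch matching with don't cares.

    Returns a dict mapping each alignment i (start position in text) with at
    most k mismatches to the list of pattern offsets j where
    pattern[j] != text[i + j]. Don't care positions in the pattern never
    count as mismatches.
    """
    n, m = len(text), len(pattern)
    result = {}
    for i in range(n - m + 1):
        mismatches = []
        for j in range(m):
            if pattern[j] != wildcard and pattern[j] != text[i + j]:
                mismatches.append(j)
                if len(mismatches) > k:
                    break  # this alignment already exceeds the bound
        if len(mismatches) <= k:
            result[i] = mismatches
    return result
```

For example, `k_mismatch_naive("abcabd", "a?d", 1)` returns `{0: [2], 3: []}`: alignment 0 has a single mismatch at pattern offset 2, and alignment 3 matches exactly. Reporting the mismatch offsets mirrors the abstract's remark that the mismatch locations are given at each alignment.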

Original language: English
Pages (from-to): 115-124
Number of pages: 10
Journal: Journal of Computer and System Sciences
Volume: 76
Issue number: 2
State: Published - Mar 2010

Bibliographical note

Funding Information:
Research supported in part by the Binational Science Foundation (BSF). Throughout this paper we assume the RAM model with multiplication when giving the time complexity of the FFT. This is in order to be consistent with the large body of previous work on pattern matching with FFTs.

Funding

Funders (funder number):
Engineering and Physical Sciences Research Council (EP/F02682X/1)
United States-Israel Binational Science Foundation

    Keywords

    • Group testing
    • Pattern matching
    • Randomised algorithms
    • String algorithms
