Small-space and streaming pattern matching with k edits

Tomasz Kociumaka, Ely Porat, Tatiana Starikovskaya

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

13 Scopus citations

Abstract

In this work, we revisit the fundamental and well-studied problem of approximate pattern matching under edit distance. Given an integer $k$, a pattern $p$ of length $m$, and a text $T$ of length $n \geq m$, the task is to find substrings of $T$ that are within edit distance $k$ from $p$. Our main result is a streaming algorithm that solves the problem in $\tilde{O}(k^{5})$ space and $\tilde{O}(k^{8})$ amortized time per character of the text, providing answers correct with high probability. (Hereafter, $\tilde{O}(\cdot)$ hides a $\mathrm{poly}(\log n)$ factor.) This answers a decade-old question: since the discovery of a $\mathrm{poly}(k \log n)$-space streaming algorithm for pattern matching under Hamming distance by Porat and Porat [FOCS 2009], the existence of an analogous result for edit distance remained open. Up to this work, no $\mathrm{poly}(k \log n)$-space algorithm was known even in the simpler semi-streaming model, where $T$ comes as a stream but $p$ is available for read-only access. In this model, we give a deterministic algorithm that achieves slightly better complexity. Our central technical contribution is a new space-efficient deterministic encoding of two strings, called the greedy encoding, which encodes a set of all alignments of cost at most $k$ with a certain property (we call such alignments greedy). On strings of length at most $n$, the encoding occupies $\tilde{O}(k^{2})$ space. We use the encoding to compress substrings of the text that are close to the pattern. In order to do so, we compute the encoding for substrings of the text and of the pattern, which requires read-only access to the latter. In order to develop the fully streaming algorithm, we further introduce a new edit distance sketch parameterized by integers $n > k$. For any string of length at most $n$, the sketch is of size $\tilde{O}(k^{2})$, and it can be computed with an $\tilde{O}(k^{2})$-space streaming algorithm. Given the sketches of two strings, in $\tilde{O}(k^{3})$ time we can compute their edit distance or certify that it is larger than $k$. This result improves upon $\tilde{O}(k^{8})$-size sketches of Belazzougui and Zhang [FOCS 2016] and very recent $\tilde{O}(k^{3})$-size sketches of Jin, Nelson, and Wu [STACS 2021].
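For orientation, the problem statement above can be illustrated with the classical dynamic program for approximate matching under edit distance, which uses O(nm) time and O(m) space. This is only a baseline sketch of the problem being solved, not the streaming algorithm, the greedy encoding, or the sketch described in the abstract. The function name `approx_match_ends` and the choice to report 0-based end positions are assumptions made for illustration.

```python
from typing import List


def approx_match_ends(pattern: str, text: str, k: int) -> List[int]:
    """Report every 0-based index i such that some substring of `text`
    ending at i is within edit distance k of `pattern`.

    Baseline column-by-column dynamic program; NOT the small-space
    streaming algorithm from the paper (it stores O(m) values).
    """
    m = len(pattern)
    # col[j] = minimum edit distance between pattern[:j] and any
    # substring of the text ending at the current position.
    col = list(range(m + 1))
    ends = []
    for i, c in enumerate(text):
        prev_diag = col[0]  # value of col[j - 1] in the previous column
        col[0] = 0          # a match may start at any text position
        for j in range(1, m + 1):
            old = col[j]
            col[j] = min(
                prev_diag + (pattern[j - 1] != c),  # match or substitution
                old + 1,                            # extra text character
                col[j - 1] + 1,                     # unmatched pattern character
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(i)
    return ends


if __name__ == "__main__":
    print(approx_match_ends("abc", "xxabcxx", 1))
```

Because the baseline keeps a full column of m + 1 values and needs random access to the pattern, it shows why an algorithm whose space depends only on poly(k log n) is a qualitatively different guarantee.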

Original language: English
Title of host publication: Proceedings - 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science, FOCS 2021
Publisher: IEEE Computer Society
Pages: 885-896
Number of pages: 12
ISBN (Electronic): 9781665420556
State: Published - 2022
Event: 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2021 - Virtual, Online, United States
Duration: 7 Feb 2022 to 10 Feb 2022

Publication series

Name: Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
Volume: 2022-February
ISSN (Print): 0272-5428

Conference

Conference: 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2021
Country/Territory: United States
City: Virtual, Online
Period: 7/02/22 to 10/02/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

Funding

Tomasz Kociumaka was partially supported by NSF 1652303, 1909046, and HDR TRIPODS 1934846 grants, and an Alfred P. Sloan Fellowship. Ely Porat was partially supported by ISF grants no. 1278/16 and 1926/19, by a BSF grant no. 2018364, and by an ERC grant MPM under the EU’s Horizon 2020 Research and Innovation Program (grant no. 683064). Tatiana Starikovskaya was partially supported by the ANR-20-CE48-0001 grant from the French National Research Agency (ANR).

Funder: Funder number
Alfred P. Sloan Fellowship
EU’s Horizon 2020 research and innovation program: ANR-20-CE48-0001, 683064
National Science Foundation: 1909046, 1652303, 1934846
Bonfils-Stanton Foundation: 2018364
Engineering Research Centers
Agence Nationale de la Recherche
Israel Science Foundation: 1926/19, 1278/16

    Keywords

    • edit distance
    • pattern matching
    • streaming
