Time-Space Tradeoffs for Finding a Long Common Substring

Stav Ben-Nun, Shay Golan, Tomasz Kociumaka, Matan Kraus

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

We consider the problem of finding, given two documents of total length $n$, a longest string occurring as a substring of both documents. This problem, known as the Longest Common Substring (LCS) problem, has a classic $O(n)$-time solution dating back to the discovery of suffix trees (Weiner, 1973) and their efficient construction for integer alphabets (Farach-Colton, 1997). However, these solutions require $\Theta(n)$ space, which is prohibitive in many applications. To address this issue, Starikovskaya and Vildhøj (CPM 2013) showed that for $n^{2/3} \le s \le n$, the LCS problem can be solved in $O(s)$ space and $\tilde{O}(n^2/s)$ time, where $\tilde{O}$ hides polylogarithmic factors. Kociumaka et al. (ESA 2014) generalized this tradeoff to $1 \le s \le n$, thus providing a smooth time-space tradeoff from constant to linear space. In this paper, we obtain a significant speed-up for instances where the length $L$ of the sought LCS is large. For $1 \le s \le n$, we show that the LCS problem can be solved in $O(s)$ space and $\tilde{O}(n^2/(L \cdot s) + n)$ time. The result is based on techniques originating from the LCS with Mismatches problem (Flouri et al., 2015; Charalampopoulos et al., CPM 2018), on space-efficient locally consistent parsing (Birenzwige et al., SODA 2020), and on the structure of maximal repetitions (runs) in the input documents.

2012 ACM Subject Classification: Theory of computation → Pattern matching.
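For orientation, the LCS problem stated in the abstract can be illustrated with a textbook dynamic program. The sketch below is not the algorithm of the paper: it is a simple baseline that runs in O(|a|·|b|) time and O(min(|a|, |b|)) extra space, and the function name `longest_common_substring` is chosen here purely for illustration.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return one longest string occurring as a substring of both a and b.

    Textbook rolling-row dynamic program (an illustrative baseline only,
    not the time-space tradeoff algorithm described in the abstract).
    """
    # Keep the shorter string along the DP row so that the extra space
    # is proportional to min(len(a), len(b)).
    if len(b) > len(a):
        a, b = b, a

    # prev[j] = length of the longest common suffix of a[:i-1] and b[:j].
    prev = [0] * (len(b) + 1)
    best_len, best_end = 0, 0  # best_end is an end index into b

    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending just before these characters.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], j
        prev = curr

    return b[best_end - best_len:best_end]


if __name__ == "__main__":
    print(longest_common_substring("xabcdy", "zzabcdw"))
```

Only two rows of the table are kept at any time, which is what brings the space below the full quadratic table. The running time stays quadratic regardless of how long the answer is, in contrast to the bound in the abstract, which improves as L grows.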

Original language: English
Title of host publication: 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020
Editors: Inge Li Gørtz, Oren Weimann
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic): 9783959771498
DOIs
State: Published - 1 Jun 2020
Event: 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 - Copenhagen, Denmark
Duration: 17 Jun 2020 → 19 Jun 2020

Publication series

Name: Leibniz International Proceedings in Informatics, LIPIcs
Volume: 161
ISSN (Print): 1868-8969

Conference

Conference: 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020
Country/Territory: Denmark
City: Copenhagen
Period: 17/06/20 → 19/06/20

Bibliographical note

Publisher Copyright:
© 2020 Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing. All rights reserved.

Funding

Supported by ISF grants no. 1278/16 and 1926/19, BSF grant no. 2018364, and ERC grant MPM (no. 683064) under the EU’s Horizon 2020 Research and Innovation Programme.

Funders and funder numbers:
Horizon 2020 Framework Programme: 683064
European Research Council
United States-Israel Binational Science Foundation: 2018364
Israel Science Foundation: 1926/19, 1278/16

    Keywords

    • Local consistency
    • Longest common substring
    • Periodicity
    • Time-space tradeoff
