Abstract
We consider the problem of finding, given two documents of total length $n$, a longest string occurring as a substring of both documents. This problem, known as the Longest Common Substring (LCS) problem, has a classic $O(n)$-time solution dating back to the discovery of suffix trees (Weiner, 1973) and their efficient construction for integer alphabets (Farach-Colton, 1997). However, these solutions require $\Theta(n)$ space, which is prohibitive in many applications. To address this issue, Starikovskaya and Vildhøj (CPM 2013) showed that for $n^{2/3} \le s \le n$, the LCS problem can be solved in $O(s)$ space and $\tilde{O}(n^2/s)$ time. Kociumaka et al. (ESA 2014) generalized this tradeoff to $1 \le s \le n$, thus providing a smooth time-space tradeoff from constant to linear space. In this paper, we obtain a significant speed-up for instances where the length $L$ of the sought LCS is large. For $1 \le s \le n$, we show that the LCS problem can be solved in $O(s)$ space and $\tilde{O}(n^2/(L \cdot s) + n)$ time. The result is based on techniques originating from the LCS with Mismatches problem (Flouri et al., 2015; Charalampopoulos et al., CPM 2018), on space-efficient locally consistent parsing (Birenzwige et al., SODA 2020), and on the structure of maximal repetitions (runs) in the input documents.
2012 ACM Subject Classification: Theory of computation → Pattern matching
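To make the problem statement concrete, here is a minimal Python sketch of a textbook dynamic-programming baseline for LCS. It is not the algorithm of the paper and uses none of its techniques (no locally consistent parsing, no runs); the function name `longest_common_substring` is chosen here for illustration. The baseline runs in time proportional to the product of the two document lengths and keeps only one table row, so its extra space is linear in the shorter document.

```python
def longest_common_substring(s: str, t: str) -> str:
    """Return one longest string occurring as a substring of both s and t.

    Textbook baseline (an illustration, not the paper's method):
    O(len(s) * len(t)) time, O(min(len(s), len(t))) extra space.
    """
    # Keep the shorter string on the inner loop so the DP row stays small.
    if len(t) > len(s):
        s, t = t, s

    # prev[j] = length of the longest common suffix of s[:i-1] and t[:j].
    prev = [0] * (len(t) + 1)
    best_len, best_end = 0, 0  # best match is s[best_end - best_len:best_end]

    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                # Extend the common suffix that ended one character earlier.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur

    return s[best_end - best_len:best_end]
```

For example, `longest_common_substring("banana", "ananas")` returns `"anana"`, and two documents with no common character yield the empty string. The tradeoffs discussed in the abstract concern doing substantially better than this quadratic baseline when only $O(s)$ working space is available.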
Original language | English |
---|---|
Title of host publication | 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 |
Editors | Inge Li Gørtz, Oren Weimann |
Publisher | Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing |
ISBN (Electronic) | 9783959771498 |
DOIs | |
State | Published - 1 Jun 2020 |
Event | 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 - Copenhagen, Denmark Duration: 17 Jun 2020 → 19 Jun 2020 |
Publication series
Name | Leibniz International Proceedings in Informatics, LIPIcs |
---|---|
Volume | 161 |
ISSN (Print) | 1868-8969 |
Conference
Conference | 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 |
---|---|
Country/Territory | Denmark |
City | Copenhagen |
Period | 17/06/20 → 19/06/20 |
Bibliographical note
Publisher Copyright: © 2020 Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing. All rights reserved.
Funding
Supported by ISF grants no. 1278/16 and 1926/19, BSF grant no. 2018364, and ERC grant MPM (no. 683064) under the EU's Horizon 2020 Research and Innovation Programme.
Funders | Funder number |
---|---|
Horizon 2020 Framework Programme | 683064 |
European Commission | |
United States-Israel Binational Science Foundation | 2018364 |
Israel Science Foundation | 1926/19, 1278/16 |
Keywords
- Local consistency
- Longest common substring
- Periodicity
- Time-space tradeoff