TY - GEN

T1 - Sparse suffix tree construction in small space

AU - Bille, Philip

AU - Fischer, Johannes

AU - Gørtz, Inge Li

AU - Kopelowitz, Tsvi

AU - Sach, Benjamin

AU - Vildhøj, Hjalte Wedel

PY - 2013

Y1 - 2013

N2 - We consider the problem of constructing a sparse suffix tree (or suffix array) for b suffixes of a given text T of length n, using only O(b) words of space during construction. Attempts at breaking the naive bound of Ω(nb) time for this problem can be traced back to the origins of string indexing in 1968. The first results were obtained only in 1996, and only for the case where the suffixes are evenly spaced in T. In this paper there is no constraint on the locations of the suffixes. We show that the sparse suffix tree can be constructed in O(n log² b) time. To achieve this we develop a technique, which may be of independent interest, that makes it possible to efficiently answer b longest common prefix queries on suffixes of T, using only O(b) space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Our first solution is Monte-Carlo and outputs the correct tree with high probability. We then give a Las-Vegas algorithm which also uses O(b) space and runs in the same time bounds with high probability when b = O(√n). Furthermore, additional tradeoffs between the space usage and the construction time for the Monte-Carlo algorithm are given.

AB - We consider the problem of constructing a sparse suffix tree (or suffix array) for b suffixes of a given text T of length n, using only O(b) words of space during construction. Attempts at breaking the naive bound of Ω(nb) time for this problem can be traced back to the origins of string indexing in 1968. The first results were obtained only in 1996, and only for the case where the suffixes are evenly spaced in T. In this paper there is no constraint on the locations of the suffixes. We show that the sparse suffix tree can be constructed in O(n log² b) time. To achieve this we develop a technique, which may be of independent interest, that makes it possible to efficiently answer b longest common prefix queries on suffixes of T, using only O(b) space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Our first solution is Monte-Carlo and outputs the correct tree with high probability. We then give a Las-Vegas algorithm which also uses O(b) space and runs in the same time bounds with high probability when b = O(√n). Furthermore, additional tradeoffs between the space usage and the construction time for the Monte-Carlo algorithm are given.

UR - http://www.scopus.com/inward/record.url?scp=84880298094&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-39206-1_13

DO - 10.1007/978-3-642-39206-1_13

M3 - Conference contribution

AN - SCOPUS:84880298094

SN - 9783642392054

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 148

EP - 159

BT - Automata, Languages, and Programming - 40th International Colloquium, ICALP 2013, Proceedings

T2 - 40th International Colloquium on Automata, Languages, and Programming, ICALP 2013

Y2 - 8 July 2013 through 12 July 2013

ER -