Managing unbounded-length keys in comparison-driven data structures with applications to online indexing

Amihood Amir, Gianni Franceschini, Roberto Grossi, Tsvi Kopelowitz, Moshe Lewenstein, Noa Lewenstein

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Examples of these keys are strings, multidimensional points, multiple-precision numbers, multikey data (e.g., records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work, as no particular exploitation of the underlying structure is required. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, online suffix tree construction can be done in worst-case time O(log n) per input symbol (as opposed to amortized O(log n) time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves O(log n) worst-case time per input symbol. Searching for a pattern of length m in the resulting suffix tree takes O(min(m log |Σ|, m + log n) + tocc) time, where tocc is the number of occurrences of the pattern. The paper also describes more applications and shows how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors, and order maintenance. The technical features of the proposed technique for a given data structure D are the following. The new data structure D′ is obtained from D by augmenting the latter with an oracle for strings, extending the functionalities of the Dietz-Sleator list for order maintenance [P. F. Dietz and D. D. Sleator, Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, ACM, New York, 1987, pp. 365-372; A. Tsakalidis, Acta Inform., 21 (1984), pp. 101-112]. The space complexity of D′ is S(n) + O(n) memory cells for storing n keys, where S(n) denotes the space complexity of D. Then, each operation involving O(1) keys taken from D′ requires O(T(n)) time, where T(n) denotes the time complexity of the corresponding operation originally supported in D. Each operation involving a key y not stored in D′ takes O(T(n) + |y|) time, where |y| denotes the length of y. For the special case where the oracle handles suffixes of a string, the achieved insertion time is O(T(n)).
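
The requirement that an insertion identify the predecessor or successor of the new key can be illustrated with a small sketch. This is not the oracle described in the abstract, and it does not attain the stated bounds. It is a plain sorted Python list, searched with the well-known trick of remembering the longest common prefix (LCP) with the current lower and upper bounds, so that each comparison skips characters already known to match. The names `lcp_from` and `insert_with_neighbors` are hypothetical and chosen only for this illustration.

```python
def lcp_from(a, b, start):
    """Length of the longest common prefix of a and b, assuming the
    first `start` characters are already known to match."""
    k = start
    limit = min(len(a), len(b))
    while k < limit and a[k] == b[k]:
        k += 1
    return k


def insert_with_neighbors(keys, key):
    """Insert `key` into the sorted list `keys` and report its neighbors.

    Returns (position, pred, succ), where pred and succ are either None
    or a pair (neighbor_key, lcp_with_key).
    """
    lo, hi = -1, len(keys)   # sentinel bounds: keys[lo] <= key < keys[hi]
    l = r = 0                # LCP of key with keys[lo] and with keys[hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Every key between the bounds shares at least min(l, r)
        # leading characters with `key`, so the scan can start there.
        k = lcp_from(key, keys[mid], min(l, r))
        if k == len(key) and k < len(keys[mid]):
            less = True                      # key is a proper prefix
        elif k < len(key) and k < len(keys[mid]):
            less = key[k] < keys[mid][k]     # first mismatch decides
        else:
            less = False                     # equal, or keys[mid] is a prefix
        if less:
            hi, r = mid, k
        else:
            lo, l = mid, k
    keys.insert(hi, key)     # O(n) list shift; acceptable for a sketch only
    pred = (keys[lo], l) if lo >= 0 else None
    succ = (keys[hi + 1], r) if hi + 1 < len(keys) else None
    return hi, pred, succ


keys = []
for word in ["banana", "band", "bandana", "ban"]:
    insert_with_neighbors(keys, word)
position, pred, succ = insert_with_neighbors(keys, "banc")
```

In this sketch a search can still cost O(|y| log n) character comparisons in the worst case, and the list insertion itself is linear. What it shows is that once the predecessor, the successor, and the LCP with each are known, later comparisons need not rescan matched prefixes. The abstract's O(T(n) + |y|) bound relies on the oracle for strings and is not reproduced here.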

Original language: English
Pages (from-to): 1396-1416
Number of pages: 21
Journal: SIAM Journal on Computing
Volume: 43
Issue number: 4
DOIs
State: Published - 2014

Keywords

  • Search trees
  • Strings
  • Suffix sorting
  • Suffix tree
  • Text indexing
