TY - GEN
T1 - Less Space: Indexing for Queries with Wildcards
T2 - 24th International Symposium on Algorithms and Computation, ISAAC 2013
AU - Lewenstein, Moshe
AU - Munro, J. Ian
AU - Raman, Venkatesh
AU - Thankachan, Sharma V.
PY - 2013
Y1 - 2013
N2 - Text indexing is a fundamental problem in computer science: the task is to index a given text (string) T[1..n] so that, whenever a pattern P[1..p] arrives as a query, all locations where P occurs as a substring of T can be reported efficiently. In this paper, we consider the case where P contains wildcard characters (each of which can match any character). The first non-trivial solution to this problem was given by Cole et al. [STOC 2004]; its index occupies O(n log^k n) words, or O(n log^{k+1} n) bits, and its query time is O(p + 2^h log log n + occ), where k is the maximum number of wildcard characters allowed in P, h ≤ k is the number of wildcard characters in P, and occ is the number of occurrences of P in T. Although many indexes offering different space-time trade-offs were proposed later, no clear improvement on this result is known. We first propose an index of O(n log^{k+ε} n) bits that achieves the same query time as Cole et al.'s index, where 0 < ε < 1 is an arbitrarily small constant. We then propose another index of O(n log^k n log σ) bits with a slightly higher query time of O(p + 2^h log n + occ), where σ denotes the alphabet size.
AB - Text indexing is a fundamental problem in computer science: the task is to index a given text (string) T[1..n] so that, whenever a pattern P[1..p] arrives as a query, all locations where P occurs as a substring of T can be reported efficiently. In this paper, we consider the case where P contains wildcard characters (each of which can match any character). The first non-trivial solution to this problem was given by Cole et al. [STOC 2004]; its index occupies O(n log^k n) words, or O(n log^{k+1} n) bits, and its query time is O(p + 2^h log log n + occ), where k is the maximum number of wildcard characters allowed in P, h ≤ k is the number of wildcard characters in P, and occ is the number of occurrences of P in T. Although many indexes offering different space-time trade-offs were proposed later, no clear improvement on this result is known. We first propose an index of O(n log^{k+ε} n) bits that achieves the same query time as Cole et al.'s index, where 0 < ε < 1 is an arbitrarily small constant. We then propose another index of O(n log^k n log σ) bits with a slightly higher query time of O(p + 2^h log n + occ), where σ denotes the alphabet size.
UR - http://www.scopus.com/inward/record.url?scp=84893341499&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-45030-3_9
DO - 10.1007/978-3-642-45030-3_9
M3 - Conference contribution
AN - SCOPUS:84893341499
SN - 9783642450297
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 89
EP - 99
BT - Algorithms and Computation - 24th International Symposium, ISAAC 2013, Proceedings
Y2 - 16 December 2013 through 18 December 2013
ER -