TY - GEN

T1 - Indexing with gaps

AU - Lewenstein, Moshe

PY - 2011

Y1 - 2011

N2 - In Indexing with Gaps, one seeks to index a text to support pattern queries that contain gaps. Formally, a gapped pattern over an alphabet Σ is a pattern of the form p = p_1 g_1 p_2 g_2 ⋯ g_ℓ p_{ℓ+1}, where for all i, p_i ∈ Σ* and each g_i ∈ ℕ is a gap length. Often one considers these patterns under bound constraints, for example, all gaps are bounded by a gap-bound G. Near-optimal solutions have recently been proposed for the case of a single gap of predetermined size. More specifically, these are indexing solutions for patterns of the form p_1·g·p_2, where g is known a priori. In this case, the solutions mentioned require O(n log^ε n) preprocessing time and O(n) space, and pattern queries are answered in O(|p_1| + |p_2|) time, for constant-sized alphabets. For the more general case in which there is a bound G, these results can be easily adapted with a multiplicative factor of O(G) for the preprocessing, i.e., O(n log^ε n · G) preprocessing time and O(nG) preprocessing space. Alas, these solutions do not lend themselves to more than one gap. In this paper we propose a solution for k gaps with preprocessing time O(n G^{2k} log^k n log log n), space O(n G^{2k} log^k n), and query time O(m + 2^k log log n), where m = Σ_{i=1}^{k+1} |p_i|.

AB - In Indexing with Gaps, one seeks to index a text to support pattern queries that contain gaps. Formally, a gapped pattern over an alphabet Σ is a pattern of the form p = p_1 g_1 p_2 g_2 ⋯ g_ℓ p_{ℓ+1}, where for all i, p_i ∈ Σ* and each g_i ∈ ℕ is a gap length. Often one considers these patterns under bound constraints, for example, all gaps are bounded by a gap-bound G. Near-optimal solutions have recently been proposed for the case of a single gap of predetermined size. More specifically, these are indexing solutions for patterns of the form p_1·g·p_2, where g is known a priori. In this case, the solutions mentioned require O(n log^ε n) preprocessing time and O(n) space, and pattern queries are answered in O(|p_1| + |p_2|) time, for constant-sized alphabets. For the more general case in which there is a bound G, these results can be easily adapted with a multiplicative factor of O(G) for the preprocessing, i.e., O(n log^ε n · G) preprocessing time and O(nG) preprocessing space. Alas, these solutions do not lend themselves to more than one gap. In this paper we propose a solution for k gaps with preprocessing time O(n G^{2k} log^k n log log n), space O(n G^{2k} log^k n), and query time O(m + 2^k log log n), where m = Σ_{i=1}^{k+1} |p_i|.

UR - http://www.scopus.com/inward/record.url?scp=80053985128&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-24583-1_14

DO - 10.1007/978-3-642-24583-1_14

M3 - Conference contribution

AN - SCOPUS:80053985128

SN - 9783642245824

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 135

EP - 143

BT - String Processing and Information Retrieval - 18th International Symposium, SPIRE 2011, Proceedings

T2 - 18th International Symposium on String Processing and Information Retrieval, SPIRE 2011

Y2 - 17 October 2011 through 21 October 2011

ER -