Gapped String Indexing in Subquadratic Space and Sublinear Query Time

Philip Bille, Inge Li Gørtz, Moshe Lewenstein, Solon P. Pissis, Eva Rotenberg, Teresa Anna Steiner

Research output: Working paper / Preprint


Abstract

In Gapped String Indexing, the goal is to compactly represent a string $S$ of length $n$ such that given queries consisting of two strings $P_1$ and $P_2$, called patterns, and an integer interval $[\alpha, \beta]$, called gap range, we can quickly find occurrences of $P_1$ and $P_2$ in $S$ with distance in $[\alpha, \beta]$. Due to the many applications of this fundamental problem in computational biology and elsewhere, there is a great body of work for restricted or parameterised variants of the problem. However, for the general problem statement, no improvements upon the trivial $\mathcal{O}(n)$-space $\mathcal{O}(n)$-query time or $\Omega(n^2)$-space $\mathcal{\tilde{O}}(|P_1| + |P_2| + \mathrm{occ})$-query time solutions were known so far. We break this barrier obtaining interesting trade-offs with polynomially subquadratic space and polynomially sublinear query time. In particular, we show that, for every $0\leq \delta \leq 1$, there is a data structure for Gapped String Indexing with either $\mathcal{\tilde{O}}(n^{2-\delta/3})$ or $\mathcal{\tilde{O}}(n^{3-2\delta})$ space and $\mathcal{\tilde{O}}(|P_1| + |P_2| + n^{\delta}\cdot (\mathrm{occ}+1))$ query time, where $\mathrm{occ}$ is the number of reported occurrences. As a new fundamental tool towards obtaining our main result, we introduce the Shifted Set Intersection problem: preprocess a collection of sets $S_1, \ldots, S_k$ of integers such that given queries consisting of three integers $i,j,s$, we can quickly output YES if and only if there exist $a \in S_i$ and $b \in S_j$ with $a+s = b$. We start by showing that the Shifted Set Intersection problem is equivalent to the indexing variant of 3SUM (3SUM Indexing) [Golovnev et al., STOC 2020]. Via several steps of reduction we then show that the Gapped String Indexing problem reduces to polylogarithmically many instances of the Shifted Set Intersection problem.
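
To pin down the two problem statements in the abstract, here is a minimal brute-force sketch in Python. It is not the data structure from the paper and achieves none of the stated space or query bounds; it only illustrates what a query is expected to return. The function names (`find_occurrences`, `gapped_query`, `shifted_set_intersection`) are hypothetical, and the sketch assumes that "distance" means the difference $j - i$ between the start position $i$ of an occurrence of $P_1$ and the start position $j$ of an occurrence of $P_2$, which the abstract does not spell out.

```python
from bisect import bisect_left, bisect_right


def find_occurrences(text, pattern):
    """Return the sorted start positions of `pattern` in `text` (naive scan)."""
    if not pattern:
        return []
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)  # allow overlapping matches
    return positions


def gapped_query(text, p1, p2, alpha, beta):
    """Report all pairs (i, j) with p1 at i, p2 at j and alpha <= j - i <= beta.

    Assumption: distance is measured between start positions.
    """
    occ1 = find_occurrences(text, p1)
    occ2 = find_occurrences(text, p2)  # already sorted
    pairs = []
    for i in occ1:
        # Binary search for the occurrences of p2 inside [i + alpha, i + beta].
        lo = bisect_left(occ2, i + alpha)
        hi = bisect_right(occ2, i + beta)
        pairs.extend((i, j) for j in occ2[lo:hi])
    return pairs


def shifted_set_intersection(sets, i, j, s):
    """Return True iff there exist a in sets[i] and b in sets[j] with a + s == b."""
    return any((a + s) in sets[j] for a in sets[i])


if __name__ == "__main__":
    print(gapped_query("abracadabra", "ab", "ra", 1, 3))
    print(shifted_set_intersection([{1, 4, 9}, {6, 12}, {3}], 0, 1, 2))
```

In this sketch every query rescans the whole string, which is the behaviour the paper's trade-offs are designed to avoid. The connection between the two functions is that a gapped query with a single allowed gap $s$ asks exactly whether the occurrence set of $P_1$, shifted by $s$, intersects the occurrence set of $P_2$.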
Original language: English
Publisher: arXiv preprint
Pages: 21
State: Published - 30 Nov 2022

Bibliographical note

21 pages, 5 figures

Keywords

  • cs.DS
