Fast string dictionary lookup with one error

Timothy Chan, Moshe Lewenstein

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

A set of strings, called a string dictionary, is a basic string data structure. The most primitive query, where one seeks the existence of a pattern in the dictionary, is called a lookup query. Approximate lookup queries, i.e., looking up the existence of a pattern with a bounded number of errors, are a fundamental string problem. Several data structures have been proposed to answer them efficiently. Almost all solutions consider a single error, as does this result. Recently, Belazzougui and Venturini (CPM 2013) raised the question of whether one can construct efficient indexes that support lookup queries with one error in optimal query time, that is, O(|p|/ω + occ), where p is the query, ω the machine word size, and occ the number of occurrences. Specifically, for the problem of one mismatch and constant alphabet size, we obtain optimal query time. For a dictionary of d strings, our proposed index uses O(ωd log^{1+ε} d) bits of additional space (beyond the space required to access the dictionary data, which can be maintained in compressed form). Our results are parameterized for a space-time tradeoff. We also present results for lookup queries with one insertion/deletion on dictionaries over a constant-sized alphabet. These results are especially effective for long patterns.
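To make the two query types concrete, the following is a minimal Python sketch of a well-known hashing baseline (mask one position to find mismatches; delete one character to find insertions/deletions). It is not the index described in this paper and does not achieve the optimal bound. The names WILDCARD, build_mismatch_index, lookup_one_mismatch, build_deletion_index and lookup_one_indel are hypothetical, and the sentinel character is assumed never to occur in the data.

```python
from collections import defaultdict

# Hypothetical sentinel; assumed never to occur in dictionary strings or queries.
WILDCARD = "\0"


def build_mismatch_index(dictionary):
    """Map each string with one position replaced by WILDCARD to its source strings."""
    index = defaultdict(set)
    for s in dictionary:
        for i in range(len(s)):
            index[s[:i] + WILDCARD + s[i + 1:]].add(s)
    return index


def lookup_one_mismatch(exact, index, p):
    """Dictionary strings of the same length as p differing from it in at most one position."""
    result = {p} if p in exact else set()
    for i in range(len(p)):
        # Matches strings s that agree with p everywhere except possibly position i.
        result |= index.get(p[:i] + WILDCARD + p[i + 1:], set())
    return result


def build_deletion_index(dictionary):
    """Map each string obtained by deleting one character to its source strings."""
    index = defaultdict(set)
    for s in dictionary:
        for i in range(len(s)):
            index[s[:i] + s[i + 1:]].add(s)
    return index


def lookup_one_indel(exact, index, p):
    """Dictionary strings reachable from p by exactly one insertion or one deletion."""
    result = set()
    for i in range(len(p)):
        # Case 1: s is p with one character deleted.
        q = p[:i] + p[i + 1:]
        if q in exact:
            result.add(q)
    # Case 2: s is p with one character inserted, i.e. deleting a character of s gives p.
    result |= index.get(p, set())
    return result


dictionary = ["cat", "car", "cut", "dog", "cart"]
exact = set(dictionary)
mismatch_index = build_mismatch_index(dictionary)
deletion_index = build_deletion_index(dictionary)
print(sorted(lookup_one_mismatch(exact, mismatch_index, "cot")))  # ['cat', 'cut']
print(sorted(lookup_one_indel(exact, deletion_index, "car")))     # ['cart']
```

Each query in this baseline materializes |p| strings of length about |p|, so it takes roughly O(|p|^2) time, and the index stores one key per dictionary character; that overhead is far from the O(|p|/ω + occ) query bound discussed in the abstract.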

Original language: English
Title of host publication: Combinatorial Pattern Matching - 26th Annual Symposium, CPM 2015, Proceedings
Editors: Ugo Vaccaro, Ely Porat, Ferdinando Cicalese
Publisher: Springer Verlag
Pages: 114-123
Number of pages: 10
ISBN (Print): 9783319199283
State: Published - 2015
Event: 26th Annual Symposium on Combinatorial Pattern Matching, CPM 2015 - Ischia Island, Italy
Duration: 29 Jun 2015 to 1 Jul 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9133
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 26th Annual Symposium on Combinatorial Pattern Matching, CPM 2015
Country/Territory: Italy
City: Ischia Island
Period: 29/06/15 to 1/07/15

Bibliographical note

Publisher Copyright:
© Springer International Publishing Switzerland 2015.

Funding

M. Lewenstein: This research is supported by BSF grant 2010437 and GIF grant 1147/2011. T. Chan: This research is supported by an NSERC grant.

Funder: funder number
German-Israeli Foundation for Scientific Research and Development: 1147/2011
Natural Sciences and Engineering Research Council of Canada
United States-Israel Binational Science Foundation: 2010437
