Space-efficient string indexing for wildcard pattern matching

Moshe Lewenstein, Yakov Nekrich, Jeffrey Scott Vitter

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

13 Scopus citations

Abstract

In this paper we describe compressed indexes that support pattern matching queries for strings with wildcards. For a constant size alphabet our data structure uses $O(n \log^{\varepsilon} n)$ bits for any $\varepsilon > 0$ and reports all $\mathrm{occ}$ occurrences of a wildcard string in $O(m + \sigma^{g} \cdot \mu(n) + \mathrm{occ})$ time, where $\mu(n) = o(\log \log \log n)$, $\sigma$ is the alphabet size, $m$ is the number of alphabet symbols and $g$ is the number of wildcard symbols in the query string. We also present an $O(n)$-bit index with $O((m + \sigma^{g} + \mathrm{occ}) \log^{\varepsilon} n)$ query time and an $O(n (\log \log n)^{2})$-bit index with $O((m + \sigma^{g} + \mathrm{occ}) \log \log n)$ query time. These are the first non-trivial data structures for this problem that need $o(n \log n)$ bits of space.
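To make the query parameters concrete, the following is a minimal sketch of the problem being indexed, not the data structure from the paper. It scans the text naively and reports every position where a pattern with wildcards matches, and it computes m and g as the abstract defines them. The wildcard symbol `?` and the function names are assumptions made for illustration; the abstract does not specify them.

```python
WILDCARD = "?"  # assumed wildcard symbol; the abstract does not fix one


def wildcard_occurrences(text: str, pattern: str) -> list[int]:
    """Naive baseline: return all start positions where pattern matches text.

    A wildcard in the pattern matches any single text symbol.
    Runs in O(n * len(pattern)) time and builds no index at all.
    """
    k = len(pattern)
    hits = []
    for i in range(len(text) - k + 1):
        window = text[i:i + k]
        if all(p == WILDCARD or p == t for p, t in zip(pattern, window)):
            hits.append(i)
    return hits


def query_parameters(pattern: str) -> tuple[int, int]:
    """Return (m, g): the number of alphabet symbols and of wildcard symbols."""
    g = pattern.count(WILDCARD)
    m = len(pattern) - g
    return m, g


if __name__ == "__main__":
    text = "abracadabra"
    pattern = "a?ra"
    print(query_parameters(pattern))            # (3, 1)
    print(wildcard_occurrences(text, pattern))  # [0, 7]
```

Here occ is the length of the returned list. The indexes described in the abstract answer the same query without scanning the whole text, at the cost of a term that grows with $\sigma^{g}$.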

Original language: English
Title of host publication: 31st International Symposium on Theoretical Aspects of Computer Science, STACS 2014
Editors: Ernst W. Mayr, Natacha Portier
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Pages: 506-517
Number of pages: 12
Volume: 25
ISBN (Electronic): 9783939897651
DOIs
State: Published - 1 Mar 2014
Event: 31st International Symposium on Theoretical Aspects of Computer Science, STACS 2014 - Lyon, France
Duration: 5 Mar 2014 - 8 Mar 2014

Conference

Conference: 31st International Symposium on Theoretical Aspects of Computer Science, STACS 2014
Country/Territory: France
City: Lyon
Period: 5/03/14 - 8/03/14

Bibliographical note

Publisher Copyright:
© Moshe Lewenstein, Yakov Nekrich, and Jeffrey Scott Vitter.

Funding

Funders: Natural Sciences and Engineering Research Council of Canada

    Keywords

    • Compressed data structures
    • Compressed indexes
    • Pattern matching
