Document retrieval with one wildcard

Moshe Lewenstein, J. Ian Munro, Yakov Nekrich, Sharma V. Thankachan

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper we extend several well-known document listing problems to the case where documents contain a substring that approximately matches the query pattern. We study the scenario in which the query string can contain a wildcard symbol that matches any alphabet symbol; all documents that match a query pattern with one wildcard must be enumerated. We describe a linear-space data structure that reports all documents containing a substring that matches P in O(|P| + σ log log log n + docc) time, where σ is the alphabet size and docc is the number of listed documents. We also describe a succinct solution for this problem, as well as a solution for an extension of this problem. Furthermore, our approach enables us to obtain an O(nσ)-space data structure that enumerates all documents containing both a pattern P1 and a pattern P2 in the special case when P1 and P2 differ in one symbol.
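The query semantics described in the abstract can be illustrated with a brute-force baseline. The sketch below is not the data structure from the paper: it scans every document and therefore does not achieve the stated time or space bounds. The wildcard marker `?`, the function names, and the representation of documents as a list of Python strings are all assumptions made for illustration.

```python
from typing import List

WILDCARD = "?"  # hypothetical wildcard marker; the abstract does not fix a symbol


def matches_at(text: str, pattern: str, start: int) -> bool:
    """True if pattern matches text at offset start; WILDCARD matches any symbol."""
    return all(p == WILDCARD or p == text[start + i] for i, p in enumerate(pattern))


def list_documents(docs: List[str], pattern: str) -> List[int]:
    """Naive document listing: indices of documents with a substring matching pattern."""
    if pattern.count(WILDCARD) > 1:
        raise ValueError("this sketch handles at most one wildcard")
    hits = []
    for doc_id, doc in enumerate(docs):
        last_start = len(doc) - len(pattern)
        # Each document is reported once, however many occurrences it contains.
        if any(matches_at(doc, pattern, s) for s in range(last_start + 1)):
            hits.append(doc_id)
    return hits


def list_documents_with_both(docs: List[str], p1: str, p2: str) -> List[int]:
    """Naive two-pattern listing: indices of documents containing both p1 and p2."""
    return [doc_id for doc_id, doc in enumerate(docs) if p1 in doc and p2 in doc]


if __name__ == "__main__":
    docs = ["banana", "bandana", "cabana", "apple"]
    print(list_documents(docs, "b?n"))
    print(list_documents(docs, "an?a"))
    print(list_documents_with_both(["banbat", "ban", "bat"], "ban", "bat"))
```

The baseline reports each matching document once, which is the distinction between document listing and reporting every occurrence of the pattern.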

Original language: English
Pages (from-to): 94-101
Number of pages: 8
Journal: Theoretical Computer Science
Volume: 635
State: Published - 4 Jul 2016

Bibliographical note

Publisher Copyright: © 2016

Funding

Early parts of this work appeared in MFCS 2014 [1]. This work is supported by NSERC of Canada and the Canada Research Chairs program.

Funders

• Natural Sciences and Engineering Research Council of Canada
• Canada Research Chairs

    Keywords

    • Compressed data structures
    • Document retrieval
    • String searching
    • Wildcards
