Abstract
We consider the problem of fingerprinting text by sets of symbols. Specifically, if S is a string of length n over a finite, ordered alphabet Σ, and S' is a substring of S, then the fingerprint of S' is the subset φ of Σ consisting of precisely the symbols that appear in S'. In this paper we present efficient methods for answering various queries on fingerprint statistics. Our preprocessing takes time O(n |Σ| log n log |Σ|) and enables answering the following queries: (1) Given an integer k, compute the number of distinct fingerprints of size k in time O(1). (2) Given a set φ ⊆ Σ, compute the total number of distinct occurrences in S of substrings with fingerprint φ in time O(|Σ| log n).
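To make the definition and the two queries concrete, the following is a minimal brute-force sketch in Python. It is not the method of the paper: it enumerates all substrings in quadratic time and does not achieve the bounds stated above. The function names are hypothetical, and the sketch assumes that an "occurrence" is a (start, end) position pair, which is one possible reading of query (2).

```python
from collections import Counter


def fingerprint_table(s):
    """Map each fingerprint (a frozenset of symbols) to the number of
    (start, end) position pairs whose substring has that fingerprint.

    Naive O(n^2) enumeration, for illustration only.
    """
    occurrences = Counter()
    for start in range(len(s)):
        seen = set()
        for end in range(start, len(s)):
            # Extending the substring by one symbol can only grow its fingerprint.
            seen.add(s[end])
            occurrences[frozenset(seen)] += 1
    return occurrences


def count_fingerprints_of_size(table, k):
    """Query (1): number of distinct fingerprints containing exactly k symbols."""
    return sum(1 for fp in table if len(fp) == k)


def count_occurrences(table, phi):
    """Query (2), under the assumption stated above: number of position
    pairs whose substring has fingerprint phi."""
    return table.get(frozenset(phi), 0)


table = fingerprint_table("abca")
print(count_fingerprints_of_size(table, 2))   # fingerprints {a,b}, {b,c}, {a,c}
print(count_occurrences(table, {"a", "b", "c"}))  # substrings abc, abca, bca
```

For S = "abca", the substrings "abc", "abca" and "bca" all share the fingerprint {a, b, c}, while "a" occurs at two positions with fingerprint {a}.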
| Original language | English |
|---|---|
| Pages (from-to) | 409-421 |
| Number of pages | 13 |
| Journal | Journal of Discrete Algorithms |
| Volume | 1 |
| Issue number | 5-6 |
| State | Published - Oct 2003 |
Bibliographical note
Funding Information:
Giorgio Satta's work was supported in part by MURST under project PRIN: BioInformatica e Ricerca Genomica and by the University of Padova under project Sviluppo di Sistemi ad Addestramento Automatico per l'Analisi del Linguaggio Naturale.
Amihood Amir was partially supported by NSF grant CCR-01-04494, BSF grant 96-00509, and an Israel–Italy exchange scientist grant.
Alberto Apostolico's work was supported in part by NSF Grant CCR-9700276, by MURST under project PRIN: BioInformatica e Ricerca Genomica, by the University of Padova under project Development of Novel Pattern Discovery Algorithms and Software, and by an Israel–Italy exchange scientist grant.
This research was performed during exchange visits conducted, respectively, by the first and third authors at the University of Padova, and by the second author at the Universities of Bar-Ilan and Haifa, as part of an Israel–Italy exchange scientist grant jointly funded by the Israel Ministry of Science and the National Research Council of Italy.
Gad Landau was partially supported by NSF grants CCR-9610238 and CCR-0104307, by NATO Science Programme grant PST.CLG.977017, by Israel Science Foundation grants 173/98 and 282/01, by the FIRST Foundation of the Israel Academy of Science and Humanities, by an IBM Faculty Partnership Award, and by an Israel–Italy exchange scientist grant.
Funding
| Funders | Funder number |
|---|---|
| FIRST Foundation of the Israel Academy of Science and Humanities | |
| Israel Ministry of Science | |
| MURST | |
| Universities of Bar-Ilan and Haifa | |
| National Science Foundation | CCR-0104307, CCR-9700276, CCR-9610238, CCR-01-04494 |
| International Business Machines Corporation | |
| North Atlantic Treaty Organization | PST.CLG.977017 |
| National Research Council of Italy | |
| United States-Israel Binational Science Foundation | 96-00509 |
| Università degli Studi di Padova | |
| Israel Science Foundation | 282/01, 173/98 |
Keywords
- Combinatorial algorithms on words
- Design and analysis of algorithms