## Abstract

We consider the problem of fingerprinting text by sets of symbols. Specifically, if S is a string of length n over a finite, ordered alphabet Σ, and S' is a substring of S, then the fingerprint of S' is the subset φ of Σ consisting of precisely the symbols appearing in S'. In this paper we show efficient methods of answering various queries on fingerprint statistics. Our preprocessing is done in time O(n |Σ| log n log |Σ|) and enables answering the following queries: (1) Given an integer k, compute the number of distinct fingerprints of size k in time O(1). (2) Given a set φ ⊆ Σ, compute the total number of distinct occurrences in S of substrings with fingerprint φ in time O(|Σ| log n).
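To make the definitions concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm: it enumerates all O(n²) substrings rather than using the stated preprocessing, and the function names (`fingerprint_stats`, `count_fingerprints_of_size`, `count_occurrences`) are hypothetical. It also assumes that "distinct occurrences" means distinct (start, end) position pairs, which is one reading of the abstract.

```python
from collections import Counter


def fingerprint_stats(s):
    """Map each fingerprint (a frozenset of symbols) to the number of
    (start, end) position pairs whose substring has that fingerprint.

    Naive illustration only: O(n^2) substrings are enumerated.
    """
    occurrences = Counter()
    n = len(s)
    for i in range(n):
        seen = set()
        for j in range(i, n):
            # Extending the substring s[i..j] by one symbol can only
            # grow its fingerprint, so we update the set incrementally.
            seen.add(s[j])
            occurrences[frozenset(seen)] += 1
    return occurrences


def count_fingerprints_of_size(occurrences, k):
    """Query (1): number of distinct fingerprints with exactly k symbols."""
    return sum(1 for fp in occurrences if len(fp) == k)


def count_occurrences(occurrences, phi):
    """Query (2): number of substring occurrences with fingerprint phi
    (assumed here to mean distinct (start, end) pairs)."""
    return occurrences.get(frozenset(phi), 0)
```

For example, with `stats = fingerprint_stats("abca")`, the call `count_fingerprints_of_size(stats, 2)` returns 3 (the fingerprints {a, b}, {b, c}, and {a, c}), and `count_occurrences(stats, "abc")` returns 3 (the substrings `abc`, `abca`, and `bca`).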

Original language | English |
---|---|
Pages (from-to) | 409-421 |
Number of pages | 13 |
Journal | Journal of Discrete Algorithms |
Volume | 1 |
Issue number | 5-6 |
State | Published - Oct 2003 |

### Bibliographical note

Funding Information:

Giorgio Satta's work was supported in part by MURST under project PRIN: BioInformatica e Ricerca Genomica and by University of Padova, under project Sviluppo di Sistemi ad Addestramento Automatico per l'Analisi del Linguaggio Naturale.

Amihood Amir was partially supported by NSF grant CCR-01-04494, BSF grant 96-00509, and an Israel–Italy exchange scientist grant.

Alberto Apostolico's work was supported in part by NSF Grant CCR-9700276, by MURST under project PRIN: BioInformatica e Ricerca Genomica, by the University of Padova under project Development of Novel Pattern Discovery Algorithms and Software, and by an Israel–Italy exchange scientist grant.

This research was performed during exchange visits conducted, respectively, by the first and third authors at the University of Padova, and by the second author at the Universities of Bar-Ilan and Haifa, as part of an Israel–Italy exchange scientist grant jointly funded by the Israel Ministry of Science and the National Research Council of Italy.

Gad Landau was partially supported by NSF grants CCR-9610238 and CCR-0104307, by NATO Science Programme grant PST.CLG.977017, by the Israel Science Foundation grants 173/98 and 282/01, by the FIRST Foundation of the Israel Academy of Science and Humanities, by an IBM Faculty Partnership Award, and by an Israel–Italy exchange scientist grant.

## Keywords

- Combinatorial algorithms on words
- Design and analysis of algorithms