## Abstract

This paper considers various flavors of the following online problem: preprocess a text or collection of strings, so that given a query string p, all matches of p with the text can be reported quickly. In this paper we consider matches in which a bounded number of mismatches are allowed, or in which a bounded number of "don't care" characters are allowed. The specific problems we look at are: indexing, in which there is a single text t, and we seek locations where p matches a substring of t; dictionary queries, in which a collection of strings is given upfront, and we seek those strings which match p in their entirety; and dictionary matching, in which a collection of strings is given upfront. and we seek those substrings of a (long) p which match an original string in its entirety. These are all instances of an all-to-all matching problem, for which we provide a single solution. The performance bounds all have a similar character. For example, for the indexing problem with n = |t| and m = |p|, the query time for k substitutions is O(m + (c_{1} log n)^{k}/_{k!} + # matches), with a data structure of size O(n (c_{2} log n)^{k}/_{k!}) and a preprocessing time of O(n(c_{2} log n)^{k}/_{k!}), where c_{1}, C _{2} > 1 are constants. The deterministic preprocessing assumes a weakly nonuniform RAM model; this assumption is not needed if randomization is used in the preprocessing.

Original language | English |
---|---|

Pages (from-to) | 91-100 |

Number of pages | 10 |

Journal | Conference Proceedings of the Annual ACM Symposium on Theory of Computing |

DOIs | |

State | Published - 2004 |

Event | Proceedings of the 36th Annual ACM Symposium on Theory of Computing - Chicago, IL, United States Duration: 13 Jun 2004 → 15 Jun 2004 |

## Keywords

- Approximate pattern matching
- Dictionary matching
- Dictionary query
- Suffix trees
- Text indexing
- Wild-cards