## Abstract

The subsequence character count problem takes as input an array S = S[1], ..., S[n] of symbols over an alphabet Σ and a natural number m. Its output is, for every i = 1, ..., n - m + 1, the number of different alphabet symbols occurring in the subsequence S[i], S[i + 1], ..., S[i + m - 1]. The subsequence character count problem is a natural problem that has many uses. It can be solved in linear time for finite alphabets and in time O(n log m) for infinite alphabets. When the character count problem is generalized to two dimensions it becomes the submatrix character count problem. Its input is an n × n matrix T over alphabet Σ and a natural number m. Its output is, for every i, j = 1, ..., n - m + 1, the number of different alphabet symbols occurring in the submatrix T[i + k, j + ℓ], k = 0, ..., m - 1; ℓ = 0, ..., m - 1. The straightforward one-dimensional solution slides a window along the text, adding an element and deleting an element at every step. The problem with two dimensions is that at every move of the window there are m elements added and m deleted. In this paper, we present an alternate one-dimensional solution that generalizes to two dimensions. We achieve an O(n^2) time solution to the submatrix character count problem over a finite alphabet and an O(n^2 log m) solution over an infinite alphabet.
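To make the two problem statements concrete, here is a minimal Python sketch. It shows only the straightforward one-dimensional sliding window mentioned in the abstract and a brute-force two-dimensional reference that follows the output definition directly. It is not the paper's alternate algorithm, which the abstract does not describe. The function names are hypothetical, indices are 0-based rather than 1-based, and the hash-based counter gives expected rather than worst-case time bounds.

```python
from collections import Counter


def subsequence_character_count(s, m):
    """Distinct-symbol count for every length-m window of s (sliding window).

    Hypothetical helper: a sketch of the straightforward one-dimensional
    solution, using a hash-based counter (an assumption, not the paper's
    data structure).
    """
    n = len(s)
    if m <= 0 or m > n:
        return []
    counts = Counter(s[:m])          # symbol -> occurrences in current window
    distinct = len(counts)
    result = [distinct]
    for i in range(m, n):
        incoming, outgoing = s[i], s[i - m]
        if counts[incoming] == 0:    # symbol not present in the window yet
            distinct += 1
        counts[incoming] += 1
        counts[outgoing] -= 1
        if counts[outgoing] == 0:    # last occurrence left the window
            distinct -= 1
        result.append(distinct)
    return result


def submatrix_character_count_naive(t, m):
    """Brute-force reference for the two-dimensional problem.

    Hypothetical helper: examines all m*m cells of every window, so it is
    far slower than the bounds stated in the abstract. It only illustrates
    what the output is.
    """
    n = len(t)
    return [
        [
            len({t[i + k][j + l] for k in range(m) for l in range(m)})
            for j in range(n - m + 1)
        ]
        for i in range(n - m + 1)
    ]
```

For example, `subsequence_character_count("aabbc", 2)` examines the windows `aa`, `ab`, `bb`, `bc`. The brute-force version touches m² cells per window, which is the cost the two-dimensional result in the abstract avoids.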

Original language | English |
---|---|
Pages (from-to) | 100-116 |
Number of pages | 17 |
Journal | Information and Computation |
Volume | 190 |
Issue number | 1 |
DOIs | |
State | Published - 10 Apr 2004 |

### Bibliographical note

Funding Information: A short abstract of the results presented in this paper appeared in the Proceedings of the 13th Annual ACM/SIAM Symposium on Discrete Algorithms [3].

∗ Corresponding author. Fax: +972-3-736-0498. E-mail addresses: amir@macs.biu.ac.il (A. Amir), kwc@research.att.com (K.W. Church), dar@cs.biu.ac.il (E. Dar).

¹ Partially supported by NSF Grant CCR-01-04494 and ISF Grant 282/01. Part of this work was done while the author was at AT&T Labs – Research and DIMACS.

² Fax: 1-973-360-8077.

³ This work is part of E. Dar's Ph.D. Thesis.