The submatrices character count problem: An efficient solution using separable values

Amihood Amir, Kenneth W. Church, Emanuel Dar

Research output: Contribution to journal › Article › peer-review


Abstract

The subsequence character count problem has as its input an array S = S[1], …, S[n] of symbols over an alphabet Σ and a natural number m. Its output is, for every i = 1, …, n − m + 1, the number of distinct alphabet symbols occurring in the subsequence S[i], S[i + 1], …, S[i + m − 1]. The subsequence character count problem is a natural problem with many uses. It can be solved in linear time for finite alphabets and in O(n log m) time for infinite alphabets. Generalized to two dimensions, the character count problem becomes the submatrix character count problem. Its input is an n × n matrix T over alphabet Σ and a natural number m. Its output is, for every i, j = 1, …, n − m + 1, the number of distinct alphabet symbols occurring in the submatrix T[i + k, j + ℓ], k = 0, …, m − 1; ℓ = 0, …, m − 1. The straightforward one-dimensional solution slides a window along the text, adding one element and deleting one element at every step. The difficulty in two dimensions is that every move of the window adds m elements and deletes m elements. In this paper, we present an alternate one-dimensional solution that generalizes to two dimensions. We achieve an O(n²) time solution to the submatrix character count problem over a finite alphabet and an O(n² log m) solution over an infinite alphabet.
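To make the baseline described in the abstract concrete, the following Python sketch (illustrative only, not code from the paper) counts distinct symbols in every length-m window by deleting one symbol and adding one at each step, and then extends the same idea naively to m × m submatrices, where each one-column move of the window deletes m symbols and adds m. The function names are hypothetical and indices are 0-based. Symbol counts are kept in a Python dict (a hash table), which gives expected linear time in one dimension; this is a swapped-in choice, not the array-based (finite alphabet) or O(n log m) (infinite alphabet) structures the stated bounds refer to. The two-dimensional function is deliberately the O(n²m) baseline, not the separable-values method of the paper.

```python
from collections import defaultdict


def _add(counts, x):
    """Increment the count of x; return 1 if x was previously absent from the window."""
    counts[x] += 1
    return 1 if counts[x] == 1 else 0


def _remove(counts, x):
    """Decrement the count of x; return -1 if x is now absent from the window."""
    counts[x] -= 1
    return -1 if counts[x] == 0 else 0


def window_distinct_counts(s, m):
    """Distinct symbols in s[i:i+m] for i = 0..n-m (hypothetical helper name)."""
    n = len(s)
    if not 0 < m <= n:
        return []
    counts = defaultdict(int)  # hash table: expected O(1) per update
    distinct = sum(_add(counts, x) for x in s[:m])
    result = [distinct]
    for i in range(m, n):
        # Slide one step: delete the symbol leaving, add the symbol entering.
        distinct += _remove(counts, s[i - m])
        distinct += _add(counts, s[i])
        result.append(distinct)
    return result


def submatrix_distinct_counts_naive(t, m):
    """Distinct symbols in every m x m submatrix of the n x n matrix t.

    Baseline only (hypothetical helper name): each one-column move deletes
    the m symbols of the old left column and adds the m symbols of the new
    right column, so the total work is O(n^2 * m) dictionary updates.
    """
    n = len(t)
    if not 0 < m <= n:
        return []
    result = []
    for i in range(n - m + 1):
        counts = defaultdict(int)
        # Initial window: rows i..i+m-1, columns 0..m-1.
        distinct = sum(_add(counts, t[i + k][l]) for k in range(m) for l in range(m))
        row = [distinct]
        for j in range(1, n - m + 1):
            for k in range(m):
                distinct += _remove(counts, t[i + k][j - 1])   # old left column
                distinct += _add(counts, t[i + k][j + m - 1])  # new right column
            row.append(distinct)
        result.append(row)
    return result
```

For example, window_distinct_counts("aabbc", 2) returns [1, 2, 1, 2], and submatrix_distinct_counts_naive(["aab", "abc", "bbc"], 2) returns [[2, 3], [2, 2]]. The inner loop over k is exactly the m deletions and m insertions per move that the abstract identifies as the obstacle in two dimensions.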

Original language: English
Pages (from-to): 100-116
Number of pages: 17
Journal: Information and Computation
Volume: 190
Issue number: 1
State: Published - 10 Apr 2004

Bibliographical note

Funding Information:
A short abstract of the results presented in this paper appeared in the Proceedings of the 13th Annual ACM/SIAM Symposium on Discrete Algorithms [3].
* Corresponding author. Fax: +972-3-736-0498. E-mail addresses: amir@macs.biu.ac.il (A. Amir), kwc@research.att.com (K.W. Church), dar@cs.biu.ac.il (E. Dar).
(1) Partially supported by NSF Grant CCR-01-04494 and ISF Grant 282/01. Part of this work was done while the author was at AT&T Labs – Research and DIMACS.
(2) Fax: 1-973-360-8077.
(3) This work is part of E. Dar's Ph.D. thesis.
