## Abstract

The subsequence character count problem has as its input an array $S = s_1, \ldots, s_n$ of symbols over an alphabet $\Sigma$ and a natural number $m$. Its output is, for every $i = 1, \ldots, n - m + 1$, the number of different alphabet symbols occurring in the subsequence $s_i, s_{i+1}, \ldots, s_{i+m-1}$. The subsequence character count problem is a natural problem with many uses. It can be solved in linear time for fixed finite alphabets and in time $O(n \log m)$ for infinite alphabets. In [1] the problem was used to solve the parameterized matching problem.
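The linear-time one-dimensional solution can be sketched as a sliding window that keeps a count per symbol. This is a minimal sketch under stated assumptions: the function name `subsequence_character_count` is hypothetical, indices are 0-based, and a Python dictionary stands in for the direct-addressed counter array of the fixed-alphabet case (giving expected linear time via hashing, not the deterministic $O(n \log m)$ bound quoted for infinite alphabets).

```python
from collections import defaultdict


def subsequence_character_count(s, m):
    """For every window s[i:i+m] (0-based i), return the number of distinct symbols in it."""
    n = len(s)
    if m <= 0 or m > n:
        return []
    counts = defaultdict(int)  # symbol -> occurrences inside the current window
    distinct = 0
    result = []
    for j, sym in enumerate(s):
        # The symbol at position j enters the window on the right.
        if counts[sym] == 0:
            distinct += 1
        counts[sym] += 1
        # Once the window would exceed length m, the symbol at j - m leaves on the left.
        if j >= m:
            old = s[j - m]
            counts[old] -= 1
            if counts[old] == 0:
                distinct -= 1
        # The window s[j-m+1 .. j] is complete from j = m - 1 onward.
        if j >= m - 1:
            result.append(distinct)
    return result


print(subsequence_character_count("abacab", 3))  # windows aba, bac, aca, cab
```

Each step performs one insertion and one deletion, which is exactly the property the two-dimensional case loses.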
The character count problem can be generalized to two dimensions and becomes the submatrix character count problem. Its input is an $n \times n$ matrix $T$ over alphabet $\Sigma$ and a natural number $m$. Its output is, for every $i, j = 1, \ldots, n - m + 1$, the number of different alphabet symbols occurring in the submatrix $T[i + k, j + \ell]$, $k = 0, \ldots, m - 1$; $\ell = 0, \ldots, m - 1$.
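To make the definition concrete, the following brute-force sketch evaluates every $m \times m$ window directly in $O(n^2 m^2)$ time. It is a reference implementation of the definition only, not the paper's algorithm; the name `submatrix_character_count_naive` is hypothetical, and it uses 0-based indices, so entry `[i][j]` of the result corresponds to the corner $(i+1, j+1)$ in the text.

```python
def submatrix_character_count_naive(T, m):
    """For every top-left corner (i, j), count the distinct symbols in the
    m x m submatrix T[i + k][j + ell], 0 <= k, ell < m (0-based indices)."""
    n = len(T)
    size = n - m + 1  # number of valid corners per dimension
    return [
        [
            len({T[i + k][j + ell] for k in range(m) for ell in range(m)})
            for j in range(size)
        ]
        for i in range(size)
    ]


T = [[1, 2, 1],
     [2, 2, 3],
     [1, 3, 3]]
print(submatrix_character_count_naive(T, 2))  # [[2, 3], [3, 2]]
```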
This problem was motivated by parameterized matching in two dimensions, which is a good model for seeking a pattern in an image under a change of color map. The number of different colors in a subarea of an image is considered a "signature". Many image processing tools use this measure (see, e.g., [5]).
The straightforward one-dimensional solution slides a window along the text, adding one element and deleting one element at every step. The difficulty in two dimensions is that every move of the window adds $m$ elements and deletes $m$ elements.
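The cost described above shows up clearly in a sketch of the naive two-dimensional extension. This is the baseline being improved upon, not the paper's method, and the helper names `_update` and `submatrix_character_count_sliding` are hypothetical. For each band of $m$ rows the window slides right one column at a time; every move inserts the $m$ cells of the entering column and deletes the $m$ cells of the leaving column, for $O(n^2 m)$ counter updates in total.

```python
from collections import defaultdict


def _update(counts, sym, delta):
    """Adjust the count of sym by delta and return the resulting change
    (+1, 0 or -1) in the number of distinct symbols in the window."""
    before = counts[sym] > 0
    counts[sym] += delta
    after = counts[sym] > 0
    return int(after) - int(before)


def submatrix_character_count_sliding(T, m):
    """Distinct-symbol counts of all m x m windows (0-based corners),
    computed by sliding a window horizontally along each band of m rows."""
    n = len(T)
    if m <= 0 or m > n:
        return []
    result = []
    for i in range(n - m + 1):  # top row of the current band
        counts = defaultdict(int)
        distinct = 0
        # Build the first window of the band: columns 0 .. m-1 (m * m insertions).
        for k in range(m):
            for ell in range(m):
                distinct += _update(counts, T[i + k][ell], +1)
        row = [distinct]
        for j in range(1, n - m + 1):
            # One step right: m cells of column j+m-1 enter, m cells of column j-1 leave.
            for k in range(m):
                distinct += _update(counts, T[i + k][j + m - 1], +1)
                distinct += _update(counts, T[i + k][j - 1], -1)
            row.append(distinct)
        result.append(row)
    return result
```

For an $n \times n$ input this does $O(m^2)$ work to start each of the $n - m + 1$ bands and $O(m)$ work per step, which is where the extra factor of $m$ over the one-dimensional case comes from.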
In this paper we present an alternative solution that generalizes to two dimensions. We achieve an $O(n^2)$ time solution to the submatrix character count problem over a fixed finite alphabet and an $O(n^2 \log m)$ solution over an infinite alphabet.
The submatrix character count problem is a special case of the color range query problem, where one needs to preprocess a two-dimensional $n \times n$ array $T$ of symbols over an alphabet $\Sigma$, the colors. Subsequently, we are interested in answering queries of the following type: given intervals $[i_1, j_1]$ and $[i_2, j_2]$, where $i_1, i_2, j_1, j_2 \in \{1, \ldots, n\}$, $i_1 \le j_1$ and $i_2 \le j_2$, report the number of different alphabet symbols (colors) occurring in the submatrix $T[k, \ell]$, $k = i_1, \ldots, j_1$, $\ell = i_2, \ldots, j_2$.
Janardan and Lopez [6] showed that with $O(n^2 \log^2 n)$ preprocessing one can answer queries in time $O(\log^2 n)$. This means that the submatrix character count problem can be solved in time $O(n^2 \log^2 n)$ by preprocessing and then querying, for every location, the $m \times m$ submatrix starting at that location.
We are not aware of a faster direct approach for solving the submatrix character count problem. However, problems with a similar flavor, where the desired calculation is a convolution, are solved in electrical engineering by a method called Separable Convolutions or Separable Filters [4]. A similar notion was used by Bird [3] and Baker [2] to solve the two-dimensional pattern matching problem.
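For readers unfamiliar with separable filters, the classic example is the $m \times m$ box filter: the sum over every $m \times m$ window equals a one-dimensional sliding sum along each row followed by a one-dimensional sliding sum along each column of the intermediate result, so the whole computation takes $O(n^2)$ time regardless of $m$. The sketch below (hypothetical helpers `sliding_sums` and `box_sums`) illustrates only that textbook idea. It does not carry over verbatim to character counts, because the number of distinct symbols in a union of rows is not the sum of the per-row counts.

```python
def sliding_sums(values, m):
    """1D pass: the sum of every length-m window of values, in O(len(values))."""
    if m <= 0 or m > len(values):
        return []
    window = sum(values[:m])
    out = [window]
    for j in range(m, len(values)):
        # Slide right by one: add the entering element, subtract the leaving one.
        window += values[j] - values[j - m]
        out.append(window)
    return out


def box_sums(A, m):
    """Separable m x m box sums: a horizontal 1D pass over every row,
    then a vertical 1D pass over every column of the intermediate result."""
    rows = [sliding_sums(row, m) for row in A]                  # n x (n-m+1)
    cols = [sliding_sums(list(col), m) for col in zip(*rows)]   # transposed result
    return [list(r) for r in zip(*cols)]                        # transpose back


A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(box_sums(A, 2))  # [[12, 16], [24, 28]]
```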
The contributions of this paper are twofold. First, we generalize the notion of separable convolutions to separable attributes. We believe it is important to keep this method in mind as an element of the basic algorithmic toolkit. It has proven useful in the past and, we think, will prove useful for solving various two-dimensional problems in the future. Second, we use the separable attributes method to provide the fastest algorithm yet for the submatrix character count problem.

| Field | Value |
|---|---|
| Original language | American English |
| Title of host publication | The thirteenth annual ACM-SIAM symposium on Discrete algorithms |
| Publisher | Society for Industrial and Applied Mathematics |
| State | Published - 2002 |