The String-to-Dictionary Matching Problem is defined, in which a string is searched for in all the possible concatenations of the elements of a given dictionary, with applications to compressed matching in variable to fixed-length encodings, such as Tunstall's. Two algorithms based on suffix trees are suggested, the one focusing on the dictionary, the other on the pattern to be searched for. The problem is then extended to deal also with patterns that include gaps. Experiments on natural language text suggest that compressed search might use less comparisons for long enough patterns, in spite of a potentially large number of encodings.
- compressed matching
- suffix trees