An overhead reduction technique for mega-state compression schemes

Abraham Bookstein, Shmuel T. Klein, Timo Raita

Research output: Contribution to journal › Article › peer-review



Many of the most effective compression methods involve complicated models. Unfortunately, as model complexity increases, so does the cost of storing the model itself. This paper examines a method to reduce the amount of storage needed to represent a Markov model with an extended alphabet, by applying a clustering scheme that brings together similar states. Experiments run on a variety of large natural language texts show that much of the overhead of storing the model can be saved at the cost of a very small loss of compression efficiency.
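The abstract describes merging similar states of an extended-alphabet Markov model to shrink the stored model. As a rough illustration only (not the authors' actual scheme), the sketch below builds order-1 character contexts from a text, then greedily merges states whose next-symbol distributions are close under an L1 distance; the threshold and greedy strategy are illustrative assumptions.

```python
from collections import Counter, defaultdict

def markov_states(text):
    """Collect next-character counts for each order-1 context (state)."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def distribution(counter):
    """Normalize a Counter of next-symbol counts into probabilities."""
    total = sum(counter.values())
    return {sym: n / total for sym, n in counter.items()}

def l1_distance(p, q):
    """L1 distance between two next-symbol distributions."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def cluster_states(counts, threshold=0.6):
    """Greedily assign each state to the first existing cluster whose
    pooled distribution is within `threshold`; otherwise start a new
    cluster. Returns a mapping state -> cluster representative."""
    reps = []          # list of (representative state, pooled Counter)
    assignment = {}
    for state, counter in counts.items():
        p = distribution(counter)
        for rep, pooled in reps:
            if l1_distance(p, distribution(pooled)) <= threshold:
                pooled.update(counter)   # fold this state into the cluster
                assignment[state] = rep
                break
        else:
            reps.append((state, Counter(counter)))
            assignment[state] = state
    return assignment

# Toy demonstration: contexts with similar successor statistics merge,
# so fewer conditional distributions need to be stored.
text = "the theory of the thermal threshold " * 20
counts = markov_states(text)
assignment = cluster_states(counts)
print(len(counts), "states merged into",
      len(set(assignment.values())), "clusters")
```

Storing one pooled distribution per cluster instead of one per state is what trades a small loss in modeling precision for a large reduction in model overhead.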

Original language: English
Pages (from-to): 745-760
Number of pages: 16
Journal: Information Processing and Management
Issue number: 6
State: Published - Nov 1997

Bibliographical note

Funding Information:
* The work of the first author (AB) was supported, in part, by NSF Grant IRI-9307895-A01. The author gratefully acknowledges this support. We also wish to acknowledge support given by the Academy of Finland to TR.

† To whom all correspondence should be addressed: tel: (773) 702-8268, fax: (773) 702-9861.

