A systematic approach to compressing a full-text retrieval system

Abraham Bookstein, Shmuel T. Klein, D. A. Ziff

Research output: Contribution to journal › Article › peer-review

21 Scopus citations

Abstract

This article reports on a variety of compression algorithms developed in the context of a project to put all the data files for a full-text retrieval system on CD-ROM. In the context of inexpensive pre-processing, a text-compression algorithm is presented that is based on Markov-modeled Huffman coding on an extended alphabet. Data structures are examined for facilitating random access into the compressed text. In addition, new algorithms are presented for compression of word indices, both the dictionaries (word lists) and the text pointers (concordances). The ARTFL database is used as a test case throughout the article.
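As a rough illustration of what "Markov-modeled Huffman coding" can mean, the sketch below builds a separate Huffman code for each preceding character (a first-order model) and switches code tables as it moves through the text. It is not the article's algorithm. It uses single characters rather than the extended alphabet the abstract mentions, and the names `huffman_code`, `build_model`, `encode`, `decode`, and `START` are hypothetical, chosen only for this example.

```python
import heapq
from collections import Counter, defaultdict
from itertools import count

START = ""  # assumed context for the first symbol; never equals a real character


def huffman_code(freqs):
    """Return {symbol: bitstring} for one frequency table."""
    if len(freqs) == 1:
        # A lone symbol still needs one bit so the decoder can advance.
        return {next(iter(freqs)): "0"}
    tie = count()  # unique tiebreaker so the heap never compares dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def build_model(text):
    """One Huffman code per context, where the context is the previous character."""
    contexts = defaultdict(Counter)
    prev = START
    for ch in text:
        contexts[prev][ch] += 1
        prev = ch
    return {ctx: huffman_code(c) for ctx, c in contexts.items()}


def encode(text, model):
    bits, prev = [], START
    for ch in text:
        bits.append(model[prev][ch])  # code table chosen by the previous character
        prev = ch
    return "".join(bits)


def decode(bits, model):
    out, prev, i = [], START, 0
    while i < len(bits):
        inverse = {b: s for s, b in model[prev].items()}
        j = i + 1
        # Codes are prefix-free, so the first matching prefix is the codeword.
        while bits[i:j] not in inverse:
            j += 1
            if j > len(bits):
                raise ValueError("bit string does not end on a codeword boundary")
        out.append(inverse[bits[i:j]])
        prev = out[-1]
        i = j
    return "".join(out)


sample = "abracadabra abracadabra"
model = build_model(sample)
encoded = encode(sample, model)
restored = decode(encoded, model)
```

The model here is built from the text it compresses, so a real system would also have to store or transmit the per-context code tables. That cost grows with the number of contexts.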

Original language: English
Pages (from-to): 795-806
Number of pages: 12
Journal: Information Processing and Management
Volume: 28
Issue number: 6
State: Published - 1992
