TY - JOUR
T1 - A systematic approach to compressing a full-text retrieval system
AU - Bookstein, Abraham
AU - Klein, Shmuel T.
AU - Ziff, D. A.
PY - 1992
Y1 - 1992
N2 - This article reports on a variety of compression algorithms developed in the context of a project to put all the data files for a full-text retrieval system on CD-ROM. In the context of inexpensive pre-processing, a text-compression algorithm is presented that is based on Markov-modeled Huffman coding on an extended alphabet. Data structures are examined for facilitating random access into the compressed text. In addition, new algorithms are presented for compression of word indices, both the dictionaries (word lists) and the text pointers (concordances). The ARTFL database is used as a test case throughout the article.
AB - This article reports on a variety of compression algorithms developed in the context of a project to put all the data files for a full-text retrieval system on CD-ROM. In the context of inexpensive pre-processing, a text-compression algorithm is presented that is based on Markov-modeled Huffman coding on an extended alphabet. Data structures are examined for facilitating random access into the compressed text. In addition, new algorithms are presented for compression of word indices, both the dictionaries (word lists) and the text pointers (concordances). The ARTFL database is used as a test case throughout the article.
UR - http://www.scopus.com/inward/record.url?scp=0041619493&partnerID=8YFLogxK
U2 - 10.1016/0306-4573(92)90069-c
DO - 10.1016/0306-4573(92)90069-c
M3 - Article
AN - SCOPUS:0041619493
SN - 0306-4573
VL - 28
SP - 795
EP - 806
JO - Information Processing and Management
JF - Information Processing and Management
IS - 6
ER -