TY - GEN
T1 - Storing text retrieval systems on CD-ROM: compression and encryption considerations
AU - Klein, S.
AU - Bookstein, Abraham
AU - Deerwester, Scott
N1 - Place of conference: Cambridge
PY - 1989
Y1 - 1989
N2 - The emergence of the CD-ROM as a storage medium for full-text databases raises the question of the maximum size database that can be contained by this medium. As an example, the problem of storing the Trésor de la Langue Française on a CD-ROM is examined in this paper. The text alone of this database is 700 megabytes long, more than a CD-ROM can hold. In addition, the dictionary and concordance needed to access these data must be stored. A further constraint is that some of the material is copyrighted, and it is desirable that such material be difficult to decode except through software provided by the system. Pertinent approaches to compression of the various files are reviewed, and the compression of the text is related to the problem of data encryption: Specifically, it is shown that, under simple models of text generation, Huffman encoding produces a bit-string indistinguishable from a representation of coin flips.
UR - https://scholar.google.co.il/scholar?q=Storing+Text+Retrieval+Systems+on+CD-ROM%3A+Compression+and+Encryption+Considerations%2C&btnG=&hl=en&as_sdt=0%2C5
M3 - Conference contribution
BT - Proc. 12th ACM-SIGIR Conf.
ER -