Abstract
In large IR systems, information about word occurrence may be stored as a bit matrix, with rows corresponding to different words and columns to documents. Such a matrix is generally very large and very sparse. New methods for compressing such matrices are presented, which exploit possible correlations between rows and between columns. The methods are based on partitioning the matrix into small blocks and predicting the 1-bit distribution within a block by means of various bit generation models. Each block is then encoded using Huffman or arithmetic coding. Preliminary experimental results indicate improvements over previous methods.
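The block-partitioning idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method. It assumes the simplest possible bit generation model, in which every bit in a block is 1 independently with a fixed probability `p_one`, and it reports the ideal code length an arithmetic coder would approach under that model. The function names `partition_into_blocks` and `block_cost_bits`, and the list-of-lists matrix layout, are hypothetical choices made for this illustration.

```python
import math


def partition_into_blocks(matrix, block_rows, block_cols):
    """Yield (row_start, col_start, block) tuples covering the whole matrix.

    `matrix` is a list of equal-length rows of 0/1 integers (assumed layout).
    Blocks on the bottom or right edge may be smaller than the requested size.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    for r in range(0, n_rows, block_rows):
        for c in range(0, n_cols, block_cols):
            block = [row[c:c + block_cols] for row in matrix[r:r + block_rows]]
            yield r, c, block


def block_cost_bits(block, p_one):
    """Ideal code length in bits of `block` under an independent-bit model.

    Assumes 0 < p_one < 1. This is a stand-in for the paper's bit generation
    models, which are not described in the abstract.
    """
    ones = sum(sum(row) for row in block)
    zeros = sum(len(row) for row in block) - ones
    return -(ones * math.log2(p_one) + zeros * math.log2(1.0 - p_one))


# Toy word-by-document matrix: rows are words, columns are documents.
matrix = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

# Use the global 1-bit density as the (assumed) model parameter.
density = sum(map(sum, matrix)) / (len(matrix) * len(matrix[0]))
total_bits = sum(
    block_cost_bits(block, density)
    for _, _, block in partition_into_blocks(matrix, 2, 2)
)
print(f"density={density:.3f}, ideal total length={total_bits:.2f} bits")
```

A model that predicts a different 1-bit probability for each block, for example from the densities of the rows and columns that cross it, would plug into the same loop by passing a per-block `p_one` instead of the global density.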
Original language | English |
---|---|
Title of host publication | Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1991 |
Publisher | Association for Computing Machinery, Inc |
Pages | 63-71 |
Number of pages | 9 |
ISBN (Print) | 0897914481, 9780897914482 |
State | Published - 1 Sep 1991 |
Event | 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1991 - Chicago, United States. Duration: 13 Oct 1991 → 16 Oct 1991 |
Conference
Conference | 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1991 |
---|---|
Country/Territory | United States |
City | Chicago |
Period | 13/10/91 → 16/10/91 |
Bibliographical note
Publisher Copyright: © 1991 ACM. All rights reserved.