Bitmaps are data structures occurring often in information retrieval. They are useful, but are also large and expensive to store. For this reason, considerable effort has been devoted to finding techniques for compressing them. These techniques are most effective for sparse bitmaps. We propose a preprocessing stage, in which bitmaps are first clustered and the clusters used to transform their member bitmaps into sparser ones, that can be more effectively compressed. The clustering method efficiently generates a graph structure on the bitmaps. In some situations, it is desired to impose restrictions on the graph; finding the optimal graph satisfying these restrictions is shown to be NP-complete. The results of applying our algorithm to the Bible is presented: for some sets of bitmaps, our method almost doubled in the compression savings.
- Data compression
- bitmap compression
- data storage
- maximum spanning tree:application