Abstract
Hierarchical clustering is widely used in machine learning and data mining. It stores bit-vectors in the nodes of a k-ary tree, usually without trying to compress them. We propose a dual use of the XOR operation that defines the Hamming distance used in the clustering process: the same operation also transforms the vector in a node into a more compressible form, as a function of the vector in its parent node. Compression is then achieved by run-length encoding, optionally followed by Huffman coding, and we show how the compressed file may be processed directly, without decompression.
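The idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the list-of-bits representation, and the run format (alternating runs that start with a possibly empty run of 0s) are all assumptions made for illustration, and the optional Huffman stage is omitted. Because a child vector tends to be close to its parent in Hamming distance, their XOR contains few 1s and long runs of 0s, which run-length encoding captures well. The sketch also shows one simple operation that works on the compressed form directly: the parent-child Hamming distance is the total length of the 1-runs.

```python
from itertools import groupby


def xor_bits(a, b):
    """Bitwise XOR of two equal-length lists of 0/1 values."""
    if len(a) != len(b):
        raise ValueError("bit-vectors must have equal length")
    return [x ^ y for x, y in zip(a, b)]


def rle_encode(bits):
    """Return run lengths of alternating symbols.

    Assumed format: even positions hold runs of 0s, odd positions hold
    runs of 1s, so a vector starting with 1 gets a leading run of length 0.
    """
    runs = []
    expected = 0
    for bit, group in groupby(bits):
        if bit != expected:      # only possible for the very first run
            runs.append(0)
            expected ^= 1
        runs.append(len(list(group)))
        expected ^= 1
    return runs


def rle_decode(runs):
    """Inverse of rle_encode."""
    bits = []
    for i, n in enumerate(runs):
        bits.extend([i % 2] * n)
    return bits


def compress_child(parent, child):
    """Store a child as the run lengths of (parent XOR child)."""
    return rle_encode(xor_bits(parent, child))


def decompress_child(parent, runs):
    """Recover the child vector from its parent and the stored runs."""
    return xor_bits(parent, rle_decode(runs))


def hamming_from_runs(runs):
    """Parent-child Hamming distance, read off without decoding:
    the number of 1s in the XOR vector is the sum of the 1-runs."""
    return sum(runs[1::2])


parent = [1, 0, 1, 1, 0, 0, 1, 0]
child = [1, 0, 1, 1, 0, 1, 1, 0]
runs = compress_child(parent, child)
print(runs)                      # [5, 1, 2]
print(hamming_from_runs(runs))   # 1
```

In this toy example the eight-bit child is represented by three run lengths, and `decompress_child(parent, runs)` returns the original child vector.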
Original language | English |
---|---|
Title of host publication | Proceedings - DCC 2018 |
Subtitle of host publication | 2018 Data Compression Conference |
Editors | Ali Bilgin, James A. Storer, Joan Serra-Sagrista, Michael W. Marcellin |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 399 |
Number of pages | 1 |
ISBN (Electronic) | 9781538648834 |
State | Published - 19 Jul 2018 |
Event | 2018 Data Compression Conference, DCC 2018 - Snowbird, United States; Duration: 27 Mar 2018 → 30 Mar 2018 |
Publication series
Name | Data Compression Conference Proceedings |
---|---|
Volume | 2018-March |
ISSN (Print) | 1068-0314 |
Conference
Conference | 2018 Data Compression Conference, DCC 2018 |
---|---|
Country/Territory | United States |
City | Snowbird |
Period | 27/03/18 → 30/03/18 |
Bibliographical note
Publisher Copyright: © 2018 IEEE.
Keywords
- Hamming distance
- Hierarchical Clustering
- Run length encoding