Abstract
Hierarchical clustering is widely used in machine learning and data mining. It stores bit-vectors in the nodes of a k-ary tree, usually without trying to compress them. We propose a data compression application of hierarchical clustering that uses the XOR operation twice: once to define the Hamming distance used in the clustering process, and again to transform the vector in each node into a more compressible form, as a function of the vector in its parent node. Compression is then achieved by run-length encoding, optionally followed by Huffman coding, and we show how the compressed file may be processed directly, without decompression.
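The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, under stated assumptions: that the transform replaces a node's vector with its XOR against the parent's vector, and that run-length encoding records the lengths of alternating runs of bits. All function names (`xor_with_parent`, `hamming_distance`, `run_lengths`, `decode_runs`) are hypothetical and chosen for illustration; the optional Huffman step and processing without decompression are not shown.

```python
from typing import List, Tuple


def xor_with_parent(child: List[int], parent: List[int]) -> List[int]:
    """Bitwise XOR of two equal-length bit-vectors (lists of 0/1)."""
    if len(child) != len(parent):
        raise ValueError("bit-vectors must have the same length")
    return [c ^ p for c, p in zip(child, parent)]


def hamming_distance(a: List[int], b: List[int]) -> int:
    """Hamming distance is the number of 1-bits in the XOR of the vectors."""
    return sum(xor_with_parent(a, b))


def run_lengths(bits: List[int]) -> Tuple[int, List[int]]:
    """Assumed RLE layout: (value of first bit, lengths of alternating runs)."""
    if not bits:
        return 0, []
    runs: List[int] = []
    current, count = bits[0], 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append(count)
            current, count = b, 1
    runs.append(count)
    return bits[0], runs


def decode_runs(first_bit: int, runs: List[int]) -> List[int]:
    """Inverse of run_lengths: expand alternating runs back into bits."""
    out: List[int] = []
    bit = first_bit
    for r in runs:
        out.extend([bit] * r)
        bit ^= 1
    return out


# Illustrative data: a child that differs from its parent in one position.
parent = [1, 0, 1, 1, 0, 0, 1, 0]
child = [1, 0, 1, 1, 0, 1, 1, 0]

delta = xor_with_parent(child, parent)   # mostly zeros when child is close to parent
first_bit, runs = run_lengths(delta)     # long zero runs give few, large run lengths
restored = xor_with_parent(decode_runs(first_bit, runs), parent)
```

In this example the child and parent are at Hamming distance 1, so `delta` is `[0, 0, 0, 0, 0, 1, 0, 0]` and is encoded as the runs `[5, 1, 2]` starting with bit 0. XORing the decoded runs with the parent recovers the child exactly, which is why vectors clustered close to their parent are plausible candidates for run-length encoding.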
Original language | English |
---|---|
Title of host publication | Similarity Search and Applications - 11th International Conference, SISAP 2018, Proceedings |
Editors | Stéphane Marchand-Maillet, Yasin N. Silva, Edgar Chávez |
Publisher | Springer Verlag |
Pages | 151-162 |
Number of pages | 12 |
ISBN (Print) | 9783030022235 |
State | Published - 2018 |
Event | 11th International Conference on Similarity Search and Applications, SISAP 2018 - Lima, Peru. Duration: 7 Oct 2018 → 9 Oct 2018 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 11223 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 11th International Conference on Similarity Search and Applications, SISAP 2018 |
---|---|
Country/Territory | Peru |
City | Lima |
Period | 7/10/18 → 9/10/18 |
Bibliographical note
Publisher Copyright: © 2018, Springer Nature Switzerland AG.