Compressed hierarchical clustering

Gilad Baruch, Dana Shapira, Shmuel T. Klein

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Hierarchical Clustering is widely used in Machine Learning and Data Mining. It stores bit-vectors in the nodes of a k-ary tree, usually without trying to compress them. We suggest a double usage of the XORing operations defining the Hamming distance used in the clustering process, extending it also to be used to transform the vector in one node into a more compressible form, as a function of the vector in the parent node. Compression is then achieved by run-length encoding, followed by optional Huffman coding, and we show how the compressed file may be processed directly, without decompression.
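
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`xor_bits`, `run_lengths`, `decode_runs`), the example vectors, and the convention that the run list always starts with a (possibly empty) run of 0s are assumptions made here for illustration, and the optional Huffman coding step is omitted. The sketch shows that XORing a child vector with its parent yields a mostly-zero vector when the two are close in Hamming distance, and that such a vector run-length encodes compactly.

```python
from typing import List


def xor_bits(a: List[int], b: List[int]) -> List[int]:
    """Bitwise XOR of two equal-length bit-vectors (lists of 0/1)."""
    if len(a) != len(b):
        raise ValueError("bit-vectors must have equal length")
    return [x ^ y for x, y in zip(a, b)]


def run_lengths(bits: List[int]) -> List[int]:
    """Lengths of alternating runs, starting with a (possibly empty) run of 0s.

    Assumed convention: even positions hold runs of 0s, odd positions runs of 1s.
    """
    runs = []
    current, count = 0, 0
    for bit in bits:
        if bit == current:
            count += 1
        else:
            runs.append(count)
            current, count = bit, 1
    runs.append(count)
    return runs


def decode_runs(runs: List[int]) -> List[int]:
    """Inverse of run_lengths: expand alternating runs back into bits."""
    bits, bit = [], 0
    for length in runs:
        bits.extend([bit] * length)
        bit ^= 1
    return bits


# Hypothetical parent and child vectors that differ in a single position.
parent = [1, 0, 1, 1, 0, 0, 1, 0]
child = [1, 0, 1, 0, 0, 0, 1, 0]

diff = xor_bits(child, parent)   # mostly zeros when child is close to parent
runs = run_lengths(diff)         # compact description of the difference

# The Hamming distance can be read from the runs without expanding them:
# it is the total length of the runs of 1s (odd positions in this convention).
hamming_from_runs = sum(runs[1::2])

# The child is recovered by decoding the runs and XORing with the parent again.
restored = xor_bits(decode_runs(runs), parent)
```

Reading the Hamming distance off the run lengths is one simple instance of working on the encoded form without expanding it; the paper's own procedure for processing the compressed file directly may differ.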

Original language: English
Title of host publication: Proceedings - DCC 2018
Subtitle of host publication: 2018 Data Compression Conference
Editors: Ali Bilgin, James A. Storer, Joan Serra-Sagrista, Michael W. Marcellin
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 399
Number of pages: 1
ISBN (Electronic): 9781538648834
State: Published - 19 Jul 2018
Event: 2018 Data Compression Conference, DCC 2018 - Snowbird, United States
Duration: 27 Mar 2018 to 30 Mar 2018

Publication series

Name: Data Compression Conference Proceedings
Volume: 2018-March
ISSN (Print): 1068-0314

Conference

Conference: 2018 Data Compression Conference, DCC 2018
Country/Territory: United States
City: Snowbird
Period: 27/03/18 to 30/03/18

Bibliographical note

Publisher Copyright:
© 2018 IEEE.

Keywords

  • Hamming distance
  • Hierarchical Clustering
  • Run length encoding
