On the randomness of compressed data

Shmuel T. Klein, Dana Shapira

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

It seems reasonable to expect from a good compression method that its output should not be further compressible, because it should behave essentially like random data. We investigate this premise for a variety of known lossless compression techniques, and find that, surprisingly, there is much variability in the randomness, depending on the chosen method. Arithmetic coding seems to produce perfectly random output, whereas that of Huffman or Ziv-Lempel coding still contains many dependencies. In particular, the output of Huffman coding has already been proven to be random under certain conditions, and we present evidence here that arithmetic coding may produce an output that is identical to that of Huffman.
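The premise that good compressed output should look random and resist further compression can be probed with a rough experiment. The sketch below is not the authors' methodology. It assumes Python's standard `zlib` module (DEFLATE, which combines Ziv-Lempel matching with Huffman coding) as the compressor, a hypothetical synthetic word sample as input, and two coarse indicators: the empirical zero-order byte entropy and the size ratio after a second compression pass. An entropy near 8 bits per byte is necessary but not sufficient for randomness, since it ignores dependencies between bytes.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Empirical zero-order entropy in bits per byte (maximum is 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def recompression_ratio(data: bytes) -> float:
    """Size after one zlib pass divided by the size before it."""
    return len(zlib.compress(data, 9)) / len(data)


# Hypothetical sample: words drawn with Zipf-like weights (illustration only).
vocab = ["the", "of", "and", "to", "in", "compression", "random", "data",
         "coding", "output", "huffman", "arithmetic", "entropy", "bit",
         "symbol", "model"]
weights = [1 / rank for rank in range(1, len(vocab) + 1)]
rng = random.Random(0)  # fixed seed so the sample is reproducible
text = " ".join(rng.choices(vocab, weights=weights, k=40_000)).encode("ascii")

once = zlib.compress(text, 9)  # first compression pass

print(f"entropy of text:       {byte_entropy(text):.3f} bits/byte")
print(f"entropy of compressed: {byte_entropy(once):.3f} bits/byte")
print(f"ratio, first pass:     {recompression_ratio(text):.3f}")
print(f"ratio, second pass:    {recompression_ratio(once):.3f}")
```

A second-pass ratio close to 1 indicates that this particular compressor cannot exploit whatever redundancy remains. It does not show that no redundancy remains, which is why stronger randomness tests are needed to compare methods.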

Original language: English
Article number: 196
Journal: Information (Switzerland)
Volume: 11
Issue number: 4
State: Published - 1 Apr 2020

Bibliographical note

Publisher Copyright: © 2020 by the authors.

Keywords

  • Arithmetic coding
  • Data compression
  • Huffman coding
  • Ziv-Lempel coding
