Optimal partitioning of data chunks in deduplication systems

Michael Hirsch, Ariel Ish-Shalom, Shmuel T. Klein

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Scopus citations: 1

Abstract

Deduplication is a special case of data compression in which repeated chunks of data are stored only once. For very large chunks, the process can be applied even when chunks are merely similar rather than identical; the encoding of a duplicate chunk then consists of a sequence of pointers to the matching parts. However, not every pointer is worth keeping, since each one incurs some storage overhead. A linear-time but sub-optimal solution to this partitioning problem is presented, followed by an optimal solution with cubic time complexity and quadratic space requirements.
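To make the trade-off concrete, the sketch below uses a deliberately simplified, hypothetical cost model that is not taken from the paper: a chunk is a list of segments, each either a match or a literal; a kept match costs a fixed pointer size, while everything stored literally pays its length plus one header per maximal literal run. The names `POINTER`, `HEADER`, `greedy`, `optimal` and `brute_force` are illustrative assumptions. Because dropping a pointer can merge neighbouring literal runs and save a header, a local rule ("keep a match iff it is longer than a pointer") can be sub-optimal. In this toy model a linear dynamic program suffices; it does not reproduce the paper's cubic-time algorithm, whose model is richer.

```python
from itertools import product
import math

# Hypothetical cost model (an assumption, NOT the paper's): segments are
# ("match", length) or ("literal", length). A kept match becomes a pointer
# costing POINTER bytes; all other bytes are stored literally, and each
# maximal run of literal bytes pays a HEADER overhead.
POINTER = 8
HEADER = 4


def encoding_cost(segments, keep):
    """Cost of encoding `segments`; keep[i] says whether the i-th match is kept."""
    cost, in_run, k = 0, False, 0
    for kind, length in segments:
        kept = False
        if kind == "match":
            kept = keep[k]
            k += 1
        if kept:
            cost += POINTER          # pointer token closes any literal run
            in_run = False
        else:
            cost += length + (0 if in_run else HEADER)
            in_run = True
    return cost


def greedy(segments):
    """Local rule: keep a match iff it is longer than a pointer (ignores headers)."""
    keep = [length > POINTER for kind, length in segments if kind == "match"]
    return encoding_cost(segments, keep)


def optimal(segments):
    """Linear-time DP over the state 'currently inside a literal run?'."""
    best = {False: 0, True: math.inf}
    for kind, length in segments:
        nxt = {False: math.inf, True: math.inf}
        for in_run, c in best.items():
            # Store this segment literally, opening a run if none is open.
            nxt[True] = min(nxt[True], c + length + (0 if in_run else HEADER))
            if kind == "match":
                # Keep the match as a pointer; this closes any literal run.
                nxt[False] = min(nxt[False], c + POINTER)
        best = nxt
    return min(best.values())


def brute_force(segments):
    """Exhaustive check over all keep/drop choices for the matches."""
    n = sum(1 for kind, _ in segments if kind == "match")
    return min(encoding_cost(segments, keep)
               for keep in product([False, True], repeat=n))


example = [("literal", 5), ("match", 10), ("literal", 5)]
print(greedy(example), optimal(example), brute_force(example))  # 26 24 24
```

In the example the match (10 bytes) is longer than a pointer, so the local rule keeps it; dropping it instead merges the two literal runs and saves two headers, which is cheaper overall.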

Original language: English
Title of host publication: Proceedings of the Prague Stringology Conference 2013, PSC 2013
Pages: 128–141
Number of pages: 14
State: Published, 2013
Event: Prague Stringology Conference 2013, PSC 2013, Prague, Czech Republic
Duration: 2 Sep 2013 to 4 Sep 2013

Publication series

Name: Proceedings of the Prague Stringology Conference 2013, PSC 2013

Conference

Conference: Prague Stringology Conference 2013, PSC 2013
Country/Territory: Czech Republic
City: Prague
Period: 2/09/13 to 4/09/13

Bibliographical note

Place of conference: Czech Republic
