d-k-min-wise independent family of hash functions

Guy Feigenblat, Ely Porat, Ariel Shiftan

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

In this paper we introduce a general framework that exponentially improves the space, the degree of independence, and the time needed by min-wise-based algorithms. The authors, in SODA '11 [1], introduced an exponential time improvement for min-wise-based algorithms. Here we develop an alternative approach that achieves both an exponential time improvement and an exponential space improvement. The new approach relaxes the need for approximately min-wise hash functions, thereby getting around the Ω(log(1/ε)) lower bound on the required degree of independence, by defining and constructing a d-k-min-wise independent family of hash functions. Surprisingly, in most cases only 8-wise independence is needed for the additional improvement. Furthermore, we discuss how this construction can be used to improve many min-wise-based algorithms. To our knowledge, such definitions for hash functions have not previously been studied or constructed. Finally, we show how to apply it to similarity and rarity estimation over data streams; other min-wise-based algorithms can be adjusted in the same way.
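The abstract does not spell out the d-k-min-wise construction itself, so the following is only a generic sketch of the kind of min-wise-based stream estimators it targets, under stated assumptions. It uses the classic one-minimum-per-hash-function approach: similarity (Jaccard similarity of the distinct items) is estimated as the fraction of hash functions on which two streams have the same minimum item, and rarity (the fraction of distinct items that occur exactly once) as the fraction of hash functions whose minimum item was seen exactly once. The hash functions are random degree-7 polynomials modulo the prime 2^61 − 1, a textbook 8-wise independent family used here purely as a stand-in. The names `PolyHash`, `MinWiseSketch`, `similarity` and `rarity` are hypothetical, and this is not the paper's method.

```python
import random

# Mersenne prime 2^61 - 1, an assumed field size; items must be ints in [0, P).
P = (1 << 61) - 1


class PolyHash:
    """Textbook k-wise independent family: a random polynomial of degree k-1 mod P.

    With k = 8 this is 8-wise independent. It is a stand-in for illustration,
    not the paper's d-k-min-wise construction.
    """

    def __init__(self, k: int, rng: random.Random) -> None:
        self.coeffs = [rng.randrange(P) for _ in range(k)]

    def __call__(self, x: int) -> int:
        # Evaluate the polynomial at x with Horner's rule, reducing mod P.
        acc = 0
        for c in self.coeffs:
            acc = (acc * x + c) % P
        return acc


class MinWiseSketch:
    """Hypothetical stream sketch keeping one minimum per hash function."""

    def __init__(self, hashes: list) -> None:
        self.hashes = hashes
        self.min_val = [None] * len(hashes)   # smallest hash value seen so far
        self.min_item = [None] * len(hashes)  # item that attains that minimum
        # Occurrences of min_item since it became the minimum; for the final
        # minimum this equals its total number of occurrences in the stream.
        self.count = [0] * len(hashes)

    def update(self, x: int) -> None:
        for i, h in enumerate(self.hashes):
            v = h(x)
            if self.min_val[i] is None or v < self.min_val[i]:
                # New minimum: restart the count for the new item.
                self.min_val[i], self.min_item[i], self.count[i] = v, x, 1
            elif v == self.min_val[i] and x == self.min_item[i]:
                # Another occurrence of the current minimum item.
                self.count[i] += 1


def similarity(a: MinWiseSketch, b: MinWiseSketch) -> float:
    """Estimate the Jaccard similarity of the distinct items in two non-empty streams.

    Both sketches must be built with the same list of hash functions.
    """
    matches = sum(1 for u, v in zip(a.min_item, b.min_item) if u == v)
    return matches / len(a.hashes)


def rarity(s: MinWiseSketch) -> float:
    """Estimate the fraction of distinct items that occur exactly once."""
    return sum(1 for c in s.count if c == 1) / len(s.hashes)


if __name__ == "__main__":
    rng = random.Random(0)
    hashes = [PolyHash(8, rng) for _ in range(100)]
    a, b = MinWiseSketch(hashes), MinWiseSketch(hashes)
    for x in range(0, 600):
        a.update(x)
    for x in range(300, 900):
        b.update(x)
    # The true Jaccard similarity of {0..599} and {300..899} is 300/900 = 1/3.
    print("similarity estimate:", similarity(a, b))
    # Every item of stream a occurs once, so its true rarity is 1.
    print("rarity estimate:", rarity(a))
```

Both sketches must share the same hash functions, since the similarity estimate compares their minima slot by slot. Each estimate is an average over the n hash functions, so its standard error falls roughly as 1/√n; evaluating and storing many hash functions per item is exactly the time and space cost that min-wise improvements aim to reduce.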

Original language: English
Pages (from-to): 171-184
Number of pages: 14
Journal: Journal of Computer and System Sciences
Volume: 84
DOIs
State: Published, 1 Mar 2017

Bibliographical note

Publisher Copyright:
© 2016 Elsevier Inc.

Keywords

  • Data streams
  • Rarity
  • Similarity
  • Windowed data streams
  • min-wise hash functions
