Efficient sampling of non-strict turnstile data streams

Neta Barkay, Ely Porat, Bar Shalem

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

We study the problem of generating a large sample from a data stream of elements $(i,v)$, where the sample consists of pairs $(i,C_i)$ for $C_i = \sum_{(i,v) \in \text{stream}} v$. We consider strict turnstile streams and general non-strict turnstile streams, in which $C_i$ may be negative. Our sample is useful for approximating both forward and inverse distribution statistics, within an additive error $\varepsilon$ and provable success probability $1 - \delta$. Our sampling method improves by an order of magnitude the known processing time of each stream element, a crucial factor in data stream applications, thereby providing a feasible solution to the problem. For example, for a sample of size $O(\varepsilon^{-2} \log(1/\delta))$ in non-strict streams, our solution requires $O((\log\log(1/\varepsilon))^2 + (\log\log(1/\delta))^2)$ operations per stream element, whereas the best previous solution requires $O(\varepsilon^{-2} \log^2(1/\delta))$ evaluations of a fully independent hash function per element. We achieve this improvement by constructing an efficient $K$-elements recovery structure from which $K$ elements can be extracted with probability $1 - \delta$. Our structure enables our sampling algorithm to run on distributed systems and extract statistics on the difference between streams.
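To make the definitions in the abstract concrete, the following is a minimal Python sketch that computes the net counts C_i exactly, together with the inverse distribution over them. It is not the paper's sampling algorithm or its K-elements recovery structure: it stores every element, which a streaming method is designed to avoid. The function names (`net_counts`, `inverse_distribution`, `stream_difference`) and the toy data are illustrative assumptions, and the inverse distribution is taken here to mean the fraction of elements with nonzero net count whose count equals k.

```python
from collections import Counter, defaultdict


def net_counts(stream):
    """Return {i: C_i}, where C_i is the sum of v over all (i, v) in the stream.

    Elements whose net count is zero are dropped. In a non-strict
    turnstile stream the remaining counts may be negative.
    """
    totals = defaultdict(int)
    for i, v in stream:
        totals[i] += v
    return {i: c for i, c in totals.items() if c != 0}


def inverse_distribution(counts):
    """Map each count value k to the fraction of elements with C_i == k."""
    histogram = Counter(counts.values())
    n = len(counts)
    return {k: m / n for k, m in histogram.items()}


def stream_difference(a, b):
    """Net counts of stream a minus stream b (b's updates are negated)."""
    return net_counts(list(a) + [(i, -v) for i, v in b])


# Hypothetical non-strict turnstile stream: "x" and "w" end up negative.
stream = [("x", 3), ("y", 1), ("x", -5), ("z", 2), ("y", -1), ("w", -2)]
counts = net_counts(stream)
inverse = inverse_distribution(counts)
```

Taking the difference of two strict streams yields negative counts in general, which is one way a non-strict stream arises: `stream_difference([("x", 1), ("y", 2)], [("x", 1), ("y", 5), ("z", 1)])` leaves only `y` and `z`, both with negative net counts.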

Original language: English
Title of host publication: Fundamentals of Computation Theory - 19th International Symposium, FCT 2013, Proceedings
Pages: 48-59
Number of pages: 12
State: Published - 2013
Event: 19th International Symposium on Fundamentals of Computation Theory, FCT 2013 - Liverpool, United Kingdom
Duration: 19 Aug 2013 → 21 Aug 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8070 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 19th International Symposium on Fundamentals of Computation Theory, FCT 2013
Country/Territory: United Kingdom
City: Liverpool
Period: 19/08/13 → 21/08/13
