# stoc05

50 %
50 %
Education

Published on January 8, 2008

Author: Camilla

Source: authorstream.com

Optimal Approximations of the Frequency Moments of Data Streams:  Optimal Approximations of the Frequency Moments of Data Streams Piotr Indyk David Woodruff The Streaming Model:  The Streaming Model … Stream of elements a1, …, an each in {1, …, m} Want to compute statistics on stream Elements arranged in adversarial order Algorithms given one pass over stream Goal: Minimum space algorithm Frequency Moments [AMS96]:  Frequency Moments [AMS96] n = stream size, m = universe size fi = # occurrences of item i Why are frequency moments important? F0 = # of distinct elements F1 = n = stream size F2 = self-join size k-th moment Applications:  Applications Estimating distinct elements with low space Estimate query selectivity to huge DB without sorting Routers gather # distinct destinations F2 estimates size of self-joins: , Fk measures data skewness fB2 + fA2 = 4 + 1 = 5 The Best Deterministic Algorithm:  The Best Deterministic Algorithm Trivial algorithm for Fk Store/update fi for each item i, sum fik at end Space = O(mlog n): m items i, log n bits to count fi Negative Results [AMS96]: Compute Fk exactly  (m) space Any deterministic alg. outputs X with |Fk – X| < Fk must use (m) space What about randomized algorithms? Randomized Approx Algs for Fk:  Randomized Approx Algs for Fk Randomized alg. -approximates Fk if outputs X s.t. Pr[|Fk – X| <  Fk ] > 2/3 Previous work (table suppresses polylog mn) Matching Upper Bound:  Matching Upper Bound Our Contribution: For every k there is a 1-pass O~(m1-2/k) space algorithm to -approximate Fk Additional Features: Works even if we allow deletions, that is, stream of elements (i, +), (i,-) 2. Constant update time Techniques:  Techniques Our “algorithm’’ 1. Divide frequencies into “buckets” 0, [1, 2), [2, 4), [4, 8), …, [2i-1, 2i), … 2. Estimate size si of each bucket 3. Output X = i si 2ik Previous Algorithms [AMS96, CK04, G04] 1. Cleverly construct small-space estimator X s.t. E[X] = Fk Var[X] small 2. Apply Chebyshev’s inequality What’s Left?:  What’s Left? Remaining Problem: Estimate si = # of elements with frequency in each bucket [2i-1, 2i) Is this always easy? No. Suppose always easy – then could approximate the maximum frequency This is HARD – (m) space [AMS96] However, (m) only applies to “worst-case” streams, otherwise can do better: Countsketch [CCF-C] For the moment, let’s assume::  For the moment, let’s assume: 1. 9 a 1-pass oracle Max returning the maximum frequency using O(B) space (we remove this using CountSketch) 2. We have a very long RAM of random bits (we remove this using Nisan’s generator) items frequency Max Slide11:  Restrict input stream to a random subset of items in {1, …, m}, where items are included independently with probability p. General Idea: Max + Sampling 7 1 1 3 7 3 4 … Random subset = {1, 3} … Slide12:  General Idea: Max + Sampling What are chances the maximum lies in Si = elements r such that fr 2 [2i-1, 2i)? Restrict input to a random subset of items in {1, …, m}, where items are included independently with probability p. q = (1-p) j > i sj ¢ (1 – (1-p)si) Idea: 1. Estimate q as q’ by taking independent trials and computing fraction of max in Si 2. If already estimated sj for j > i, solve this expression for si. When is this estimate any good?:  When is this estimate any good? Recall q = (1-p){j > i} sj (1 – (1-p)si), so estimate si: Need 1. (holds inductively) 2. Requires 9 p so that q > 1/R, where R = # trials used to estimate q (tight concentration of q’) When is this estimate any good?:  When is this estimate any good? Motivates the following: Say a class Si contributes if and only if si > j > i sj /R If R = (log n), then Fk ¼ contributing i si 2ik q = (1-p)j > i sj (1 – (1-p)si) p too large? ! q too small p too small? ! q too small The Idealized Algorithm :  The Idealized Algorithm Use the random string to generate hash functions hjr : [m] -> [2j] for j 2 [log m] and r 2 [R] Restrict stream Str to Strjr, those items i with hjr(i) = 1 For each Strjr, compute Max(Strjr) To estimate si given s’t for t > i, find some j for which “enough” of the Max(Strjr) come from Si, and then set Output F’k = i s’i 2ik Removing the assumptions:  Removing the assumptions [CCF-C02]: 9 a 1-pass O(B)-space algorithm CountSketch which, given stream Str, outputs all x for which fx2 ¸ F2/B 1. Assumption: 9 a 1-pass oracle Max returning the maximum frequency using O(B) space Lemma: If Si = [2i-1, 2i) contributes, then Proof: Holder’s inequality. Recall: Si contributes if and only if si > j > i sj /R Removing the assumptions:  Removing the assumptions 2. We have an infinite string of random bits Consider a space-S algorithm A and a function f, with random strings R1, …, Rn that, when processing a stream, maintains a variable C, and updates as follows: C = C + f(i, Ri) [Indyk00] Then R1, …, Rn can be generated using Nisan’s PRG, and: The new algorithm A’ has space O~(S) The outputs of A’ and A are indistinguishable Our algorithm follows this framework Conclusions:  Conclusions Result: Tight O~(m1-2/k) upper bound Handle deletions (j, -) O~(1) update time Open Problem: Reduce O~ factors

## Add a comment

 User name: Comment:

## Related presentations

#### Unit 26 Facilitating Change in HSC

November 22, 2017

#### Kino_Skovoroda

November 21, 2017

#### Filosof_Skovoroda

November 22, 2017

#### Caroline Clausen PPT Proposal_5

November 22, 2017

#### «Внимание дети»

November 22, 2017

#### Get Full Furnished And Safe Paying Guest For Girl

November 18, 2017

## Related pages

### Mietwagen Stockholm Östermalm günstig - Sixt Autovermietung

Sixt-Station Stockholm Östermalm STOC05. Linnégatan 88: 115 23 Stockholm : Telefon +46-8-41077221 : Fax +46-8-6112215 :

### ACM Symposium on Theory of Computing

Welcome to the STOC 2005 Home Page Baltimore, MD - May 21-24, 2005 . The 37th ACM Symposium on Theory of Computing (STOC 2005), sponsored by the ACM ...

### Mietwagen Stockholm Östermalm | Sixt Autovermietung

Sixt-Station Stockholm Östermalm STOC05. Linnégatan 88: 115 23 Stockholm : Telefon +46-8-41077221 : Fax +46-8-6112215 :

### Mietwagen Stockholm Östermalm günstig - Sixt Autovermietung

Mietwagen Stockholm Östermalm - Top-Service 24/7 Mehrfach Testsieger Top-Preise Günstige ANGEBOTE Sixt Autovermietung

### Polynomial Time Quantum Algorithm for the Computation of ...

Polynomial Time Quantum Algorithm for the Computation of the Unit Group of a Number Field (Extended Abstract) Arthur Schmidt aschmidt@cdc.informatik.tu-

### Car Rental in Stockholm Östermalm - Sixt rent a car

Welcome to Stockholm Östermalm Sixt rent a car. ... SIXT # Stockholm Östermalm STOC05. Linnégatan 88: 115 23 Stockholm : Phone +46-8-41077221 : Fax +46 ...

### Car Hire Stockholm Östermalm: Sixt rent a car

Cheap Car Hire Stockholm Östermalm from £16/Day Prepay & Save Up to 20% Economy Cars Luxury Vehicles Vans