Estimating arbitrary subset sums with few probes

Noga Alon, Nick Duffield, Carsten Lund, Mikkel Thorup

Research output: Contribution to conferencePaperpeer-review

29 Scopus citations

Abstract

Suppose we have a large table T of items i, each with a weight w i, e.g., people and their salary. In a general preprocessing step for estimating arbitrary subset sums, we assign each item a random priority depending on its weight. Suppose we want to estimate the sum of an arbitrary subset I ⊆ T. For any q > 2, considering only the q highest priority items from I, we obtain an unbiased estimator of the sum whose relative standard deviation is O(1/√q). Thus to get an expected approximation factor of 1 ± ε, it suffices to consider O(1/ε2) items from I. Our estimator needs no knowledge of the number of items in the subset I, but we can also estimate that number if we want to estimate averages. The above scheme performs the same role as the on-line aggregation of Hellerstein et al. (SIGMOD'97) but it has the advantage of having expected good performance for any possible sequence of weights. In particular, the performance does not deteriorate in the common case of heavy-tailed weight distributions. This point is illustrated experimentally both with real and synthetic data. We will also show that our approach can be used to improve Cohen's size estimation framework (FOCS'94).

Original languageEnglish (US)
Pages317-325
Number of pages9
DOIs
StatePublished - 2005
Externally publishedYes
EventTwenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2005 - Baltimore, MD, United States
Duration: Jun 13 2005Jun 15 2005

Other

OtherTwenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2005
Country/TerritoryUnited States
CityBaltimore, MD
Period6/13/056/15/05

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Hardware and Architecture

Keywords

  • Databases
  • IP Flows
  • Preprocessing
  • Sampling

Fingerprint

Dive into the research topics of 'Estimating arbitrary subset sums with few probes'. Together they form a unique fingerprint.

Cite this