Moment based quantile sketches for efficient high cardinality aggregation queries

  • Edward Gan
  • , Jialin Ding
  • , Kai Sheng Tai
  • , Vatsal Sharan
  • , Peter Bailis

Research output: Contribution to journalConference articlepeer-review

40 Scopus citations

Abstract

Interactive analytics increasingly involves querying for quantiles over sub-populations of high cardinality datasets. Data processing engines such as Druid and Spark use mergeable summaries to estimate quantiles, but summary merge times can be a bottleneck during aggregation. We show how a compact and efficiently mergeable quantile sketch can support aggregation workloads. This data structure, which we refer to as the moments sketch, operates with a small memory footprint (200 bytes) and computationally efficient (50ns) merges by tracking only a set of summary statistics, notably the sample moments. We demonstrate how we can efficiently estimate quantiles using the method of moments and the maximum entropy principle, and show how the use of a cascade further improves query time for threshold predicates. Empirical evaluation shows that the moments sketch can achieve less than 1 percent quantile error with 15× less overhead than comparable summaries, improving end query time in the MacroBase engine by up to 7× and the Druid engine by up to 60×.

Original languageEnglish (US)
Pages (from-to)1647-1660
Number of pages14
JournalProceedings of the VLDB Endowment
Volume11
Issue number11
DOIs
StatePublished - 2018
Externally publishedYes
Event44th International Conference on Very Large Data Bases, VLDB 2018 - Rio de Janeiro, Brazil
Duration: Aug 27 2018Aug 31 2018

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • General Computer Science

Fingerprint

Dive into the research topics of 'Moment based quantile sketches for efficient high cardinality aggregation queries'. Together they form a unique fingerprint.

Cite this