MacroBase: Prioritizing attention in fast data

Firas Abuzaid, Peter Bailis, Jialin Ding, Edward Gan, Samuel Madden, Deepak Narayanan, Kexin Rong, Sahaana Suri

Research output: Contribution to journal, Article, peer-review

10 Scopus citations

Abstract

As data volumes continue to rise, manual inspection is becoming increasingly untenable. In response, we present MacroBase, a data analytics engine that prioritizes end-user attention in high-volume fast data streams. MacroBase enables efficient, accurate, and modular analyses that highlight and aggregate important and unusual behavior, acting as a search engine for fast data. MacroBase is able to deliver order-of-magnitude speedups over alternatives by optimizing the combination of explanation (i.e., feature selection) and classification tasks and by leveraging a new reservoir sampler and heavy-hitters sketch specialized for fast data streams. As a result, MacroBase delivers accurate results at speeds of up to 2M events per second per query on a single core. The system has delivered meaningful results in production, including at a telematics company monitoring hundreds of thousands of vehicles.

Original language: English (US)
Article number: 15
Journal: ACM Transactions on Database Systems
Volume: 43
Issue number: 4
DOIs
State: Published - Dec 2018
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Information Systems

Keywords

  • Analytics
  • Database
  • Streaming
