Self-Organizing Data Containers

Samuel Madden, Jialin Ding, Tim Kraska, Sivaprasad Sudhir, David Cohen, Timothy Mattson, Nesime Tatbul

Research output: Contribution to conference › Paper › peer-review

4 Scopus citations

Abstract

We propose a new self-organizing, self-optimizing, metadata-rich storage format for the cloud, called a self-organizing data container (SDC), that enables order-of-magnitude performance improvements in data-intensive applications through instance-optimization, i.e., the adaptation of data representation to exploit both the distribution of the data and the workload operating on it. Unlike existing low-level cloud storage formats like Apache Arrow and Parquet, SDCs capture both data and metadata, like access histories and distributional statistics, and are designed to be flexible enough to encompass a variety of modern high-performance representations for data analytics, including partitioning, replication, indexing, and materialization. We present a preliminary design for SDCs and some motivating experiments, and discuss the new challenges they pose.

Original language: English (US)
State: Published - 2022
Externally published: Yes
Event: 12th Annual Conference on Innovative Data Systems Research, CIDR 2022 - Santa Cruz, United States
Duration: Jan 9 2022 → Jan 12 2022

Conference

Conference: 12th Annual Conference on Innovative Data Systems Research, CIDR 2022
Country/Territory: United States
City: Santa Cruz
Period: 1/9/22 → 1/12/22

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Hardware and Architecture
  • Information Systems
  • Information Systems and Management
