Abstract
We propose a new self-organizing, self-optimizing, metadata-rich storage format for the cloud, called a self-organizing data container (SDC), that enables order-of-magnitude performance improvements in data-intensive applications through instance-optimization, i.e., the adaptation of data representation to exploit both the distribution of the data and the workload operating on it. Unlike existing low-level cloud storage formats such as Apache Arrow and Parquet, SDCs capture both data and metadata, such as access histories and distributional statistics, and are designed to be flexible enough to encompass a variety of modern high-performance representations for data analytics, including partitioning, replication, indexing, and materialization. We present a preliminary design for SDCs and some motivating experiments, and discuss the new challenges they present.
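To make the idea of instance-optimization concrete, the following is a minimal Python sketch of a container that stores data together with two kinds of metadata named in the abstract: per-block distributional statistics (here, simple min/max values) and an access history. It is an illustration only, not the SDC design from the paper; the names `ToyContainer`, `Block`, `range_query`, and `reorganize` are hypothetical, and sorting on the most frequently filtered column is just one simple stand-in for the partitioning strategies the abstract mentions.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Block:
    rows: list   # rows (dicts) stored in this block
    stats: dict  # column -> (min, max), a simple distributional statistic


@dataclass
class ToyContainer:
    """Hypothetical toy container holding data plus metadata (assumption, not the paper's format)."""
    block_size: int
    blocks: list = field(default_factory=list)
    access_history: Counter = field(default_factory=Counter)

    def load(self, rows, sort_key=None):
        """Split rows into fixed-size blocks and record min/max per column."""
        rows = list(rows)
        if sort_key is not None:
            rows = sorted(rows, key=lambda r: r[sort_key])
        self.blocks = []
        for i in range(0, len(rows), self.block_size):
            chunk = rows[i:i + self.block_size]
            stats = {
                col: (min(r[col] for r in chunk), max(r[col] for r in chunk))
                for col in chunk[0]
            }
            self.blocks.append(Block(chunk, stats))

    def range_query(self, column, lo, hi):
        """Return (matching rows, number of blocks scanned) and log the access."""
        self.access_history[column] += 1
        result, scanned = [], 0
        for block in self.blocks:
            bmin, bmax = block.stats[column]
            if bmax < lo or bmin > hi:
                continue  # block skipped using metadata alone
            scanned += 1
            result.extend(r for r in block.rows if lo <= r[column] <= hi)
        return result, scanned

    def reorganize(self):
        """Re-sort the data on the most frequently filtered column, if any."""
        if not self.access_history:
            return None
        hot_column, _ = self.access_history.most_common(1)[0]
        rows = [r for block in self.blocks for r in block.rows]
        self.load(rows, sort_key=hot_column)
        return hot_column
```

In this sketch, repeated range queries on one column cause `reorganize` to cluster the data on that column, so later queries on it can skip more blocks by consulting the stored min/max statistics. A real container would also have to weigh the cost of reorganizing against the expected benefit, which this toy version ignores.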
| Original language | English (US) |
|---|---|
| State | Published - 2022 |
| Externally published | Yes |
| Event | 12th Annual Conference on Innovative Data Systems Research, CIDR 2022, Santa Cruz, United States (Jan 9, 2022 → Jan 12, 2022) |
Conference
| Conference | 12th Annual Conference on Innovative Data Systems Research, CIDR 2022 |
|---|---|
| Country/Territory | United States |
| City | Santa Cruz |
| Period | 1/9/22 → 1/12/22 |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Hardware and Architecture
- Information Systems
- Information Systems and Management