Automated Multidimensional Data Layouts in Amazon Redshift

  • Jialin Ding
  • Matt Abrams
  • Sanghita Bandyopadhyay
  • Luciano Di Palma
  • Yanzhu Ji
  • Davide Pagano
  • Gopal Paliwal
  • Panos Parchas
  • Pascal Pfeil
  • Orestis Polychroniou
  • Gaurav Saxena
  • Aamer Shah
  • Amina Voloder
  • Sherry Xiao
  • Davis Zhang
  • Tim Kraska

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Analytic data systems typically use data layouts to improve the performance of scanning and filtering data. Common data layout techniques include single-column sort keys, compound sort keys, and more complex multidimensional data layouts such as the Z-order. An appropriately-selected data layout over a table, in combination with metadata such as zone maps, enables the system to skip irrelevant data blocks when scanning the table, which reduces the amount of data scanned and improves query performance. In this paper, we introduce Multidimensional Data Layouts (MDDL), a new data layout technique which outperforms existing data layout techniques for query workloads with repetitive scan filters. Unlike existing data layout approaches, which typically sort tables based on columns, MDDL sorts tables based on a collection of predicates, which enables a much higher degree of specialization to the user's workload. We additionally introduce an algorithm for automatically learning the best MDDL for each table based on telemetry collected from the historical workload. We implemented MDDL within Amazon Redshift. Benchmarks on internal datasets and workloads show that MDDL achieves up to 85% reduction in end-to-end workload runtime compared to using traditional column-based data layout techniques. MDDL is, to the best of our knowledge, the first data layout technique in a commercial product that sorts based on predicates and automatically learns the best predicates.
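
To make the idea of predicate-based sorting concrete, here is a minimal Python sketch. It is an illustration only and not the paper's implementation, which the abstract does not detail: the function names (`sort_by_predicates`, `split_blocks`, `block_metadata`, `scan`), the toy table, the two predicates, and the block size are all assumptions. The per-block metadata stands in for zone maps by recording, for each predicate, whether any row in the block satisfies it.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


def sort_by_predicates(rows: List[Row], predicates: List[Predicate]) -> List[Row]:
    """Order rows by their tuple of predicate outcomes (hypothetical helper)."""
    return sorted(rows, key=lambda r: tuple(p(r) for p in predicates))


def split_blocks(rows: List[Row], block_size: int) -> List[List[Row]]:
    """Cut the row sequence into fixed-size blocks."""
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]


def block_metadata(blocks: List[List[Row]], predicates: List[Predicate]) -> List[List[bool]]:
    """For each block and predicate, record whether any row satisfies it."""
    return [[any(p(r) for r in block) for p in predicates] for block in blocks]


def scan(blocks: List[List[Row]], meta: List[List[bool]],
         predicates: List[Predicate], pred_index: int) -> Tuple[List[Row], int]:
    """Return matching rows and the number of blocks actually read."""
    matches: List[Row] = []
    blocks_read = 0
    for block, flags in zip(blocks, meta):
        if not flags[pred_index]:
            continue  # metadata proves no row here can match: skip the block
        blocks_read += 1
        matches.extend(r for r in block if predicates[pred_index](r))
    return matches, blocks_read


# Toy table and two assumed "repetitive" scan filters.
rows = [{"id": i, "region": i % 4, "amount": (i * 7) % 100} for i in range(64)]
predicates: List[Predicate] = [
    lambda r: r["region"] == 2,
    lambda r: r["amount"] > 90,
]

# Baseline: blocks in insertion order.
base_blocks = split_blocks(rows, 8)
base_meta = block_metadata(base_blocks, predicates)
base_matches, base_read = scan(base_blocks, base_meta, predicates, 0)

# Predicate-sorted layout: rows with equal predicate outcomes are clustered.
mddl_blocks = split_blocks(sort_by_predicates(rows, predicates), 8)
mddl_meta = block_metadata(mddl_blocks, predicates)
mddl_matches, mddl_read = scan(mddl_blocks, mddl_meta, predicates, 0)

print("blocks read, insertion order:", base_read)
print("blocks read, predicate-sorted:", mddl_read)
```

In this toy setup both layouts return the same rows, but the predicate-sorted layout reads fewer blocks because rows that satisfy the filter are clustered together. How predicates are chosen from workload telemetry is the part the paper automates and is not modeled here.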

Original language: English (US)
Title of host publication: SIGMOD-Companion 2024 - Companion of the 2024 International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 55-67
Number of pages: 13
ISBN (Electronic): 9798400704222
State: Published - Jun 9 2024
Externally published: Yes
Event: 2024 International Conference on Management of Data, SIGMOD 2024 - Santiago, Chile
Duration: Jun 9 2024 to Jun 15 2024

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2024 International Conference on Management of Data, SIGMOD 2024
Country/Territory: Chile
City: Santiago
Period: 6/9/24 to 6/15/24

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Keywords

  • analytic database
  • data warehouse
  • machine learning
  • sort key
