The quest for scalable support of data-intensive workloads in distributed systems

Ioan Raicu, Ian T. Foster, Yong Zhao, Philip Little, Christopher M. Moretti, Amitabh Chaudhary, Douglas Thain

Research output: Chapter in Book/Report/Conference proceedingConference contribution

29 Scopus citations

Abstract

Data-intensive applications involving the analysis of large datasets often require large amounts of compute and storage resources, for which data locality can be crucial to high throughput and performance. We propose a "data diffusion" approach that acquires compute and storage resources dynamically, replicates data in response to demand, and schedules computations close to data. As demand increases, more resources are acquired, thus allowing faster response to subsequent requests that refer to the same data; when demand drops, resources are released. This approach can provide the benefits of dedicated hardware without the associated high costs, depending on workload and resource characteristics. To explore the feasibility of data diffusion, we offer both a theoretical and an empirical analysis. We define an abstract model for data diffusion, introduce new scheduling policies with heuristics to optimize real-world performance, and develop a competitive online cache eviction policy. We also offer many empirical experiments to explore the benefits of dynamically expanding and contracting resources based on load, to improve system responsiveness while keeping wasted resources small. We show performance improvements of one to two orders of magnitude across three diverse workloads when compared to the performance of parallel file systems with throughputs approaching 80 Gb/s on a modest cluster of 200 processors. We also compare data diffusion with a best model for active storage, contrasting the difference between a pull-model found in data diffusion and a push-model found in active storage.

Original languageEnglish (US)
Title of host publicationProc. 18th ACM International Symposium on High Performance Distributed Computing, HPDC 09, Co-located with the 2009 International Symposium on High Performance Distributed Computing Conf., HPDC'09
Pages21-29
Number of pages9
DOIs
StatePublished - 2009
Externally publishedYes
Event18th ACM International Symposium on High Performance Distributed Computing, HPDC 09, Co-located with the 2009 International Symposium on High Performance Distributed Computing Conference, HPDC'09 - Garching, Germany
Duration: Jun 11 2009Jun 13 2009

Publication series

NameProc. 18th ACM International Symposium on High Performance Distributed Computing, HPDC 09, Co-located with the 2009 International Symposium on High Performance Distributed Computing Conf., HPDC'09

Conference

Conference18th ACM International Symposium on High Performance Distributed Computing, HPDC 09, Co-located with the 2009 International Symposium on High Performance Distributed Computing Conference, HPDC'09
Country/TerritoryGermany
CityGarching
Period6/11/096/13/09

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software

Keywords

  • Data diffusion
  • Data management
  • Data-aware scheduling
  • Falkon

Fingerprint

Dive into the research topics of 'The quest for scalable support of data-intensive workloads in distributed systems'. Together they form a unique fingerprint.

Cite this