RUSH: A RobUst ScHeduler to Manage Uncertain Completion-Times in Shared Clouds

Zhe Huang, Bharath Balasubramanian, Michael Wang, Tian Lan, Mung Chiang, Danny H.K. Tsang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

We address the problem of scheduling jobs whose utilities depend solely on their completion times in a shared cloud that imposes considerable uncertainty on the jobs' runtimes. Estimating runtimes is very hard in such a cloud, where jobs are often delayed for reasons such as slow I/O performance and variations in memory availability. Unlike prior work, we acknowledge that runtime estimates are often erroneous and instead shift the burden of robustness to the job scheduler. Specifically, we formulate a scheduling problem that jointly accounts for (i) job utilities specified as functions of completion time and (ii) uncertainty in the jobs' runtimes. Our proposed solution achieves lexicographic max-min fairness among the job utilities. We implement it as a robust scheduler, named RUSH, for YARN in Hadoop. Our experiments on real-world data sets illustrate RUSH's efficacy compared with other commonly used schedulers.
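As a rough illustration of the lexicographic max-min criterion mentioned in the abstract, the sketch below compares schedules by sorting each schedule's job utilities in ascending order and preferring the lexicographically larger vector, so the worst-off job is improved first, then the second worst, and so on. This is not the RUSH algorithm: the single-machine setting, the brute-force search over job orders, the use of pessimistic runtime estimates as a stand-in for robustness, and all names (`completion_times`, `leximin_best_order`, the sample jobs and utility functions) are assumptions made only for this example.

```python
from itertools import permutations


def completion_times(order, runtimes):
    """Completion time of each job when jobs run back-to-back in `order`."""
    elapsed = 0.0
    done = {}
    for job in order:
        elapsed += runtimes[job]
        done[job] = elapsed
    return done


def leximin_best_order(runtimes, utility_fns):
    """Brute-force search for the order whose ascending-sorted utility
    vector is lexicographically largest (lexicographic max-min)."""
    best_key, best_order = None, None
    for order in permutations(runtimes):
        done = completion_times(order, runtimes)
        # Sort ascending so the comparison starts with the worst-off job.
        key = sorted(utility_fns[job](done[job]) for job in order)
        if best_key is None or key > best_key:
            best_key, best_order = key, order
    return best_order, best_key


# Hypothetical inputs: pessimistic (worst-case) runtime estimates and
# utilities that decrease with completion time.
runtimes = {"a": 2.0, "b": 4.0}
utility_fns = {
    "a": lambda t: 10 - t,
    "b": lambda t: 10 - 2 * t,
}

order, utilities = leximin_best_order(runtimes, utility_fns)
print(order, utilities)
```

Here running `a` first would leave `b` with a utility of -2, while running `b` first yields utilities of 2 and 4, so the second order is preferred even though job `a` finishes later.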

Original language: English (US)
Title of host publication: Proceedings - 2016 IEEE 36th International Conference on Distributed Computing Systems, ICDCS 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 242-251
Number of pages: 10
ISBN (Electronic): 9781509014828
DOIs: https://doi.org/10.1109/ICDCS.2016.95
State: Published - Aug 8 2016
Event: 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016 - Nara, Japan
Duration: Jun 27 2016 → Jun 30 2016

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems
Volume: 2016-August

Other

Other: 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016
Country: Japan
City: Nara
Period: 6/27/16 → 6/30/16

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Software

Keywords

  • Hadoop
  • Robust Scheduling
  • Runtime Estimation

  • Cite this

    Huang, Z., Balasubramanian, B., Wang, M., Lan, T., Chiang, M., & Tsang, D. H. K. (2016). RUSH: A RobUst ScHeduler to Manage Uncertain Completion-Times in Shared Clouds. In Proceedings - 2016 IEEE 36th International Conference on Distributed Computing Systems, ICDCS 2016 (pp. 242-251). [7536523] (Proceedings - International Conference on Distributed Computing Systems; Vol. 2016-August). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDCS.2016.95