FlyMC: Highly scalable testing of complex interleavings in distributed systems

  • Jeffrey F. Lukman
  • Huan Ke
  • Cesar A. Stuardo
  • Riza O. Suminto
  • Daniar H. Kurniawan
  • Dikaimin Simon
  • Satria Priambada
  • Chen Tian
  • Feng Ye
  • Tanakorn Leesatapornwongsa
  • Aarti Gupta
  • Shan Lu
  • Haryadi S. Gunawi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a fast and scalable testing approach for datacenter/cloud systems such as Cassandra, Hadoop, Spark, and ZooKeeper. The uniqueness of our approach is in its ability to overcome the path/state-space explosion problem in testing workloads with complex interleavings of messages and faults. We introduce three powerful algorithms: state symmetry, event independence, and parallel flips, which collectively make our approach on average 16× (up to 78×) faster than other state-of-the-art solutions. We have integrated our techniques with 8 popular datacenter systems, successfully reproduced 12 old bugs, and found 10 new bugs, all without random walks or manual checkpoints.

Original language: English (US)
Title of host publication: Proceedings of the 14th EuroSys Conference 2019
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450362818
State: Published - Mar 25 2019
Event: 14th European Conference on Computer Systems, EuroSys 2019 - Dresden, Germany
Duration: Mar 25 2019 to Mar 28 2019

Publication series

Name: Proceedings of the 14th EuroSys Conference 2019

Conference

Conference: 14th European Conference on Computer Systems, EuroSys 2019
Country/Territory: Germany
City: Dresden
Period: 3/25/19 to 3/28/19

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • Electrical and Electronic Engineering

Keywords

  • Availability
  • Distributed Concurrency Bugs
  • Distributed Systems
  • Reliability
  • Software Model Checking
