FlyMC: Highly scalable testing of complex interleavings in distributed systems

Jeffrey F. Lukman, Huan Ke, Cesar A. Stuardo, Riza O. Suminto, Daniar H. Kurniawan, Dikaimin Simon, Satria Priambada, Chen Tian, Feng Ye, Tanakorn Leesatapornwongsa, Aarti Gupta, Shan Lu, Haryadi S. Gunawi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

We present a fast and scalable testing approach for datacenter/cloud systems such as Cassandra, Hadoop, Spark, and ZooKeeper. The uniqueness of our approach is in its ability to overcome the path/state-space explosion problem in testing workloads with complex interleavings of messages and faults. We introduce three powerful algorithms: state symmetry, event independence, and parallel flips, which collectively make our approach on average 16× (up to 78×) faster than other state-of-the-art solutions. We have integrated our techniques with 8 popular datacenter systems, successfully reproduced 12 old bugs, and found 10 new bugs, all without random walks or manual checkpoints.

Original language: English (US)
Title of host publication: Proceedings of the 14th EuroSys Conference 2019
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450362818
DOIs: https://doi.org/10.1145/3302424.3303986
State: Published - Mar 25 2019
Event: 14th European Conference on Computer Systems, EuroSys 2019 - Dresden, Germany
Duration: Mar 25 2019 → Mar 28 2019

Publication series

Name: Proceedings of the 14th EuroSys Conference 2019

Conference

Conference: 14th European Conference on Computer Systems, EuroSys 2019
Country: Germany
City: Dresden
Period: 3/25/19 → 3/28/19

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • Electrical and Electronic Engineering

Keywords

  • Availability
  • Distributed Concurrency Bugs
  • Distributed Systems
  • Reliability
  • Software Model Checking

Cite this

Lukman, J. F., Ke, H., Stuardo, C. A., Suminto, R. O., Kurniawan, D. H., Simon, D., Priambada, S., Tian, C., Ye, F., Leesatapornwongsa, T., Gupta, A., Lu, S., & Gunawi, H. S. (2019). FlyMC: Highly scalable testing of complex interleavings in distributed systems. In Proceedings of the 14th EuroSys Conference 2019 [3303986] (Proceedings of the 14th EuroSys Conference 2019). Association for Computing Machinery, Inc. https://doi.org/10.1145/3302424.3303986