ReproLite: A lightweight tool to quickly reproduce hard system bugs

Kaituo Li, Pallavi Joshi, Aarti Gupta, Malay K. Ganai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

Cloud systems have become ubiquitous: they are used to store and process the tremendous amounts of data generated by Internet users. These systems run on hundreds of commodity machines and exhibit a huge amount of non-determinism (thousands of threads and hundreds of processes) in their execution. As a result, bugs that occur in cloud systems are hard to understand, reproduce, and fix. The state of the art in industrial debugging is to log messages during execution and refer to those messages later in case of errors. ReproLite augments this already widespread process of debugging with logs by enabling testers to quickly and easily specify the conjectures they form from execution logs about the cause of an error (or bug), and to validate those conjectures automatically. ReproLite includes a Domain Specific Language (DSL) that allows testers to specify all aspects of a potential scenario (e.g., specific workloads, execution operations and their orders, environment non-determinism) that causes a given bug. Given such a scenario, ReproLite can enforce the conditions in the scenario during system execution. Potential buggy scenarios can also be generated automatically from a sequence of log messages that a tester believes indicates the cause of the bug. We evaluated ReproLite on 11 bugs from two popular cloud systems, Cassandra and HBase, and were able to reproduce all of them. We report on our experience with using ReproLite on those bugs.

Original language: English (US)
Title of host publication: Proceedings of the 5th ACM Symposium on Cloud Computing, SOCC 2014
Publisher: Association for Computing Machinery
ISBN (Electronic): 1595930361, 9781450332521
State: Published - Nov 3 2014
Externally published: Yes
Event: 5th ACM Symposium on Cloud Computing, SOCC 2014 - Seattle, United States
Duration: Nov 3 2014 → Nov 5 2014

Publication series

Name: Proceedings of the 5th ACM Symposium on Cloud Computing, SOCC 2014

Other

Other: 5th ACM Symposium on Cloud Computing, SOCC 2014
Country/Territory: United States
City: Seattle
Period: 11/3/14 → 11/5/14

All Science Journal Classification (ASJC) codes

  • Software

Keywords

  • Cloud computing
  • Debugging
  • Hard system bug
  • Lightweight
