TY - GEN
T1 - ReproLite
T2 - 5th ACM Symposium on Cloud Computing, SOCC 2014
AU - Li, Kaituo
AU - Joshi, Pallavi
AU - Gupta, Aarti
AU - Ganai, Malay K.
N1 - Publisher Copyright:
Copyright © 2014 by the Association for Computing Machinery, Inc. (ACM).
PY - 2014/11/3
Y1 - 2014/11/3
N2 - Cloud systems have become ubiquitous today: they are used to store and process the tremendous amounts of data being generated by Internet users. These systems run on hundreds of commodity machines, and have a huge amount of non-determinism (thousands of threads and hundreds of processes) in their execution. Therefore, bugs that occur in cloud systems are hard to understand, reproduce, and fix. The state of the art in debugging in industry is to log messages during execution, and refer to those messages later in case of errors. In ReproLite, we augment the already widespread process of debugging using logs by enabling testers to quickly and easily specify the conjectures that they form regarding the cause of an error (or bug) from execution logs, and to also automatically validate those conjectures. ReproLite includes a Domain Specific Language (DSL) that allows testers to specify all aspects of a potential scenario (e.g., specific workloads, execution operations and their orders, environment non-determinism) that causes a given bug. Given such a scenario, ReproLite can enforce the conditions in the scenario during system execution. Potential buggy scenarios can also be automatically generated from a sequence of log messages that a tester believes indicates the cause of the bug. We have experimented with ReproLite on 11 bugs from two popular cloud systems, Cassandra and HBase. We were able to reproduce all of the bugs using ReproLite. We report on our experience with using ReproLite on those bugs.
AB - Cloud systems have become ubiquitous today: they are used to store and process the tremendous amounts of data being generated by Internet users. These systems run on hundreds of commodity machines, and have a huge amount of non-determinism (thousands of threads and hundreds of processes) in their execution. Therefore, bugs that occur in cloud systems are hard to understand, reproduce, and fix. The state of the art in debugging in industry is to log messages during execution, and refer to those messages later in case of errors. In ReproLite, we augment the already widespread process of debugging using logs by enabling testers to quickly and easily specify the conjectures that they form regarding the cause of an error (or bug) from execution logs, and to also automatically validate those conjectures. ReproLite includes a Domain Specific Language (DSL) that allows testers to specify all aspects of a potential scenario (e.g., specific workloads, execution operations and their orders, environment non-determinism) that causes a given bug. Given such a scenario, ReproLite can enforce the conditions in the scenario during system execution. Potential buggy scenarios can also be automatically generated from a sequence of log messages that a tester believes indicates the cause of the bug. We have experimented with ReproLite on 11 bugs from two popular cloud systems, Cassandra and HBase. We were able to reproduce all of the bugs using ReproLite. We report on our experience with using ReproLite on those bugs.
KW - Cloud computing
KW - Debugging
KW - Hard system bug
KW - Lightweight
UR - http://www.scopus.com/inward/record.url?scp=85118316220&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85118316220&partnerID=8YFLogxK
U2 - 10.1145/2670979.2671004
DO - 10.1145/2670979.2671004
M3 - Conference contribution
AN - SCOPUS:85118316220
T3 - Proceedings of the 5th ACM Symposium on Cloud Computing, SOCC 2014
BT - Proceedings of the 5th ACM Symposium on Cloud Computing, SOCC 2014
PB - Association for Computing Machinery
Y2 - 3 November 2014 through 5 November 2014
ER -