TY - GEN
T1 - SETSUDO
T2 - 1st ACM SIGOPS Conference on Timely Results in Operating Systems, TRIOS 2013
AU - Joshi, Pallavi
AU - Ganai, Malay
AU - Balakrishnan, Gogul
AU - Gupta, Aarti
AU - Papakonstantinou, Nadia
N1 - Publisher Copyright: Copyright © 2013 ACM.
PY - 2013/11/3
Y1 - 2013/11/3
AB - Modern scalable distributed systems are designed to be partition-tolerant. They are often required to support increasing load in service requests elastically, and to provide seamless services even when some servers malfunction. Partition-tolerance enables such systems to withstand arbitrary loss of messages as "perceived" by the communicating nodes. However, partition-tolerance and robustness are not tested rigorously in practice. Severe system-level design defects often stay hidden even after deployment, possibly resulting in loss of revenue or customer satisfaction. We propose a novel, rigorous perturbation-based testing framework, named SETSUDO, targeted especially at exposing system-level defects in scalable distributed systems. It applies perturbations (i.e., controlled changes) from the environment of a system during testing, and leverages awareness of system-internal states to precisely control their timing. It uses a flexible instrumentation framework to select relevant internal states and to implement the system code for perturbations. It also provides a test policy language framework, in which high-level sequences of perturbation scenarios are converted automatically to system-level test code. This test code is woven in automatically with application code during testing, and any observed defects are reported. We have implemented our perturbation testing framework and evaluated it on several open-source projects, where it exposed known defects as well as some previously unknown ones. Our framework leverages small-scale testing, and avoids the upfront infrastructure costs typically needed for large-scale stress testing.
UR - http://www.scopus.com/inward/record.url?scp=84958259499&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84958259499&partnerID=8YFLogxK
U2 - 10.1145/2524211.2524217
DO - 10.1145/2524211.2524217
M3 - Conference contribution
AN - SCOPUS:84958259499
T3 - Proceedings of the 1st ACM SIGOPS Conference on Timely Results in Operating Systems, TRIOS 2013
BT - Proceedings of the 1st ACM SIGOPS Conference on Timely Results in Operating Systems, TRIOS 2013
PB - Association for Computing Machinery, Inc
Y2 - 3 November 2013 through 6 November 2013
ER -