TY - GEN
T1 - Automatic instruction-level software-only recovery
AU - Chang, Jonathan
AU - Reis, George A.
AU - August, David I.
PY - 2006
Y1 - 2006
N2 - As chip densities and clock rates increase, processors are becoming more susceptible to transient faults that can affect program correctness. Computer architects have typically addressed reliability issues by adding redundant hardware, but these techniques are often too expensive to be used widely. Software-only reliability techniques have shown promise in their ability to protect against soft errors without any hardware overhead. However, existing low-level software-only fault tolerance techniques have only addressed the problem of detecting faults, leaving recovery largely unaddressed. In this paper, we present the concept, implementation, and evaluation of automatic, instruction-level, software-only recovery techniques, as well as various specific techniques representing different trade-offs between reliability and performance. Our evaluation shows that these techniques fulfill the promises of instruction-level, software-only fault tolerance by offering a wide range of flexible recovery options.
AB - As chip densities and clock rates increase, processors are becoming more susceptible to transient faults that can affect program correctness. Computer architects have typically addressed reliability issues by adding redundant hardware, but these techniques are often too expensive to be used widely. Software-only reliability techniques have shown promise in their ability to protect against soft errors without any hardware overhead. However, existing low-level software-only fault tolerance techniques have only addressed the problem of detecting faults, leaving recovery largely unaddressed. In this paper, we present the concept, implementation, and evaluation of automatic, instruction-level, software-only recovery techniques, as well as various specific techniques representing different trade-offs between reliability and performance. Our evaluation shows that these techniques fulfill the promises of instruction-level, software-only fault tolerance by offering a wide range of flexible recovery options.
UR - http://www.scopus.com/inward/record.url?scp=33750415121&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33750415121&partnerID=8YFLogxK
U2 - 10.1109/DSN.2006.15
DO - 10.1109/DSN.2006.15
M3 - Conference contribution
AN - SCOPUS:33750415121
SN - 0769526071
SN - 9780769526072
T3 - Proceedings of the International Conference on Dependable Systems and Networks
SP - 83
EP - 92
BT - Proceedings - DSN 2006
T2 - DSN 2006: 2006 International Conference on Dependable Systems and Networks
Y2 - 25 June 2006 through 28 June 2006
ER -