Software-Controlled Fault Tolerance

George A. Reis, Jonathan Chang, Neil Vachharajani, Ram Rangan, David I. August, Shubhendu S. Mukherjee

Research output: Contribution to journal, Article

86 Scopus citations

Abstract

Traditional fault-tolerance techniques typically utilize resources ineffectively because they cannot adapt to the changing reliability and performance demands of a system. This paper proposes software-controlled fault tolerance, a concept allowing designers and users to tailor their performance and reliability for each situation. Several software-controllable fault-detection techniques are then presented: SWIFT, a software-only technique, and CRAFT, a suite of hybrid hardware/software techniques. Finally, the paper introduces PROFiT, a technique that adjusts the level of protection and performance at fine granularities through software control. When coupled with software-controllable techniques like SWIFT and CRAFT, PROFiT offers attractive and novel reliability options.

Original language: English (US)
Pages (from-to): 366-396
Number of pages: 31
Journal: ACM Transactions on Architecture and Code Optimization
Volume: 2
Issue number: 4
DOIs: https://doi.org/10.1145/1113841.1113843
State: Published - Jan 1 2005

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Hardware and Architecture

Keywords

  • Fault detection
  • Reliability
  • Software-controlled fault tolerance

Cite this

Reis, G. A., Chang, J., Vachharajani, N., Rangan, R., August, D. I., & Mukherjee, S. S. (2005). Software-Controlled Fault Tolerance. ACM Transactions on Architecture and Code Optimization, 2(4), 366-396. https://doi.org/10.1145/1113841.1113843