Software-Controlled Fault Tolerance

George A. Reis, Jonathan Chang, Neil Vachharajani, Ram Rangan, David I. August, Shubhendu S. Mukherjee

Research output: Contribution to journal › Article › peer-review

98 Scopus citations

Abstract

Traditional fault-tolerance techniques typically utilize resources ineffectively because they cannot adapt to the changing reliability and performance demands of a system. This paper proposes software-controlled fault tolerance, a concept allowing designers and users to tailor their performance and reliability for each situation. Several software-controllable fault-detection techniques are then presented: SWIFT, a software-only technique, and CRAFT, a suite of hybrid hardware/software techniques. Finally, the paper introduces PROFiT, a technique that adjusts the level of protection and performance at fine granularities through software control. When coupled with software-controllable techniques like SWIFT and CRAFT, PROFiT offers attractive and novel reliability options.

Original language: English (US)
Pages (from-to): 366-396
Number of pages: 31
Journal: ACM Transactions on Architecture and Code Optimization
Volume: 2
Issue number: 4
State: Published - Jan 1 2005

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Hardware and Architecture

Keywords

  • Fault detection
  • Reliability
  • Software-controlled fault tolerance
