Design of Algorithm-Based Fault Tolerant Systems with In-System Checks

Shalini Yajnik, Niraj K. Jha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

To improve the reliability of compute-intensive applications run on multiprocessor architectures, fault tolerance is introduced into the system with on-line detection and location of faults. This can be achieved by a low-cost scheme, called Algorithm-based fault tolerance (ABFT), which encodes data at the system level and modifies the algorithm to operate on the encoded data. The resultant encoded output data is checked for correctness by some checks. In this paper we present an extended model for representing and designing ABFT systems. The model takes into consideration the processors evaluating the checks. We propose a design method which considers the processors computing the checks to be a part of the ABFT system and guarantees concurrent error detection even in the presence of faults in these processors, unlike most methods presented earlier.
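The encode, compute, and check steps described in the abstract can be illustrated with a minimal sketch of checksum-encoded matrix multiplication, a commonly cited ABFT example. This is not the design method proposed in the paper: the function names are hypothetical, and the sketch assumes the checks themselves are evaluated without faults, whereas the paper's model also covers faults in the processors that compute the checks.

```python
# Illustrative ABFT sketch: checksum-encoded matrix multiplication.
# Generic example only; NOT the design method of the paper.
# Assumption: the checks in failed_checks() are evaluated fault-free.

def column_checksum(a):
    """Encode A by appending a row that holds the sum of each column."""
    return a + [[sum(col) for col in zip(*a)]]

def row_checksum(b):
    """Encode B by appending to each row the sum of that row."""
    return [row + [sum(row)] for row in b]

def matmul(a, b):
    """Plain matrix product; it operates unchanged on the encoded inputs."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def failed_checks(c_full):
    """Return (bad_rows, bad_cols): indices whose checksum does not match."""
    n = len(c_full) - 1      # number of data rows
    m = len(c_full[0]) - 1   # number of data columns
    bad_rows = [i for i in range(n)
                if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    return bad_rows, bad_cols

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c_full = matmul(column_checksum(a), row_checksum(b))
clean = failed_checks(c_full)     # no check fails on a fault-free result

c_full[1][0] += 5                 # inject a single erroneous data element
faulty = failed_checks(c_full)    # one row check and one column check fail
```

With integer data the checks compare exactly. In this sketch a single erroneous element makes one row check and one column check fail, and their intersection locates the element.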

Original language: English (US)
Title of host publication: Architecture
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 246-253
Number of pages: 8
ISBN (Electronic): 0849389836
State: Published - 1993
Event: 1993 International Conference on Parallel Processing, ICPP 1993 - Syracuse, United States
Duration: Aug 16 1993 → Aug 20 1993

Publication series

Name: Proceedings of the International Conference on Parallel Processing
Volume: 1
ISSN (Print): 0190-3918

Conference

Conference: 1993 International Conference on Parallel Processing, ICPP 1993
Country/Territory: United States
City: Syracuse
Period: 8/16/93 → 8/20/93

All Science Journal Classification (ASJC) codes

  • Software
  • General Mathematics
  • Hardware and Architecture
