Analysis and randomized design of algorithm-based fault tolerant multiprocessor systems under an extended model

Shalini Yajnik, Niraj K. Jha

Research output: Contribution to journal > Article > peer-review

3 Scopus citations

Abstract

The reliability of compute-intensive applications can be improved by introducing fault tolerance into the system. Algorithm-based fault tolerance (ABFT) is a low-cost scheme that provides the required fault tolerance through system-level encoding. In this paper, we propose randomized construction techniques, under an extended model, for the design of ABFT systems with the required fault tolerance capability. The extended model considers failures in the processors performing the checking operations.
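To make "system-level encoding" concrete, the following is a minimal sketch of the classic checksum-matrix form of ABFT applied to matrix multiplication. It is a generic illustration only, not the randomized construction proposed in the paper, and it assumes the checks themselves are evaluated reliably, which is exactly the assumption the extended model relaxes. All function names are hypothetical, and integer data is used so that checksums can be compared exactly; floating-point data would need a tolerance.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def with_column_checksum(a):
    """Append one extra row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append one extra column holding the sum of each row of b."""
    return [row + [sum(row)] for row in b]


def find_inconsistencies(c_full):
    """Return (bad_rows, bad_cols): data rows and columns whose
    stored checksum disagrees with the recomputed sum."""
    n = len(c_full) - 1        # number of data rows
    m = len(c_full[0]) - 1     # number of data columns
    bad_rows = [i for i in range(n)
                if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    return bad_rows, bad_cols


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# The product of the encoded operands carries its own row and column checksums.
c_full = matmul(with_column_checksum(a), with_row_checksum(b))
clean = find_inconsistencies(c_full)

# Simulate a transient fault in one data element of the result.
c_full[1][0] += 5
faulty = find_inconsistencies(c_full)
```

In this sketch a single corrupted data element violates exactly one row check and one column check, so their intersection locates the error. If the processors evaluating the checks can also fail, as in the extended model, a violated check no longer implies a faulty data element, and this simple intersection argument is not sufficient on its own.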

Original language: English (US)
Pages (from-to): 757-768
Number of pages: 12
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 8
Issue number: 7
State: Published - 1997

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics

Keywords

  • Algorithm-based fault tolerance
  • Concurrent error detection
  • Concurrent fault location
  • Fault diagnosis
  • Randomized algorithms
  • Transient faults
