Design of Algorithm-Based Fault-Tolerant Multiprocessor Systems for Concurrent Error Detection and Fault Diagnosis

Bapiraju Vinnakota, Niraj K. Jha

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Algorithm-based fault tolerance (ABFT) is a low-overhead system-level concurrent error detection and fault location scheme for multiprocessor systems. In this short note, we present new methods for the design of ABFT systems. Our design procedure is applicable to a wide range of systems in which processors share data elements. A feature of our design approach is that the type of checks to be used in the final system can be controlled by the system designer. We also present some new bounds on the number of checks needed in ABFT system design.

Original language: English (US)
Pages (from-to): 1099-1106
Number of pages: 8
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 5
Issue number: 10
State: Published - Oct 1994

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics

Keywords

  • Algorithm-based fault tolerance
  • concurrent error detection
  • fault detectability
  • fault diagnosability
  • system-level fault tolerance
