Diagnosability and Diagnosis of Algorithm-Based Fault-Tolerant Systems

Bapiraju Vinnakota, Niraj K. Jha

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

Parallel processing architectures are now in common use for signal processing and other computation-intensive applications. These applications are characterized by high throughput and long processing periods. Such characteristics decrease the reliability of high-performance architectures. The erroneous data produced by faulty processors could have damaging consequences, particularly in critical real-time applications. It is therefore desirable that any erroneous data produced by the system be detected and located as quickly as possible. Algorithm-based fault tolerance (ABFT) is a low-cost system-level concurrent error detection and fault location scheme. We apply methods used in the analysis of multiprocessor systems employing system-level diagnosis to the analysis of ABFT systems. A new algorithm to analyze an ABFT system for its fault diagnosability is developed using these methods. Based on this work, a fault diagnosis algorithm is developed for ABFT systems. No such algorithm has been presented previously.
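
To make the idea of ABFT concrete, the sketch below shows row and column checksum encoding applied to matrix multiplication, a commonly used textbook illustration of checksum-based concurrent error detection and location. It is not the diagnosability or diagnosis algorithm developed in the paper, which the abstract does not describe in detail. The function names (`column_checksum`, `row_checksum`, `matmul`, `locate_error`) are hypothetical, and the sketch assumes exact integer arithmetic and at most one erroneous data element.

```python
def column_checksum(a):
    """Return a copy of a with an extra row holding each column's sum."""
    return a + [[sum(col) for col in zip(*a)]]


def row_checksum(b):
    """Return a copy of b with an extra column holding each row's sum."""
    return [row + [sum(row)] for row in b]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def locate_error(c_full):
    """Check a full-checksum product matrix.

    Returns None if every row and column check passes, or (row, col) of
    the single inconsistent data element. Assumes integer entries; a
    floating-point version would need a tolerance instead of !=.
    """
    n = len(c_full) - 1      # number of data rows
    m = len(c_full[0]) - 1   # number of data columns
    bad_rows = [i for i in range(n)
                if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return (bad_rows[0], bad_cols[0])
    raise ValueError("error pattern is not a single data-element error")


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
# The product of a column-checksum matrix and a row-checksum matrix
# carries both row and column checksums.
c_full = matmul(column_checksum(a), row_checksum(b))
assert locate_error(c_full) is None

c_full[1][0] += 1            # simulate a faulty processor corrupting one result
assert locate_error(c_full) == (1, 0)
```

The intersection of the failing row check and the failing column check identifies the corrupted element, which is the sense in which a checksum scheme provides fault location as well as error detection.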

Original language: English (US)
Pages (from-to): 924-937
Number of pages: 14
Journal: IEEE Transactions on Computers
Volume: 42
Issue number: 8
State: Published - Aug 1993

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computational Theory and Mathematics

Keywords

  • Algorithm-based fault tolerance
  • checksum encoding
  • concurrent error detection
  • concurrent fault diagnosis
  • fault diagnosability
