Efficient diagnosis in algorithm-based fault tolerant multiprocessor systems

Santhanam Srinivasan, Niraj K. Jha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

The conventional assumption that checks in an algorithm-based fault tolerant (ABFT) system can be invalidated due to aliasing of erroneous data elements has complicated the task of error detection and location. In this paper we show that aliasing is a very rare occurrence. We then present a simple polynomial-time diagnosis algorithm which takes advantage of this result to run much more efficiently compared to the conventional method of diagnosis. We introduce the concept of NC-detectability and NC-locatability to measure the fault tolerance of the system when check invalidation does not occur and show how to design systems with specified error detectability/NC-detectability and locatability/NC-locatability. For the data-check graphs designed using these methods, when aliasing does not occur, our diagnosis algorithm has a worst case complexity of O(s^2 n^2 log n), where s is the error locatability and n is the number of data elements in the system. We also consider the case where the processors which compute the checks themselves fail.
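To make the no-aliasing idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a data-check graph given as a mapping from each check to the set of data elements it covers, and it assumes that a check fails exactly when it covers at least one erroneous element (no check invalidation). The names `syndrome` and `diagnose`, the example graph, and the brute-force search are all illustrative assumptions; the search is exponential in s, unlike the polynomial-time method the abstract describes.

```python
from itertools import combinations


def syndrome(checks, errors):
    """Failing checks under the no-aliasing assumption:
    a check fails iff it covers at least one erroneous data element."""
    return {name for name, covered in checks.items() if covered & errors}


def diagnose(checks, data, failing, s):
    """Return every error set of size <= s whose syndrome equals `failing`.

    Illustrative brute force only; not the algorithm from the paper.
    """
    # With no aliasing, anything covered by a passing check is error-free.
    cleared = set()
    for name, covered in checks.items():
        if name not in failing:
            cleared |= covered
    suspects = sorted(set(data) - cleared)

    candidates = []
    for k in range(s + 1):
        for combo in combinations(suspects, k):
            if syndrome(checks, set(combo)) == failing:
                candidates.append(set(combo))
    return candidates


# Hypothetical data-check graph: 4 data elements, 3 checks.
checks = {"c0": {0, 1}, "c1": {0, 2}, "c2": {1, 3}}
failing = syndrome(checks, {1})          # data element 1 is erroneous
located = diagnose(checks, range(4), failing, s=1)
```

In this hypothetical graph every single-element error produces a distinct, non-empty set of failing checks, so a single error is located uniquely; passing checks alone already clear elements 0 and 2 before any search is done.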

Original language: English (US)
Title of host publication: Proceedings - IEEE International Conference on Computer Design
Subtitle of host publication: VLSI in Computers and Processors
Editors: Anon
Publisher: Publ by IEEE
Pages: 592-595
Number of pages: 4
ISBN (Print): 0818642300
State: Published - 1993
Event: Proceedings of the 1993 IEEE International Conference on Computer Design: VLSI in Computers & Processors - Cambridge, MA, USA
Duration: Oct 3 1993 to Oct 6 1993

Publication series

Name: Proceedings - IEEE International Conference on Computer Design: VLSI in Computers and Processors

Other

Other: Proceedings of the 1993 IEEE International Conference on Computer Design: VLSI in Computers & Processors
City: Cambridge, MA, USA
Period: 10/3/93 to 10/6/93

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • Electrical and Electronic Engineering
