TY - GEN
T1 - Efficient diagnosis in algorithm-based fault tolerant multiprocessor systems
AU - Srinivasan, Santhanam
AU - Jha, Niraj K.
PY - 1993
Y1 - 1993
N2 - The conventional assumption that checks in an algorithm-based fault tolerant (ABFT) system can be invalidated due to aliasing of erroneous data elements has complicated the task of error detection and location. In this paper we show that aliasing is a very rare occurrence. We then present a simple polynomial-time diagnosis algorithm which takes advantage of this result to run much more efficiently compared to the conventional method of diagnosis. We introduce the concept of NC-detectability and NC-locatability to measure the fault tolerance of the system when check invalidation does not occur and show how to design systems with specified error detectability/NC-detectability and locatability/NC-locatability. For the data-check graphs designed using these methods, when aliasing does not occur, our diagnosis algorithm has a worst case complexity of O(s2n2 log n), where s is the error locatability and n is the number of data elements in the system. We also consider the case where the processors which compute the checks themselves fail.
AB - The conventional assumption that checks in an algorithm-based fault tolerant (ABFT) system can be invalidated due to aliasing of erroneous data elements has complicated the task of error detection and location. In this paper we show that aliasing is a very rare occurrence. We then present a simple polynomial-time diagnosis algorithm which takes advantage of this result to run much more efficiently compared to the conventional method of diagnosis. We introduce the concept of NC-detectability and NC-locatability to measure the fault tolerance of the system when check invalidation does not occur and show how to design systems with specified error detectability/NC-detectability and locatability/NC-locatability. For the data-check graphs designed using these methods, when aliasing does not occur, our diagnosis algorithm has a worst case complexity of O(s2n2 log n), where s is the error locatability and n is the number of data elements in the system. We also consider the case where the processors which compute the checks themselves fail.
UR - http://www.scopus.com/inward/record.url?scp=0027803169&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0027803169&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:0027803169
SN - 0818642300
T3 - Proceedings - IEEE International Conference on Computer Design: VLSI in Computers and Processors
SP - 592
EP - 595
BT - Proceedings - IEEE International Conference on Computer Design
A2 - Anon, null
PB - Publ by IEEE
T2 - Proceedings of the 1993 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Y2 - 3 October 1993 through 6 October 1993
ER -