TY - GEN
T1 - Faster checkpointing with N+1 parity
AU - Plank, James S.
AU - Li, Kai
PY - 1994
Y1 - 1994
N2 - This paper presents a way to perform fast, incremental checkpointing of multicomputers and distributed systems by using N + 1 parity. A basic algorithm is described that uses two extra processors for checkpointing and enables the system to tolerate any single processor failure. The algorithm's speed comes from a combination of N + 1 parity, extra physical memory, and virtual memory hardware so that checkpoints need not be written to disk. This eliminates the most time-consuming portion of checkpointing. The algorithm requires each application processor to allocate a fixed amount of extra memory for checkpointing. This amount may be set statically by the programmer, and need not be equal to the size of the processor's writable address space. This alleviates a major restriction of previous checkpointing algorithms using N + 1 parity [28]. Finally, we outline how to extend our algorithm to tolerate any m processor failures with the addition of 2m extra checkpointing processors.
UR - http://www.scopus.com/inward/record.url?scp=0028060943&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0028060943&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:0028060943
SN - 0818655224
T3 - Digest of Papers - International Symposium on Fault-Tolerant Computing
SP - 288
EP - 297
BT - Digest of Papers - International Symposium on Fault-Tolerant Computing
PB - IEEE
T2 - Proceedings of the 24th International Symposium on Fault-Tolerant Computing
Y2 - 15 June 1994 through 17 June 1994
ER -