Checkpointing multicomputer applications

Kai Li, Jeffrey F. Naughton, James S. Plank

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

29 Scopus citations

Abstract

The authors present a checkpointing scheme that is transparent, imposes overhead only during checkpoints, requires minimal message logging, and allows for quick resumption of execution from a checkpointed image. Since checkpointing multicomputer applications poses requirements different from those posed by checkpointing general distributed systems, existing distributed checkpointing schemes are inadequate for multicomputer checkpointing. The proposed checkpointing scheme makes use of special properties of multicomputer interconnection networks to satisfy this set of requirements. The proposed algorithm is efficient both when taking checkpoints and when recovering from checkpointed images.

Original language: English (US)
Title of host publication: Proceedings - Symposium on Reliability in Distributed Software and Database Systems
Publisher: Publ by IEEE
Pages: 2-11
Number of pages: 10
ISBN (Print): 0818622601
State: Published - Oct 1 1991
Event: Proceedings of the 10th Symposium on Reliable Distributed Systems - Pisa, Italy
Duration: Sep 30 1991 to Oct 2 1991

Publication series

Name: Proceedings - Symposium on Reliability in Distributed Software and Database Systems

Other

Other: Proceedings of the 10th Symposium on Reliable Distributed Systems
City: Pisa, Italy
Period: 9/30/91 to 10/2/91

All Science Journal Classification (ASJC) codes

  • Software

Cite this

    Li, K., Naughton, J. F., & Plank, J. S. (1991). Checkpointing multicomputer applications. In Proceedings - Symposium on Reliability in Distributed Software and Database Systems (pp. 2-11). (Proceedings - Symposium on Reliability in Distributed Software and Database Systems). Publ by IEEE.