Real-time, concurrent checkpoint for parallel programs

Kai Li, Jeffrey F. Naughton, James S. Plank

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

We have developed and implemented a checkpointing and restart algorithm for parallel programs running on commercial uniprocessors and shared-memory multiprocessors. The algorithm runs concurrently with the target program, interrupts the target program for small, fixed amounts of time, and is transparent to the checkpointed program and its compiler. The algorithm achieves its efficiency through a novel use of address translation hardware that allows the most time-consuming operations of the checkpoint to be overlapped with the running of the program being checkpointed.
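
The abstract does not give the algorithm's details, so the following is only a toy sketch of the general idea it describes: pause briefly to write-protect memory, then save pages while the program keeps running, saving any page first if the program tries to write to it. It is not the authors' implementation. Page protection, which the paper obtains from address translation hardware, is modeled here in software with a set and a lock; the names `CheckpointedMemory`, `begin_checkpoint`, `copier`, and `run_demo` are hypothetical.

```python
import threading

PAGE_SIZE = 4  # toy page size in words (assumption); real pages are far larger


class CheckpointedMemory:
    """Toy memory whose pages can be write-protected, modeled in software."""

    def __init__(self, num_pages):
        self.pages = [[0] * PAGE_SIZE for _ in range(num_pages)]
        self.protected = set()  # pages not yet saved in the current checkpoint
        self.snapshot = {}      # page index -> contents at checkpoint time
        self.lock = threading.Lock()

    def begin_checkpoint(self):
        # The only step that stops the program: mark every page protected.
        with self.lock:
            self.snapshot = {}
            self.protected = set(range(len(self.pages)))

    def _save_page(self, index):
        # Caller holds self.lock. Save a protected page once, then unprotect it.
        if index in self.protected:
            self.snapshot[index] = list(self.pages[index])
            self.protected.discard(index)

    def write(self, address, value):
        index, offset = divmod(address, PAGE_SIZE)
        with self.lock:
            # Stand-in for a write fault: save the old contents before writing.
            self._save_page(index)
            self.pages[index][offset] = value

    def copier(self):
        # Background work: save the remaining pages while the program runs.
        for index in range(len(self.pages)):
            with self.lock:
                self._save_page(index)

    def checkpoint_image(self):
        # Valid once copier() has finished for the current checkpoint.
        with self.lock:
            return [self.snapshot[i] for i in range(len(self.pages))]


def run_demo(num_pages=8):
    mem = CheckpointedMemory(num_pages)
    for addr in range(num_pages * PAGE_SIZE):
        mem.write(addr, addr)
    expected = [list(page) for page in mem.pages]

    mem.begin_checkpoint()
    worker = threading.Thread(target=mem.copier)
    worker.start()
    # The program keeps mutating memory while the checkpoint is being taken.
    for addr in range(num_pages * PAGE_SIZE):
        mem.write(addr, -1)
    worker.join()

    # The image reflects memory at begin_checkpoint(), not the later writes.
    return mem.checkpoint_image() == expected
```

In this sketch the program is blocked only while `begin_checkpoint` marks pages and while a single page is copied on a simulated fault; the bulk of the copying happens in `copier`, overlapped with the program's own writes.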

Original language: English (US)
Title of host publication: Proceedings of the 2nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 1990
Publisher: Association for Computing Machinery
Pages: 79-88
Number of pages: 10
ISBN (Electronic): 0897913507
DOI: https://doi.org/10.1145/99163.99173
State: Published - Feb 1 1990
Event: 2nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 1990 - Seattle, United States
Duration: Mar 14 1990 to Mar 16 1990

Publication series

Name: Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
Volume: Part F130005

Other

Other: 2nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 1990
Country: United States
City: Seattle
Period: 3/14/90 to 3/16/90

All Science Journal Classification (ASJC) codes

  • Software


  • Cite this

    Li, K., Naughton, J. F., & Plank, J. S. (1990). Real-time, concurrent checkpoint for parallel programs. In Proceedings of the 2nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 1990 (pp. 79-88). (Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP; Vol. Part F130005). Association for Computing Machinery. https://doi.org/10.1145/99163.99173