Real-Time, Concurrent Checkpoint for Parallel Programs

Kai Li, Jeffrey F. Naughton, James S. Plank

Research output: Contribution to journal (Article, peer-reviewed)

45 Scopus citations

Abstract

We have developed and implemented a checkpointing and restart algorithm for parallel programs running on commercial uniprocessors and shared-memory multiprocessors. The algorithm runs concurrently with the target program, interrupts the target program for small, fixed amounts of time and is transparent to the checkpointed program and its compiler. The algorithm achieves its efficiency through a novel use of address translation hardware that allows the most time-consuming operations of the checkpoint to be overlapped with the running of the program being checkpointed.
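The abstract does not give implementation details. The toy simulation below is therefore only an illustration, under a stated assumption: that "a novel use of address translation hardware" means a copy-on-write style scheme built on page protection. In that scheme, the program is paused only long enough to write-protect its pages. A checkpointer then copies pages in the background, and a write to a page that has not yet been saved "faults" so the page can be saved before it is modified. The class `SimulatedCheckpointer` and all its methods are hypothetical names. "Pages" are list entries and "faults" are explicit checks, and the program and checkpointer are interleaved in a single thread instead of running on real hardware.

```python
from typing import Dict, List, Set


class SimulatedCheckpointer:
    """Hypothetical toy model of a page-protection-based concurrent checkpoint."""

    def __init__(self, memory: List[bytes]) -> None:
        self.memory = memory                  # the target program's "pages"
        self.protected: Set[int] = set()      # pages still write-protected
        self.snapshot: Dict[int, bytes] = {}  # checkpoint-time page contents

    def begin_checkpoint(self) -> None:
        # Short, fixed-cost pause: protect every page, copy nothing yet.
        self.snapshot = {}
        self.protected = set(range(len(self.memory)))

    def _save_page(self, i: int) -> None:
        # Save the page's checkpoint-time contents, then lift its protection.
        if i in self.protected:
            self.snapshot[i] = self.memory[i]
            self.protected.discard(i)

    def write(self, i: int, data: bytes) -> None:
        # Program write: a protected page "faults" and is saved before it changes.
        if i in self.protected:
            self._save_page(i)
        self.memory[i] = data

    def background_step(self) -> bool:
        # Checkpointer saves one still-protected page; False once all are saved.
        if not self.protected:
            return False
        self._save_page(min(self.protected))
        return True

    def result(self) -> List[bytes]:
        # The consistent image of memory as it was at begin_checkpoint().
        if self.protected:
            raise RuntimeError("checkpoint not finished")
        return [self.snapshot[i] for i in range(len(self.memory))]


# Usage: the program keeps writing while the checkpoint is in progress.
cp = SimulatedCheckpointer([b"a", b"b", b"c"])
cp.begin_checkpoint()
cp.write(2, b"C")       # page 2 not yet saved: it is saved first, then modified
cp.background_step()    # checkpointer saves page 0
cp.write(0, b"A")       # page 0 already saved: the write proceeds directly
while cp.background_step():
    pass
print(cp.result())      # [b'a', b'b', b'c']
print(cp.memory)        # [b'A', b'b', b'C']
```

The design point this sketch tries to capture is that the expensive work, copying every page, moves out of the pause. The pause itself only changes protections. After it, each page is copied at most once, either by the background checkpointer or on the first write that touches it.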

Original language: English (US)
Pages (from-to): 79-88
Number of pages: 10
Journal: SIGPLAN Notices (ACM Special Interest Group on Programming Languages)
Volume: 25
Issue number: 3
State: Published, Jan 2 1990

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Graphics and Computer-Aided Design
