Performance monitoring in a Myrinet-connected shrimp cluster

Cheng Liao, Margaret Rose Martonosi, Douglas W. Clark

Research output: Contribution to conference, Paper, peer-review

16 Scopus citations

Abstract

Performance monitoring is a crucial aspect of parallel programming. Extracting the best possible performance from the system is the main goal of parallel programming, and monitoring tools are often essential to achieving that goal. A common tradeoff arises in determining at which system level to monitor performance information and present results. High-level monitoring approaches can often gather data directly tied to the software programming model, but may abstract away crucial low-level hardware details. Low-level monitoring approaches can gather fairly complete performance information about the underlying system, but often at the expense of portability and flexibility. In this paper we discuss a compromise approach between the portability and flexibility of high-level monitoring and the detailed data awareness of low-level monitoring. We present a firmware-based performance monitor we designed for a Myrinet-connected Shrimp cluster. This monitor combines the portability and flexibility typically found in software-based monitors with the detailed, low-level information traditionally available only to hardware monitors. As with hardware approaches, ours results in little monitoring perturbation. Since it includes a software-based global clock, the monitor can track inter-node latencies accurately. Our tool is flexible and can monitor applications with a wide range of communication abstractions, though we focus here on its usage on shared virtual memory applications. The portability and flexibility of this firmware-based monitoring strategy make it a very promising approach for gathering low-level statistics about parallel program performance.

Original language: English (US)
Pages: 21-29
Number of pages: 9
State: Published - 1998
Event: Proceedings of the 1998 SIGMETRICS Symposium on Parallel and Distributed Tools - Welches, OR, USA
Duration: Aug 3 1998 to Aug 4 1998

Other

Other: Proceedings of the 1998 SIGMETRICS Symposium on Parallel and Distributed Tools
City: Welches, OR, USA
Period: 8/3/98 to 8/4/98

All Science Journal Classification (ASJC) codes

  • General Engineering
  • General Computer Science
