Abstract
Performance monitoring is a crucial aspect of parallel programming. Extracting the best possible performance from the system is the main goal of parallel programming, and monitoring tools are often essential to achieving that goal. A common tradeoff arises in determining at which system level to monitor performance information and present results. High-level monitoring approaches can often gather data directly tied to the software programming model, but may abstract away crucial low-level hardware details. Low-level monitoring approaches can gather fairly complete performance information about the underlying system, but often at the expense of portability and flexibility. In this paper we discuss a compromise approach between the portability and flexibility of high-level monitoring and the detailed data awareness of low-level monitoring. We present a firmware-based performance monitor we designed for a Myrinet-connected SHRIMP cluster. This monitor combines the portability and flexibility typically found in software-based monitors with the detailed, low-level information traditionally available only to hardware monitors. As with hardware approaches, our approach introduces little monitoring perturbation. Because it includes a software-based global clock, the monitor can track inter-node latencies accurately. Our tool is flexible and can monitor applications with a wide range of communication abstractions, though we focus here on its use with shared virtual memory applications. The portability and flexibility of this firmware-based monitoring strategy make it a very promising approach for gathering low-level statistics about parallel program performance.
Original language | English (US) |
---|---|
Pages | 21-29 |
Number of pages | 9 |
State | Published - 1998 |
Event | Proceedings of the 1998 SIGMETRICS Symposium on Parallel and Distributed Tools, Welches, OR, USA (Aug 3 1998 → Aug 4 1998) |
All Science Journal Classification (ASJC) codes
- General Engineering
- General Computer Science