As moderate-scale multiprocessors become widely used, we foresee an increased demand for effective compiler parallelization and efficient management of parallelism. While parallelizing compilers are achieving success at identifying parallelism, they are less adept at predetermining the degree of parallelism in different program phases. Thus, a compiler-parallelized application may execute on more processors than it can effectively use - a waste of computational resources that becomes more acute as the number of processors increases, particularly for systems used as multiprogrammed compute servers. This paper examines the dynamic parallelism behavior of multiprogrammed workloads using programs from the Specfp95 and NAS benchmark suites, automatically parallelized by the Stanford SUIF compiler. Our results demonstrate that even the programs with good overall speedups display wide variability in the number of processors each phase (or loop) can exploit. We propose and evaluate a run-time system mechanism that dynamically adjusts the number of processors used by a compiler-parallelized application, responding to observed performance during the program's execution. Programs can thus adapt processor usage as they run, responding both to poor parallelism within certain parts of their code, and also to heavy multiprogramming loads during the execution. This mechanism improves workload performance up to 33% over one-at-a-time runs of the Workload programs.
|Original language||English (US)|
|Number of pages||16|
|Journal||Concurrency Practice and Experience|
|State||Published - Dec 10 1998|
All Science Journal Classification (ASJC) codes