TY - JOUR
T1 - Adaptive parallelism in compiler-parallelized code
AU - Hall, Mary W.
AU - Martonosi, Margaret Rose
PY - 1998/12/10
Y1 - 1998/12/10
AB - As moderate-scale multiprocessors become widely used, we foresee an increased demand for effective compiler parallelization and efficient management of parallelism. While parallelizing compilers are achieving success at identifying parallelism, they are less adept at predetermining the degree of parallelism in different program phases. Thus, a compiler-parallelized application may execute on more processors than it can effectively use, a waste of computational resources that becomes more acute as the number of processors increases, particularly for systems used as multiprogrammed compute servers. This paper examines the dynamic parallelism behavior of multiprogrammed workloads using programs from the SPECfp95 and NAS benchmark suites, automatically parallelized by the Stanford SUIF compiler. Our results demonstrate that even programs with good overall speedups display wide variability in the number of processors each phase (or loop) can exploit. We propose and evaluate a run-time mechanism that dynamically adjusts the number of processors used by a compiler-parallelized application in response to performance observed during execution. Programs can thus adapt their processor usage as they run, responding both to poor parallelism within certain parts of their code and to heavy multiprogramming loads. This mechanism improves workload performance by up to 33% over one-at-a-time runs of the workload programs.
UR - http://www.scopus.com/inward/record.url?scp=0032276282&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0032276282&partnerID=8YFLogxK
U2 - 10.1002/(SICI)1096-9128(19981210)10:14<1235::AID-CPE373>3.0.CO;2-Z
DO - 10.1002/(SICI)1096-9128(19981210)10:14<1235::AID-CPE373>3.0.CO;2-Z
M3 - Article
AN - SCOPUS:0032276282
SN - 1040-3108
VL - 10
SP - 1235
EP - 1250
JO - Concurrency: Practice and Experience
JF - Concurrency: Practice and Experience
IS - 14
ER -