TY - JOUR
T1 - Finding and exploiting parallelism in an ocean simulation program
T2 - Experience, results, and implications
AU - Singh, Jaswinder Pal
AU - Hennessy, John L.
N1 - Funding Information:
We thank Stephen Klotz for providing us with the sequential program. This work was supported by DARPA under Contract N00014-87-K-0828.
PY - 1992/5
Y1 - 1992/5
N2 - How to code and compile large application programs for execution on parallel processors is perhaps the biggest challenge facing the widespread adoption of multiprocessing. To gain insight into this problem, an ocean simulation application was converted to a parallel version. The parallel program demonstrated near-linear speed-up on an Encore Multimax, a 16-processor bus-based shared-memory machine. Parallelizing an existing sequential application (not just a single loop or computational kernel) leads to interesting insights about what issues are significant in the process of finding and implementing parallelism, and what the major challenges are. Three levels of approach to the problem of finding parallelism (loop-level parallelization, program restructuring, and algorithm modification) were attempted, with widely varying results. Loop-level parallelization did not scale sufficiently. High-level restructuring was useful for much of the application, but obtaining an efficient parallel program required algorithm changes to one portion of it. Implementation issues for scalable performance, such as data locality and synchronization, are also discussed. The nature, requirements, and success of the various transformations lend insight into the design of parallelizing tools and parallel programming environments.
AB - How to code and compile large application programs for execution on parallel processors is perhaps the biggest challenge facing the widespread adoption of multiprocessing. To gain insight into this problem, an ocean simulation application was converted to a parallel version. The parallel program demonstrated near-linear speed-up on an Encore Multimax, a 16-processor bus-based shared-memory machine. Parallelizing an existing sequential application (not just a single loop or computational kernel) leads to interesting insights about what issues are significant in the process of finding and implementing parallelism, and what the major challenges are. Three levels of approach to the problem of finding parallelism (loop-level parallelization, program restructuring, and algorithm modification) were attempted, with widely varying results. Loop-level parallelization did not scale sufficiently. High-level restructuring was useful for much of the application, but obtaining an efficient parallel program required algorithm changes to one portion of it. Implementation issues for scalable performance, such as data locality and synchronization, are also discussed. The nature, requirements, and success of the various transformations lend insight into the design of parallelizing tools and parallel programming environments.
UR - http://www.scopus.com/inward/record.url?scp=0002909737&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0002909737&partnerID=8YFLogxK
U2 - 10.1016/0743-7315(92)90056-S
DO - 10.1016/0743-7315(92)90056-S
M3 - Article
AN - SCOPUS:0002909737
SN - 0743-7315
VL - 15
SP - 27
EP - 48
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
IS - 1
ER -