TY - JOUR
T1 - Message passing and shared address space parallelism on an SMP cluster
AU - Shan, Hongzhang
AU - Singh, Jaswinder Pal
AU - Oliker, Leonid
AU - Biswas, Rupak
N1 - Funding Information:
The work of the first two authors was supported by NSF under grant number ESS-9806751 to Princeton University. The second author was also supported by PECASE and a Sloan Research Fellowship. The work of the third author was supported by the US Department of Energy under contract number DE-AC03-76SF00098.
PY - 2003/2
Y1 - 2003/2
AB - Currently, message passing (MP) and shared address space (SAS) are the two leading parallel programming paradigms. MP has been standardized with MPI, and is the more common and mature approach; however, code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and the programming effort required for six applications under both programming models on a 32-processor PC-SMP cluster, a platform that is becoming increasingly attractive for high-end scientific computing. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and/or complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications, while being competitive for the others. A hybrid MPI+SAS strategy shows only a small performance advantage over pure MPI in some cases. Finally, improved implementations of two MPI collective operations on PC-SMP clusters are presented.
KW - Benchmark applications
KW - Distributed shared memory
KW - Message passing
KW - PC cluster
KW - Parallel performance
UR - http://www.scopus.com/inward/record.url?scp=0037303612&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0037303612&partnerID=8YFLogxK
U2 - 10.1016/S0167-8191(02)00222-3
DO - 10.1016/S0167-8191(02)00222-3
M3 - Article
AN - SCOPUS:0037303612
SN - 0167-8191
VL - 29
SP - 167
EP - 186
JO - Parallel Computing
JF - Parallel Computing
IS - 2
ER -