TY - GEN
T1 - Optimizing communication scheduling using dataflow semantics
AU - Soviani, Adrian
AU - Singh, Jaswinder Pal
PY - 2009
Y1 - 2009
N2 - We show how coarse grain dataflow semantics (CGD) applied to SPMD algorithms makes application development and design space exploration simpler compared to message passing, at the same time providing on par performance. CGD applications are specified as dependencies between computation modules and data distributions. Communication and synchronization are added automatically and optimized for specific architectures, relieving programmers of this task. Many high level algorithm changes are easy to implement in CGD by redefining data distributions. These include exposing communication overlap by decreasing task grain, and aggregating communication by replicating data and computation. We briefly present a coordination language with dataflow semantics that implements the CGD model. Our implementation currently supports MPI, SHMEM, and pthreads. Results on Altix 4700 show our optimized CGD FT is 27% faster than original NPB 2.3 MPI implementation, and optimized CGD stencil has a 41% advantage over handwritten MPI.
AB - We show how coarse grain dataflow semantics (CGD) applied to SPMD algorithms makes application development and design space exploration simpler compared to message passing, at the same time providing on par performance. CGD applications are specified as dependencies between computation modules and data distributions. Communication and synchronization are added automatically and optimized for specific architectures, relieving programmers of this task. Many high level algorithm changes are easy to implement in CGD by redefining data distributions. These include exposing communication overlap by decreasing task grain, and aggregating communication by replicating data and computation. We briefly present a coordination language with dataflow semantics that implements the CGD model. Our implementation currently supports MPI, SHMEM, and pthreads. Results on Altix 4700 show our optimized CGD FT is 27% faster than original NPB 2.3 MPI implementation, and optimized CGD stencil has a 41% advantage over handwritten MPI.
UR - http://www.scopus.com/inward/record.url?scp=77951470208&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77951470208&partnerID=8YFLogxK
U2 - 10.1109/ICPP.2009.66
DO - 10.1109/ICPP.2009.66
M3 - Conference contribution
AN - SCOPUS:77951470208
SN - 9780769538020
T3 - Proceedings of the International Conference on Parallel Processing
SP - 301
EP - 308
BT - ICPP-2009 - The 38th International Conference on Parallel Processing
T2 - 38th International Conference on Parallel Processing, ICPP-2009
Y2 - 22 September 2009 through 25 September 2009
ER -