TY - GEN
T1 - The performance advantages of integrating block data transfer in cache-coherent multiprocessors
AU - Woo, Steven Cameron
AU - Singh, Jaswinder Pal
AU - Hennessy, John L.
N1 - Publisher Copyright: © 1994 ACM.
PY - 1994/11/1
Y1 - 1994/11/1
AB - Integrating support for block data transfer has become an important emphasis in recent cache-coherent shared address space multiprocessors. This paper examines the potential performance benefits of adding this support. A set of ambitious hardware mechanisms is used to study performance gains in five important scientific computations that appear to be good candidates for using block transfer. Our conclusion is that the benefits of block transfer are not substantial for hardware cache-coherent multiprocessors. The main reasons for this are (i) the relatively modest fraction of time applications spend in communication amenable to block transfer, (ii) the difficulty of finding enough independent computation to overlap with the communication latency that remains after block transfer, and (iii) long cache lines often capture many of the benefits of block transfer in efficient cache-coherent machines. In the cases where block transfer improves performance, prefetching can often provide comparable, if not superior, performance benefits. We also examine the impact of varying important communication parameters and processor speed on the effectiveness of block transfer, and comment on useful features that a block transfer facility should support for real applications.
UR - http://www.scopus.com/inward/record.url?scp=84974712610&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84974712610&partnerID=8YFLogxK
U2 - 10.1145/195473.195547
DO - 10.1145/195473.195547
M3 - Conference contribution
AN - SCOPUS:84974712610
T3 - International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
SP - 219
EP - 229
BT - Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 1994
PB - Association for Computing Machinery
T2 - 6th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 1994
Y2 - 4 October 1994 through 7 October 1994
ER -