TY - JOUR
T1 - Characterizing the memory behavior of compiler-parallelized applications
AU - Torrie, Evan
AU - Martonosi, Margaret
AU - Tseng, Chau-Wen
AU - Hall, Mary W.
N1 - Funding Information:
This research was supported in part by ARPA contract DABT63-94-C-0054, as well as two National Science Foundation CISE postdoctoral fellowships in Experimental Science. Margaret Martonosi is partly supported through a National Science Foundation Career Award (CCR-9502516). In addition, we acknowledge the Jet Propulsion Laboratory for a portion of Mary Hall’s funding. Finally, many of the simulations were run at Princeton using machines purchased through a National Science Foundation CISE Research Infrastructure grant.
PY - 1996
Y1 - 1996
N2 - Compiler-parallelized applications are increasing in importance as moderate-scale multiprocessors become common. This paper evaluates how features of advanced memory systems (e.g., longer cache lines) impact memory system behavior for applications amenable to compiler parallelization. Using full-sized input data sets and applications taken from standard benchmark suites, we measure statistics such as speedups, synchronization and load imbalance, causes of cache misses, cache line utilization, data traffic, and memory costs. This exploration allows us to draw several conclusions. First, we find that larger granularity parallelism often correlates with good memory system behavior, good overall performance, and high speedup in these applications. Second, we show that when long (512 byte) cache lines are used, many of these applications suffer from false sharing and low cache line utilization. Third, we identify some of the common artifacts in compiler-parallelized codes that can lead to false sharing or other types of poor memory system performance, and we suggest methods for improving them. Overall, this study offers both an important snapshot of the behavior of applications compiled by state-of-the-art compilers and an increased understanding of the interplay between cache line size, program granularity, and memory performance in moderate-scale multiprocessors.
AB - Compiler-parallelized applications are increasing in importance as moderate-scale multiprocessors become common. This paper evaluates how features of advanced memory systems (e.g., longer cache lines) impact memory system behavior for applications amenable to compiler parallelization. Using full-sized input data sets and applications taken from standard benchmark suites, we measure statistics such as speedups, synchronization and load imbalance, causes of cache misses, cache line utilization, data traffic, and memory costs. This exploration allows us to draw several conclusions. First, we find that larger granularity parallelism often correlates with good memory system behavior, good overall performance, and high speedup in these applications. Second, we show that when long (512 byte) cache lines are used, many of these applications suffer from false sharing and low cache line utilization. Third, we identify some of the common artifacts in compiler-parallelized codes that can lead to false sharing or other types of poor memory system performance, and we suggest methods for improving them. Overall, this study offers both an important snapshot of the behavior of applications compiled by state-of-the-art compilers and an increased understanding of the interplay between cache line size, program granularity, and memory performance in moderate-scale multiprocessors.
KW - Cache performance
KW - False and true sharing
KW - Memory hierarchies
KW - Parallelism granularity
KW - Parallelizing compilers
KW - Shared-memory multiprocessors
UR - http://www.scopus.com/inward/record.url?scp=0030403093&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0030403093&partnerID=8YFLogxK
U2 - 10.1109/71.553272
DO - 10.1109/71.553272
M3 - Article
AN - SCOPUS:0030403093
SN - 1045-9219
VL - 7
SP - 1224
EP - 1237
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 12
ER -