TY - GEN
T1 - MRPB: Memory request prioritization for massively parallel processors
T2 - 20th IEEE International Symposium on High Performance Computer Architecture, HPCA 2014
AU - Jia, Wenhao
AU - Shaw, Kelly A.
AU - Martonosi, Margaret Rose
PY - 2014
Y1 - 2014
N2 - Massively parallel, throughput-oriented systems such as graphics processing units (GPUs) offer high performance for a broad range of programs. They are, however, complex to program, especially because of their intricate memory hierarchies with multiple address spaces. In response, modern GPUs have widely adopted caches, hoping to provide smoother reductions in memory access traffic and latency. Unfortunately, GPU caches often have mixed or unpredictable performance impact due to cache contention that results from the high thread counts in GPUs. We propose the memory request prioritization buffer (MRPB) to ease GPU programming and improve GPU performance. This hardware structure improves caching efficiency of massively parallel workloads by applying two prioritization methods - request reordering and cache bypassing - to memory requests before they access a cache. MRPB then releases requests into the cache in a more cache-friendly order. The result is drastically reduced cache contention and improved use of the limited per-thread cache capacity. For a simulated 16KB L1 cache, MRPB improves the average performance of the entire PolyBench and Rodinia suites by 2.65× and 1.27× respectively, outperforming a state-of-the-art GPU cache management technique.
UR - http://www.scopus.com/inward/record.url?scp=84903985058&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84903985058&partnerID=8YFLogxK
U2 - 10.1109/HPCA.2014.6835938
DO - 10.1109/HPCA.2014.6835938
M3 - Conference contribution
AN - SCOPUS:84903985058
SN - 9781479930975
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 272
EP - 283
BT - 20th IEEE International Symposium on High Performance Computer Architecture, HPCA 2014
PB - IEEE Computer Society
Y2 - 15 February 2014 through 19 February 2014
ER -