TY - GEN
T1 - TCP
T2 - 9th IEEE International Symposium on High-Performance Computer Architecture, HPCA 2003
AU - Hu, Z.
AU - Martonosi, Margaret Rose
AU - Kaxiras, S.
PY - 2003/1/1
Y1 - 2003/1/1
N2 - Although caches for decades have been the backbone of the memory system, the speed gap between CPU and main memory suggests their augmentation with prefetching mechanisms. Recently, sophisticated hardware correlating prefetching mechanisms have been proposed, in some cases coupled with some form of dead-block prediction. In many proposals, however correlating prefetchers demand a significant investment in hardware. In this paper we show that correlating prefetchers that work with tags instead of cache-line addresses are significantly more resource-efficient, providing equal or better performance than previous proposals. We support this claim by showing that per-set tag sequences exhibit highly repetitive patterns both within a set and across different sets. Because a single tag sequence can capture multiple address sequences spread over different cache sets, significant space savings can be achieved. We propose a tag-based prefetcher called a tag correlating prefetcher (TCP). Even with very small history tables, TCP outperforms address-based correlating prefetchers many times larger. In addition, we show that such a prefetcher can yield most of its performance benefits if placed at the L2 level of an aggressive out-of-order processor. Only if one wants prefetching all the way up to L1, is dead-block prediction required. Finally, we draw parallels between the two-level structure of TCP and similar structures for branch prediction mechanisms; these parallels raise interesting opportunities for improving correlating memory prefetchers by harnessing lessons already learned for correlating branch predictors.
AB - Although caches for decades have been the backbone of the memory system, the speed gap between CPU and main memory suggests their augmentation with prefetching mechanisms. Recently, sophisticated hardware correlating prefetching mechanisms have been proposed, in some cases coupled with some form of dead-block prediction. In many proposals, however correlating prefetchers demand a significant investment in hardware. In this paper we show that correlating prefetchers that work with tags instead of cache-line addresses are significantly more resource-efficient, providing equal or better performance than previous proposals. We support this claim by showing that per-set tag sequences exhibit highly repetitive patterns both within a set and across different sets. Because a single tag sequence can capture multiple address sequences spread over different cache sets, significant space savings can be achieved. We propose a tag-based prefetcher called a tag correlating prefetcher (TCP). Even with very small history tables, TCP outperforms address-based correlating prefetchers many times larger. In addition, we show that such a prefetcher can yield most of its performance benefits if placed at the L2 level of an aggressive out-of-order processor. Only if one wants prefetching all the way up to L1, is dead-block prediction required. Finally, we draw parallels between the two-level structure of TCP and similar structures for branch prediction mechanisms; these parallels raise interesting opportunities for improving correlating memory prefetchers by harnessing lessons already learned for correlating branch predictors.
UR - http://www.scopus.com/inward/record.url?scp=79951763686&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79951763686&partnerID=8YFLogxK
U2 - 10.1109/HPCA.2003.1183549
DO - 10.1109/HPCA.2003.1183549
M3 - Conference contribution
AN - SCOPUS:79951763686
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 317
EP - 326
BT - Proceedings - 9th International Symposium on High-Performance Computer Architecture, HPCA 2003
PB - IEEE Computer Society
Y2 - 8 February 2003 through 12 February 2003
ER -