Characterizing and improving the use of demand-fetched caches in GPUs

Wenhao Jia, Kelly A. Shaw, Margaret Rose Martonosi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

95 Scopus citations

Abstract

Initially introduced as special-purpose accelerators for games and graphics code, graphics processing units (GPUs) have emerged as widely-used high-performance parallel computing platforms. GPUs traditionally provided only software-managed local memories (or scratchpads) instead of demand-fetched caches. Increasingly, however, GPUs are being used in broader application domains where memory access patterns are both harder to analyze and harder to manage in software-controlled caches. In response, GPU vendors have included sizable demand-fetched caches in recent chip designs. Nonetheless, several problems remain. First, since these hardware caches are quite new and highly-configurable, it can be difficult to know when and how to use them; they sometimes degrade performance instead of improving it. Second, since GPU programming is quite distinct from general-purpose programming, application programmers do not yet have solid intuition about which memory reference patterns are amenable to demand-fetched caches. In response, this paper characterizes application performance on GPUs with caches and provides a taxonomy for reasoning about different types of access patterns and locality. Based on this taxonomy, we present an algorithm which can be automated and applied at compile-time to identify an application's memory access patterns and to use that information to intelligently configure cache usage to improve application performance. Experiments on real GPU systems show that our algorithm reliably predicts when GPU caches will help or hurt performance. Compared to always passively turning caches on, our method can increase the average benefit of caches from 5.8% to 18.0% for applications that have significant performance sensitivity to caching.

Original language: English (US)
Title of host publication: ICS'12 - Proceedings of the 2012 ACM International Conference on Supercomputing
Pages: 15-24
Number of pages: 10
DOI: https://doi.org/10.1145/2304576.2304582
State: Published - Jul 25 2012
Event: 26th ACM International Conference on Supercomputing, ICS'12 - San Servolo Island, Venice, Italy
Duration: Jun 25 2012 → Jun 29 2012

Publication series

Name: Proceedings of the International Conference on Supercomputing

Other

Other: 26th ACM International Conference on Supercomputing, ICS'12
Country: Italy
City: San Servolo Island, Venice
Period: 6/25/12 → 6/29/12

All Science Journal Classification (ASJC) codes

  • Computer Science (all)

  • Cite this

    Jia, W., Shaw, K. A., & Martonosi, M. R. (2012). Characterizing and improving the use of demand-fetched caches in GPUs. In ICS'12 - Proceedings of the 2012 ACM International Conference on Supercomputing (pp. 15-24). (Proceedings of the International Conference on Supercomputing). https://doi.org/10.1145/2304576.2304582