Characterizing and improving the use of demand-fetched caches in GPUs

Wenhao Jia, Kelly A. Shaw, Margaret Martonosi

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

108 Scopus citations


Initially introduced as special-purpose accelerators for games and graphics code, graphics processing units (GPUs) have emerged as widely used high-performance parallel computing platforms. GPUs traditionally provided only software-managed local memories (or scratchpads) instead of demand-fetched caches. Increasingly, however, GPUs are being used in broader application domains where memory access patterns are both harder to analyze and harder to manage in software-controlled caches. In response, GPU vendors have included sizable demand-fetched caches in recent chip designs. Nonetheless, several problems remain. First, since these hardware caches are quite new and highly configurable, it can be difficult to know when and how to use them; they sometimes degrade performance instead of improving it. Second, since GPU programming is quite distinct from general-purpose programming, application programmers do not yet have solid intuition about which memory reference patterns are amenable to demand-fetched caches. To address these problems, this paper characterizes application performance on GPUs with caches and provides a taxonomy for reasoning about different types of access patterns and locality. Based on this taxonomy, we present an algorithm that can be automated and applied at compile time to identify an application's memory access patterns and to use that information to configure cache usage intelligently, improving application performance. Experiments on real GPU systems show that our algorithm reliably predicts when GPU caches will help or hurt performance. Compared with always passively turning caches on, our method can increase the average benefit of caches from 5.8% to 18.0% for applications that have significant performance sensitivity to caching.
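The abstract does not give the algorithm's details. As a rough illustration of the kind of reasoning involved, the Python sketch below estimates, for a single warp-wide load whose index is affine in the thread index, how many bytes are fetched with and without L1 caching, and then applies a simple bypass rule. Everything here is an assumption: the 128-byte line and 32-byte segment sizes are loosely modeled on Fermi-class GPUs, and the names `bytes_fetched`, `classify`, and `prefer_l1`, the category labels, and the decision rule are hypothetical, not the paper's taxonomy or algorithm.

```python
"""Illustrative sketch (hypothetical, not the paper's algorithm).

Estimates how many bytes one warp moves for a single global load whose
element index is stride * tid + offset, with and without L1 caching.
Assumed parameters: 32-thread warps, 4-byte elements, 128-byte L1 lines
for caching loads, 32-byte segments for loads that bypass L1.
"""
import math

WARP_SIZE = 32
ELEM_BYTES = 4
L1_LINE_BYTES = 128        # assumed fetch granularity when L1 caching is on
BYPASS_SEGMENT_BYTES = 32  # assumed fetch granularity when L1 is bypassed


def bytes_fetched(stride, granularity, offset=0):
    """Bytes moved for one warp if memory is fetched in `granularity`-byte blocks."""
    blocks = {((stride * tid + offset) * ELEM_BYTES) // granularity
              for tid in range(WARP_SIZE)}
    return len(blocks) * granularity


def classify(stride):
    """Coarse label for one warp-wide load (hypothetical taxonomy)."""
    if stride == 0:
        return "uniform"  # every thread reads the same element
    lines = bytes_fetched(stride, L1_LINE_BYTES) // L1_LINE_BYTES
    fewest = math.ceil(WARP_SIZE * ELEM_BYTES / L1_LINE_BYTES)
    if lines == fewest:
        return "coalesced"
    if lines < WARP_SIZE:
        return "partially coalesced"
    return "scattered"  # one L1 line per thread


def prefer_l1(stride, expects_reuse):
    """Toy rule: keep L1 on if reuse is expected or caching moves no extra bytes."""
    cached = bytes_fetched(stride, L1_LINE_BYTES)
    bypassed = bytes_fetched(stride, BYPASS_SEGMENT_BYTES)
    return expects_reuse or cached <= bypassed


if __name__ == "__main__":
    for s in (0, 1, 2, 32):
        print(s, classify(s),
              bytes_fetched(s, L1_LINE_BYTES),
              bytes_fetched(s, BYPASS_SEGMENT_BYTES),
              prefer_l1(s, expects_reuse=False))
```

Under these assumptions, a unit-stride load moves 128 bytes either way, so caching costs nothing extra, whereas a stride-32 load moves 4,096 bytes through 128-byte lines but only 1,024 bytes as 32-byte segments; with no expected reuse, the toy rule would bypass L1 for it. A real compile-time pass would also have to derive strides from the kernel's index expressions and estimate reuse across warps and instructions, which this sketch does not attempt.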

Original language: English (US)
Title of host publication: ICS'12 - Proceedings of the 2012 ACM International Conference on Supercomputing
Number of pages: 10
State: Published - 2012
Event: 26th ACM International Conference on Supercomputing, ICS'12 - San Servolo Island, Venice, Italy
Duration: Jun 25 2012 to Jun 29 2012

Publication series

Name: Proceedings of the International Conference on Supercomputing


Other: 26th ACM International Conference on Supercomputing, ICS'12
City: San Servolo Island, Venice

All Science Journal Classification (ASJC) codes

  • General Computer Science

Keywords
  • CUDA
  • Compiler optimization
  • GPU cache

