Despite their ubiquity in many important big-data applications, graph analytic kernels continue to challenge modern memory hierarchies due to their frequent, long-latency, pointer indirect accesses to vertex property data. Such accesses exhibit poor locality and variable reuse that trouble cache replacement policies, and consequently increase memory bandwidth pressure. Specialized graph-tailored prefetching mechanisms, processor designs, and memory hierarchy engines have been developed to tolerate the long latencies of such accesses. However, these approaches are either too bandwidth-intensive, require invasive hardware changes that inhibit general-purpose computation flexibility, or rely on software preprocessing that limits true speedup. This work introduces Graphfire, a flexible memory hierarchy approach that learns different access patterns in graph processing and exploits the synergy of specialized fetch, insertion, and replacement optimizations for problematic indirect accesses without relying on software or ISA support. More specifically, Graphfire identifies when these irregular accesses occur and employs tailored access granularities, data-aware insertion, and frequency-based replacement accordingly. It achieves up to a 1.79× speedup (geomean 1.3×) and these improvements scale due to bandwidth efficiency; with 64 cores, Graphfire yields up to a 71.33× speedup (geomean 63.32×) over a single baseline core and allows memory-bound graph analytic codes to scale far beyond prior work.
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Hardware and Architecture
- Computational Theory and Mathematics
- graph analytics
- memory hierarchy