TY - GEN
T1 - Architectural Support for Optimizing Huge Page Selection Within the OS
AU - Manocha, Aninda
AU - Yan, Zi
AU - Tureci, Esin
AU - Aragón, Juan L.
AU - Nellans, David
AU - Martonosi, Margaret
N1 - Publisher Copyright:
© 2023 Owner/Author.
PY - 2023/10/28
Y1 - 2023/10/28
N2 - Irregular, memory-intensive applications often incur high translation lookaside buffer (TLB) miss rates that result in significant address translation overheads. Employing huge pages is an effective way to reduce these overheads; however, in real systems the number of available huge pages can be limited when system memory is nearly full and/or fragmented. Thus, huge pages must be used selectively to back application memory. This work demonstrates that promoting to huge pages the memory regions that incur the most TLB misses best reduces address translation overheads. We call these regions High reUse TLB-sensitive data (HUBs). Unlike prior work, which relies on expensive per-page software counters to identify promotion regions, we propose new architectural support to identify these regions dynamically at application runtime. We propose a promotion candidate cache (PCC) that identifies HUB candidates based on hardware page table walks after a last-level TLB miss. This small, fixed-size structure tracks huge page-aligned regions (each consisting of N base pages), ranks them by observed page table walk frequency, and keeps only the most frequently accessed ones. Evaluated on applications of varying memory intensity, our approach successfully identifies the application pages incurring the highest address translation overheads. With the help of a PCC, the OS needs to promote only a small fraction of the application footprint to achieve most of the peak achievable performance, yielding 1.19-1.33× speedups over 4KB base pages alone. In real systems, where memory is typically fragmented, the PCC outperforms Linux's page promotion policy both when 50% of total memory is fragmented and when 90% of total memory is fragmented.
AB - Irregular, memory-intensive applications often incur high translation lookaside buffer (TLB) miss rates that result in significant address translation overheads. Employing huge pages is an effective way to reduce these overheads; however, in real systems the number of available huge pages can be limited when system memory is nearly full and/or fragmented. Thus, huge pages must be used selectively to back application memory. This work demonstrates that promoting to huge pages the memory regions that incur the most TLB misses best reduces address translation overheads. We call these regions High reUse TLB-sensitive data (HUBs). Unlike prior work, which relies on expensive per-page software counters to identify promotion regions, we propose new architectural support to identify these regions dynamically at application runtime. We propose a promotion candidate cache (PCC) that identifies HUB candidates based on hardware page table walks after a last-level TLB miss. This small, fixed-size structure tracks huge page-aligned regions (each consisting of N base pages), ranks them by observed page table walk frequency, and keeps only the most frequently accessed ones. Evaluated on applications of varying memory intensity, our approach successfully identifies the application pages incurring the highest address translation overheads. With the help of a PCC, the OS needs to promote only a small fraction of the application footprint to achieve most of the peak achievable performance, yielding 1.19-1.33× speedups over 4KB base pages alone. In real systems, where memory is typically fragmented, the PCC outperforms Linux's page promotion policy both when 50% of total memory is fragmented and when 90% of total memory is fragmented.
KW - cache architectures
KW - graph processing
KW - hardware-software co-design
KW - memory management
KW - operating systems
KW - virtual memory
UR - http://www.scopus.com/inward/record.url?scp=85183459928&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85183459928&partnerID=8YFLogxK
U2 - 10.1145/3613424.3614296
DO - 10.1145/3613424.3614296
M3 - Conference contribution
AN - SCOPUS:85183459928
T3 - Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023
SP - 1213
EP - 1226
BT - Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023
PB - Association for Computing Machinery, Inc
T2 - 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023
Y2 - 28 October 2023 through 1 November 2023
ER -