Abstract
Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem successfully in specific situations. However, the generality of these software approaches has been limited because current architectures do not provide a fine-grained, low-overhead mechanism for observing and reacting to memory behavior directly. To fill this need, this article proposes a new class of memory operations called informing memory operations, which essentially consist of a memory operation combined (either implicitly or explicitly) with a conditional branch-and-link operation that is taken only if the reference suffers a cache miss. This article describes two different implementations of informing memory operations. One is based on a cache-outcome condition code, and the other is based on low-overhead traps. We find that modern in-order-issue and out-of-order-issue superscalar processors already contain the bulk of the necessary hardware support. We describe how a number of software-based memory optimizations can exploit informing memory operations to enhance performance, and we look at cache coherence with fine-grained access control as a case study. Our performance results demonstrate that the runtime overhead of invoking the informing mechanism on the Alpha 21164 and MIPS R10000 processors is generally small enough to provide considerable flexibility to hardware and software designers, and that the cache coherence application has improved performance compared to other current solutions. We believe that the inclusion of informing memory operations in future processors may spur even more innovative performance optimizations. 
Categories and Subject Descriptors: B.3.2 [Memory Structures]: Design Styles - cache memories; B.3.3 [Memory Structures]: Performance Analysis and Design Aids - simulation; C.4 [Computer Systems Organization]: Performance of Systems - measurement techniques; D.3.4 [Programming Languages]: Processors - compilers; optimization.
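The core idea in the abstract, a memory operation paired with a branch-and-link that is taken only on a cache miss, can be illustrated with a small software model. The sketch below is not the article's implementation and does not model the Alpha 21164 or MIPS R10000. The names `ToyCache`, `informing_load`, `LINE_SIZE`, and `NUM_LINES`, and the direct-mapped organization, are assumptions made purely for illustration.

```python
from typing import Callable, Dict, List, Optional

LINE_SIZE = 32   # bytes per cache line (assumed for illustration)
NUM_LINES = 4    # lines in the toy direct-mapped cache (assumed)


class ToyCache:
    """Direct-mapped cache model that tracks only tags, not data."""

    def __init__(self) -> None:
        self.tags: List[Optional[int]] = [None] * NUM_LINES

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, fill the line and return False."""
        block = addr // LINE_SIZE
        index = block % NUM_LINES
        tag = block // NUM_LINES
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag
        return False


def informing_load(
    cache: ToyCache,
    memory: Dict[int, int],
    addr: int,
    miss_handler: Callable[[int], None],
) -> int:
    """Load a value and invoke miss_handler only if the reference misses.

    The handler call stands in for the conditional branch-and-link that
    the abstract describes; the load result is returned either way.
    """
    if not cache.access(addr):
        miss_handler(addr)
    return memory.get(addr, 0)


# Usage: record which references miss, as a profiling tool might.
cache = ToyCache()
memory = {0: 10, 4: 20, 128: 30}
missed: List[int] = []
values = [informing_load(cache, memory, a, missed.append) for a in (0, 4, 128, 0)]
```

In this toy run, addresses 0 and 128 map to the same cache line, so the final reference to address 0 misses again after being evicted. A handler in a real system could do something more useful than logging, such as the prefetching or access-control checks that the abstract alludes to.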
Original language | English (US) |
---|---|
Pages (from-to) | 170-205 |
Number of pages | 36 |
Journal | ACM Transactions on Computer Systems |
Volume | 16 |
Issue number | 2 |
State | Published - May 1998 |
All Science Journal Classification (ASJC) codes
- General Computer Science
Keywords
- Cache miss notification
- Design
- Experimentation
- Memory latency
- Performance
- Processor architecture