As the gap between processor cycle time and main memory access time widens, memory system performance becomes increasingly important. The trend toward higher instruction-level parallelism in superscalar processors places even greater demands on the memory system. Prefetching is a common strategy for tolerating this increased memory latency. This paper presents a software-only technique that prefetches data into the CPU cache before it is needed. The technique is motivated by emulation of a hardware stride prediction table (SPT), and it achieves performance similar, and in some cases superior, to the hardware-based technique with no additional hardware cost. In the first step, the hardware SPT is simulated to identify where prefetches would be most useful. In the second step, software prefetches are added to the executable code. The technique is automated and could be implemented by a compiler as a two-phase optimization: a profiling step followed by an optimization step. Data is presented for both SPEC95 and multimedia benchmarks. In the best case, a performance improvement of 2.78X is observed over the same code with no prefetching.
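The two steps described in the abstract can be illustrated with a minimal C sketch. It is not the authors' implementation: the table size, the indexing scheme, the hit criterion, and the names `spt`, `spt_access`, and `sum_with_prefetch` are all assumptions made for illustration. The first part emulates a simple stride prediction table that records, per load instruction, the last data address and the last stride, and counts how often the next address is predicted correctly. The second part shows what an inserted software prefetch might look like for a load found to be strided, using the GCC/Clang builtin `__builtin_prefetch` with an assumed prefetch distance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPT_ENTRIES 256u  /* hypothetical table size, must be a power of two */

typedef struct {
    uintptr_t tag;        /* address (PC) of the load instruction */
    uintptr_t last_addr;  /* data address of its previous access */
    intptr_t  stride;     /* stride observed between the last two accesses */
    int       valid;
} spt_entry;

typedef struct {
    spt_entry e[SPT_ENTRIES];
    unsigned long hits;   /* accesses correctly predicted by last_addr + stride */
    unsigned long total;  /* all accesses seen */
} spt;

void spt_init(spt *t) {
    assert((SPT_ENTRIES & (SPT_ENTRIES - 1u)) == 0u);
    memset(t, 0, sizeof *t);
}

/* Profile step (assumed form): record one load and report whether the
   table would have predicted its address with a nonzero stride. */
int spt_access(spt *t, uintptr_t pc, uintptr_t addr) {
    spt_entry *en = &t->e[(pc >> 2) & (SPT_ENTRIES - 1u)];
    int predicted = 0;
    t->total++;
    if (en->valid && en->tag == pc) {
        predicted = en->stride != 0 &&
                    en->last_addr + (uintptr_t)en->stride == addr;
        en->stride = (intptr_t)(addr - en->last_addr);
    } else {
        /* Miss in the table: allocate the entry for this load. */
        en->tag = pc;
        en->stride = 0;
        en->valid = 1;
    }
    en->last_addr = addr;
    if (predicted) t->hits++;
    return predicted;
}

/* Optimization step (assumed form): a strided loop with a software
   prefetch issued `dist` elements ahead of the current access. */
long sum_with_prefetch(const long *a, size_t n, size_t dist) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__) || defined(__clang__)
        if (i + dist < n)
            __builtin_prefetch(&a[i + dist], 0, 1); /* read, low temporal locality */
#endif
        s += a[i];
    }
    return s;
}
```

In a profile-driven flow of this kind, loads whose hit ratio (`hits / total`) exceeds some threshold would be the candidates for an inserted prefetch; the threshold and the prefetch distance are tuning parameters that this sketch leaves open.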
Proceedings of the Hawaii International Conference on System Sciences
Published - 1998
Proceedings of the 1998 31st Annual Hawaii International Conference on System Sciences. Part 1 (of 7) - Big Island, HI, USA
Duration: Jan 6 1998 → Jan 9 1998
All Science Journal Classification (ASJC) codes
- General Computer Science