With the popularity of multimedia acceleration instructions such as MMX, MPEG decompression is increasingly executed on general-purpose processors instead of dedicated MPEG hardware. The gap between processor speed and memory access time means that a significant amount of execution time is spent in the memory system. As processors get faster, both through higher clock speeds and through increased instruction-level parallelism, the time spent in the memory system becomes even more significant. Data prefetching is a well-known technique for improving cache performance. While several studies have examined prefetch strategies for scientific and commercial applications, this paper focuses on video applications. Data is presented for three hardware prefetching schemes (the stream buffer, the stride prediction table (SPT), and the stream cache) as well as a new software-directed prefetching technique based on emulation of the hardware SPT. Prefetching eliminates up to 90% of the misses that would otherwise occur. The stream cache can cut execution time by more than half with a relatively small amount of additional hardware. Software prefetching achieves nearly equal performance with minimal additional hardware. Techniques presented in this paper can be used to improve performance in a general-purpose CPU or an embedded MPEG processor. Performance gains achieved for MPEG benchmarks apply equally well to similar multimedia applications.
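As a rough illustration of the stride prediction idea named in the abstract, the following Python sketch models a generic SPT and counts cache misses with and without it. It is not the design evaluated in the paper: the class and function names (`StridePredictionTable`, `count_misses`), the 32-byte block size, the unbounded cache, the rule that a stride must be seen twice before a prefetch is issued, and the assumption that every prefetch arrives in time are all illustrative simplifications.

```python
from dataclasses import dataclass

BLOCK_SIZE = 32  # bytes per cache block (assumed value, for illustration only)


@dataclass
class SptEntry:
    last_addr: int   # data address of the previous access by this load
    stride: int = 0  # difference between the last two addresses


class StridePredictionTable:
    """Generic SPT sketch: one entry per load instruction, keyed by its PC."""

    def __init__(self):
        self.entries = {}

    def access(self, pc, addr):
        """Record a load and return an address to prefetch, or None."""
        entry = self.entries.get(pc)
        if entry is None:
            # First time this load is seen: nothing to predict yet.
            self.entries[pc] = SptEntry(last_addr=addr)
            return None
        new_stride = addr - entry.last_addr
        # Assumption: prefetch only when the same non-zero stride repeats.
        confirmed = new_stride != 0 and new_stride == entry.stride
        entry.stride = new_stride
        entry.last_addr = addr
        return addr + new_stride if confirmed else None


def count_misses(trace, use_prefetch):
    """Count misses for a list of (pc, addr) loads.

    The cache is modeled as an unbounded set of resident block numbers,
    and prefetched blocks are assumed to arrive before they are needed.
    """
    cache = set()
    spt = StridePredictionTable()
    misses = 0
    for pc, addr in trace:
        block = addr // BLOCK_SIZE
        if block not in cache:
            misses += 1
            cache.add(block)
        if use_prefetch:
            target = spt.access(pc, addr)
            if target is not None:
                cache.add(target // BLOCK_SIZE)
    return misses


# Hypothetical trace: one load walking memory with a constant 64-byte stride,
# so every access touches a new cache block.
strided_trace = [(0x400, i * 64) for i in range(16)]

if __name__ == "__main__":
    print(count_misses(strided_trace, use_prefetch=False))
    print(count_misses(strided_trace, use_prefetch=True))
```

On this synthetic trace the model without prefetching misses on all 16 accesses, while the SPT version misses only on the first three, after which the stride is confirmed and each block is fetched one access ahead. These figures describe the toy model only and say nothing about the paper's measured results.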
|Field|Value|
|---|---|
|Original language|English (US)|
|Number of pages|15|
|Journal|IEEE Transactions on Circuits and Systems for Video Technology|
|State|Published - Aug 2000|
All Science Journal Classification (ASJC) codes
- Media Technology
- Electrical and Electronic Engineering