TY - GEN
T1 - Memory performance optimizations for real-time software HDTV decoding
AU - Chen, Han
AU - Li, Kai
AU - Wei, Bin
N1 - Publisher Copyright: © 2002 IEEE.
PY - 2002
Y1 - 2002
AB - This paper shows that the performance bottleneck in software MPEG-2 video decoders has shifted to memory operations, as microprocessor technologies have been improving at a fast rate during the past few years. We exploit concurrencies between the processor and the memory sub-system at macroblock level to alleviate the performance bottleneck. First, the paper introduces an interleaved-block order data layout to improve cache performance. Second, the paper describes an algorithm to explicitly prefetch macroblocks for motion compensation. Finally, the paper presents an algorithm to schedule interleaved decoding and output at macroblock level. Our implementation and experiments show that these methods successfully hide the latency of memory and frame buffer. These techniques improve the performance of an already optimized software MPEG-2 decoder by about a factor of two. On a 933 MHz Pentium III PC, the decoder can play 720p HDTV streams at over 62 frames per second.
UR - http://www.scopus.com/inward/record.url?scp=84908334142&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84908334142&partnerID=8YFLogxK
U2 - 10.1109/ICME.2002.1035779
DO - 10.1109/ICME.2002.1035779
M3 - Conference contribution
AN - SCOPUS:84908334142
T3 - Proceedings - 2002 IEEE International Conference on Multimedia and Expo, ICME 2002
SP - 305
EP - 308
BT - Proceedings - 2002 IEEE International Conference on Multimedia and Expo, ICME 2002
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2002 IEEE International Conference on Multimedia and Expo, ICME 2002
Y2 - 26 August 2002 through 29 August 2002
ER -