TY - GEN
T1 - Augmenting modern superscalar architectures with configurable extended instructions
AU - Zhou, Xianfeng
AU - Martonosi, Margaret Rose
PY - 2000
Y1 - 2000
N2 - The instruction sets of general-purpose microprocessors are designed to offer good performance across a wide range of programs. The size and complexity of the instruction sets, however, are limited by a need for generality and for streamlined implementation. The particular needs of one application are balanced against the needs of the full range of applications considered. For this reason, one can "design" a better instruction set when considering only a single application than when considering a general collection of applications. Configurable hardware gives us the opportunity to explore this option. This paper examines the potential for automatically identifying application-specific extended instructions and implementing them in programmable functional units based on configurable hardware. Adding fine-grained reconfigurable hardware to the datapath of an out-of-order issue superscalar processor allows 4-44% speedups on the MediaBench benchmarks [1]. As a key contribution of our work, we present a selective algorithm for choosing extended instructions to minimize reconfiguration costs within loops. Our selective algorithm constrains instruction choices so that significant speedups are achieved with as few as 4 moderately sized programmable functional units, typically containing less than 150 look-up tables each.
UR - http://www.scopus.com/inward/record.url?scp=84876385227&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84876385227&partnerID=8YFLogxK
U2 - 10.1007/3-540-45591-4_129
DO - 10.1007/3-540-45591-4_129
M3 - Conference contribution
AN - SCOPUS:84876385227
SN - 354067442X
SN - 9783540674429
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 941
EP - 950
BT - Parallel and Distributed Processing - 15 IPDPS 2000 Workshops, Proceedings
A2 - Rolim, Jose
PB - Springer Verlag
T2 - 15 Workshops Held in Conjunction with the IEEE International Parallel and Distributed Processing Symposium, IPDPS 2000
Y2 - 1 May 2000 through 5 May 2000
ER -