TY - JOUR
T1 - Custom-instruction synthesis for extensible-processor platforms
AU - Sun, Fei
AU - Ravi, Srivaths
AU - Raghunathan, Anand
AU - Jha, Niraj K.
N1 - Funding Information:
Manuscript received December 17, 2002. This work was supported in part by DARPA under Contract DAAB07-00-C-L516 and in part by the NJCST Center on Embedded System-on-a-Chip Design. This paper was recommended by Associate Editor R. Camposano.
PY - 2004/2
Y1 - 2004/2
N2 - Efficiency and flexibility are critical, but often conflicting, design goals in embedded system design. The recent emergence of extensible processors promises a favorable tradeoff between efficiency and flexibility, while keeping design turn-around times short. Current extensible processor design flows automate several tedious tasks, but typically require designers to manually select the parts of the program that are to be implemented as custom instructions. In this work, we describe an automatic methodology to select custom instructions to augment an extensible processor, in order to maximize its efficiency for a given application program. We demonstrate that the number of custom instruction candidates grows rapidly with program size, leading to a large design space, and that the quality (speedup) of custom instructions varies significantly across this space, motivating the need for the proposed flow. Our methodology features cost functions to guide the custom instruction selection process, as well as static and dynamic pruning techniques to eliminate inferior parts of the design space from consideration. Furthermore, we employ a two-stage process, wherein a limited number of promising instruction candidates are first short-listed using efficient selection criteria, and then evaluated in more detail through cycle-accurate instruction set simulation and synthesis of the corresponding hardware, to identify the custom instruction combinations that result in the highest program speedup or maximize speedup under a given area constraint. We have evaluated the proposed techniques using a state-of-the-art extensible processor platform, in the context of a commercial design flow. Experiments with several benchmark programs indicate that custom processors synthesized using automatic custom instruction selection can result in large improvements in performance (up to 5.4×, an average of 3.4×), energy (up to 4.5×, an average of 3.2×), and energy-delay products (up to 24.2×, an average of 12.6×), while speeding up the design process significantly.
AB - Efficiency and flexibility are critical, but often conflicting, design goals in embedded system design. The recent emergence of extensible processors promises a favorable tradeoff between efficiency and flexibility, while keeping design turn-around times short. Current extensible processor design flows automate several tedious tasks, but typically require designers to manually select the parts of the program that are to be implemented as custom instructions. In this work, we describe an automatic methodology to select custom instructions to augment an extensible processor, in order to maximize its efficiency for a given application program. We demonstrate that the number of custom instruction candidates grows rapidly with program size, leading to a large design space, and that the quality (speedup) of custom instructions varies significantly across this space, motivating the need for the proposed flow. Our methodology features cost functions to guide the custom instruction selection process, as well as static and dynamic pruning techniques to eliminate inferior parts of the design space from consideration. Furthermore, we employ a two-stage process, wherein a limited number of promising instruction candidates are first short-listed using efficient selection criteria, and then evaluated in more detail through cycle-accurate instruction set simulation and synthesis of the corresponding hardware, to identify the custom instruction combinations that result in the highest program speedup or maximize speedup under a given area constraint. We have evaluated the proposed techniques using a state-of-the-art extensible processor platform, in the context of a commercial design flow. Experiments with several benchmark programs indicate that custom processors synthesized using automatic custom instruction selection can result in large improvements in performance (up to 5.4×, an average of 3.4×), energy (up to 4.5×, an average of 3.2×), and energy-delay products (up to 24.2×, an average of 12.6×), while speeding up the design process significantly.
KW - Application-specific instruction set processors (ASIP)
KW - Application-specific programmable processors
KW - Customization
KW - Extensible processors
KW - Instruction selection
UR - http://www.scopus.com/inward/record.url?scp=1242286078&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=1242286078&partnerID=8YFLogxK
U2 - 10.1109/TCAD.2003.822133
DO - 10.1109/TCAD.2003.822133
M3 - Article
AN - SCOPUS:1242286078
SN - 0278-0070
VL - 23
SP - 216
EP - 228
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 2
ER -