TY - JOUR
T1 - Application-specific heterogeneous multiprocessor synthesis using extensible processors
AU - Sun, Fei
AU - Ravi, Srivaths
AU - Raghunathan, Anand
AU - Jha, Niraj K.
N1 - Funding Information:
Manuscript received October 4, 2004; revised May 7, 2005. This work was supported by the National Science Foundation (NSF) under Grant CCR-0310477. This paper was recommended by Associate Editor R. Camposano. F. Sun was with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA. He is now with Tensilica Inc., Santa Clara, CA 95054 USA (e-mail: [email protected]). S. Ravi and A. Raghunathan are with the Nippon Electric Company (NEC) Laboratories America, Princeton, NJ 08540 USA (e-mail: [email protected]; [email protected]). N. K. Jha is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TCAD.2005.858269
PY - 2006/9
Y1 - 2006/9
N2 - The complexity of many embedded software applications and their stringent performance requirements have reached a point where they can no longer be supported by conventional embedded-system architectures based on a single general-purpose processor. Nanometer fabrication technologies have made it feasible to integrate several embedded processors on a single chip to create multiprocessor systems-on-chip (MPSoCs) for a wide range of applications. It is already well known that, for embedded systems, customizing the processor to the application can significantly increase performance and energy efficiency. Single-chip heterogeneous multiprocessors, where the different processors are customized to the tasks they perform, are thus attracting attention. Despite their significant potential, a primary bottleneck to their adoption remains the development of programming paradigms and tools to alleviate the design complexity. Some advances have been made in the areas of custom multiprocessor architecture, tool support, and design methodology in order to reduce the design turnaround time. However, because of the heterogeneity of the custom processors, the designers are still required to manually map the application tasks to the different processors and, at the same time, customize each processor so that the overall performance requirements are satisfied. A multilevel custom multiprocessor-synthesis methodology to perform the assignment and scheduling of the application tasks on the various processors together with processor customization in an integrated manner is proposed. This paper focuses on extensible processors, which provide a good tradeoff between efficiency, flexibility, and design turnaround time, by combining a configurable base processor with hardware that implements custom instructions to extend its instruction set. This paper motivates the need for such an integrated approach by demonstrating that custom-instruction selection has complex interdependencies with task assignment and scheduling, and performing these steps independently often results in significant degradation in the quality of the synthesized multiprocessor architecture. The methodology presented here uses an iterative improvement algorithm to first assign and schedule tasks on processors and then select custom instructions along the critical path. It utilizes the concept of "expected execution time" to connect these two steps. It not only considers the currently selected custom instructions for the current task assignment and schedule, but also the possibility of better custom instructions being selected in future iterations. The methodology presented here is also enhanced to integrate task-level software pipelining to further increase the parallelism in the task graph and provide opportunities for multiprocessing. In this paper, the proposed heterogeneous multiprocessor-synthesis methodology is implemented in the context of a commercial extensible-processor design flow, using the Xtensa platform from Tensilica Inc. The presented tool was evaluated by automatically generating custom multiprocessor architectures for several complex embedded software benchmarks. The results show that architectures synthesized by the proposed methodology demonstrate an average speedup of 2.0 × (up to 2.9×) compared to symmetric multiprocessor architectures in which the processors have not been augmented with custom instructions. To the best of the authors' knowledge, this is the first tool for the synthesis of custom MFSoCs using extensible processors.
AB - The complexity of many embedded software applications and their stringent performance requirements have reached a point where they can no longer be supported by conventional embedded-system architectures based on a single general-purpose processor. Nanometer fabrication technologies have made it feasible to integrate several embedded processors on a single chip to create multiprocessor systems-on-chip (MPSoCs) for a wide range of applications. It is already well known that, for embedded systems, customizing the processor to the application can significantly increase performance and energy efficiency. Single-chip heterogeneous multiprocessors, where the different processors are customized to the tasks they perform, are thus attracting attention. Despite their significant potential, a primary bottleneck to their adoption remains the development of programming paradigms and tools to alleviate the design complexity. Some advances have been made in the areas of custom multiprocessor architecture, tool support, and design methodology in order to reduce the design turnaround time. However, because of the heterogeneity of the custom processors, the designers are still required to manually map the application tasks to the different processors and, at the same time, customize each processor so that the overall performance requirements are satisfied. A multilevel custom multiprocessor-synthesis methodology to perform the assignment and scheduling of the application tasks on the various processors together with processor customization in an integrated manner is proposed. This paper focuses on extensible processors, which provide a good tradeoff between efficiency, flexibility, and design turnaround time, by combining a configurable base processor with hardware that implements custom instructions to extend its instruction set. This paper motivates the need for such an integrated approach by demonstrating that custom-instruction selection has complex interdependencies with task assignment and scheduling, and performing these steps independently often results in significant degradation in the quality of the synthesized multiprocessor architecture. The methodology presented here uses an iterative improvement algorithm to first assign and schedule tasks on processors and then select custom instructions along the critical path. It utilizes the concept of "expected execution time" to connect these two steps. It not only considers the currently selected custom instructions for the current task assignment and schedule, but also the possibility of better custom instructions being selected in future iterations. The methodology presented here is also enhanced to integrate task-level software pipelining to further increase the parallelism in the task graph and provide opportunities for multiprocessing. In this paper, the proposed heterogeneous multiprocessor-synthesis methodology is implemented in the context of a commercial extensible-processor design flow, using the Xtensa platform from Tensilica Inc. The presented tool was evaluated by automatically generating custom multiprocessor architectures for several complex embedded software benchmarks. The results show that architectures synthesized by the proposed methodology demonstrate an average speedup of 2.0 × (up to 2.9×) compared to symmetric multiprocessor architectures in which the processors have not been augmented with custom instructions. To the best of the authors' knowledge, this is the first tool for the synthesis of custom MFSoCs using extensible processors.
KW - Application-specific instruction-set processor (ASIP)
KW - Customization
KW - Extensible processors
KW - Heterogeneous multiprocessors
KW - Multiprocessor system-on-chip (MPSoC)
UR - http://www.scopus.com/inward/record.url?scp=33748100384&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33748100384&partnerID=8YFLogxK
U2 - 10.1109/TCAD.2005.858269
DO - 10.1109/TCAD.2005.858269
M3 - Article
AN - SCOPUS:33748100384
SN - 0278-0070
VL - 25
SP - 1589
EP - 1602
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 9
M1 - 1673736
ER -