TY - JOUR
T1 - A scalable synthesis methodology for application-specific processors
AU - Sun, Fei
AU - Ravi, Srivaths
AU - Raghunathan, Anand
AU - Jha, Niraj K.
N1 - Funding Information: Manuscript received December 12, 2003; revised February 23, 2006. This work was supported in part by the New Jersey Commission on Science and Technology Center for Embedded System-on-a-Chip Design and by the National Science Foundation under Grant CCR-0310477. F. Sun is with Tensilica Inc., Santa Clara, CA 95054 USA (e-mail: fsun@tensilica.com). S. Ravi and A. Raghunathan are with NEC Laboratories America Inc., Princeton, NJ 08540 USA (e-mail: [email protected]; [email protected]). N. K. Jha is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TVLSI.2006.886410
PY - 2006/11
Y1 - 2006/11
N2 - Custom processors based on application-specific or domain-specific instruction sets are gaining popularity, and are often used to implement critical architectural blocks in complex systems-on-chip. While several advances have been made in the area of custom processor architectures, tools, and design methodologies, designers are still required to manually perform some critical tasks, such as selection of the custom instructions best suited to the given application and design constraints. We present a scalable methodology for the synthesis of a custom processor from an embedded software program. A key feature of the proposed methodology is its scalability, which is achieved by exploiting the structured, hierarchical nature of large software programs. We motivate the need for such a methodology, and describe the algorithms used for the critical steps, including hardware resource budgeting, local optimizations, and global exploration. Our methodology utilizes the concept of "soft" instruction templates, which can be adapted by adding operations to them or deleting operations from them at any time during the design space exploration process, allowing for global design decisions to be interleaved with fine-grained optimizations. To the best of our knowledge, this is the first work that uses the program hierarchy to derive soft instruction templates to synthesize application-specific processors for scalable applications. We have integrated our methodology in an open-source compiler, and verified it using a commercial extensible processor. Experiments with several benchmarks indicate that our methodology can effectively tackle large programs. It results in the synthesis of high-quality custom processors that demonstrate an average speedup of 2.82× and a maximum speedup of 6.07×. As a side-effect, the processor energy is also reduced. The average and maximum reduction in the energy-delay product for the benchmarks are 7.64× and 18.85×, respectively. The CPU times required for custom processor synthesis are quite small, indicating that the proposed techniques can be applied to embedded software programs of significant complexity.
AB - Custom processors based on application-specific or domain-specific instruction sets are gaining popularity, and are often used to implement critical architectural blocks in complex systems-on-chip. While several advances have been made in the area of custom processor architectures, tools, and design methodologies, designers are still required to manually perform some critical tasks, such as selection of the custom instructions best suited to the given application and design constraints. We present a scalable methodology for the synthesis of a custom processor from an embedded software program. A key feature of the proposed methodology is its scalability, which is achieved by exploiting the structured, hierarchical nature of large software programs. We motivate the need for such a methodology, and describe the algorithms used for the critical steps, including hardware resource budgeting, local optimizations, and global exploration. Our methodology utilizes the concept of "soft" instruction templates, which can be adapted by adding operations to them or deleting operations from them at any time during the design space exploration process, allowing for global design decisions to be interleaved with fine-grained optimizations. To the best of our knowledge, this is the first work that uses the program hierarchy to derive soft instruction templates to synthesize application-specific processors for scalable applications. We have integrated our methodology in an open-source compiler, and verified it using a commercial extensible processor. Experiments with several benchmarks indicate that our methodology can effectively tackle large programs. It results in the synthesis of high-quality custom processors that demonstrate an average speedup of 2.82× and a maximum speedup of 6.07×. As a side-effect, the processor energy is also reduced. The average and maximum reduction in the energy-delay product for the benchmarks are 7.64× and 18.85×, respectively. The CPU times required for custom processor synthesis are quite small, indicating that the proposed techniques can be applied to embedded software programs of significant complexity.
KW - Application-specific instruction set processors (ASIPs)
KW - Custom processors
KW - Extensible processors
UR - http://www.scopus.com/inward/record.url?scp=33845529542&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33845529542&partnerID=8YFLogxK
U2 - 10.1109/TVLSI.2006.886410
DO - 10.1109/TVLSI.2006.886410
M3 - Article
AN - SCOPUS:33845529542
SN - 1063-8210
VL - 14
SP - 1175
EP - 1187
JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IS - 11
ER -