TY - JOUR
T1 - A synthesis methodology for hybrid custom instruction and coprocessor generation for extensible processors
AU - Sun, Fei
AU - Ravi, Srivaths
AU - Raghunathan, Anand
AU - Jha, Niraj K.
N1 - Funding Information: Manuscript received November 1, 2005; revised September 24, 2006. This work was supported by the National Science Foundation under Grant CCR-0310477. This paper was recommended by Associate Editor R. Camposano.
PY - 2007/11
Y1 - 2007/11
N2 - Systems-on-chip often use hardware accelerators or coprocessors to provide efficient implementations of application-specific functions. The recent emergence of extensible processor cores with supporting design tools has given designers another viable alternative, namely, the use of application-specific custom instructions. Coprocessors and custom instructions can be viewed as two different forms of hardware acceleration that are applicable at different levels of granularity and offer differing tradeoffs. Classical hardware/software-partitioning techniques and application-specific instruction-set design tools address the individual problems of coprocessor generation and custom-instruction addition. However, given a complex application, it is not clear which design choice (coprocessors, custom instructions, or a combination) will result in better performance, area, or power consumption. We demonstrate that a combination of custom instructions and coprocessors is the best solution in many applications, making the case for a hybrid custom-instruction and coprocessor-synthesis methodology. We propose such a methodology that builds upon the basic observations that coprocessors are usually good for coarse-grained tasks and require minimal intervention or support from the processor, while custom instructions are usually suited to fine-grained operations that are best integrated into a processor pipeline. Our methodology uses a hierarchical task-graph representation in order to support both coarse- and fine-grained views of an application, which are necessary to make meaningful tradeoffs. We propose a hierarchical synthesis algorithm that incorporates multiobjective evolutionary optimization in order to handle different design dimensions, such as area and performance, and provide a wide range of nondominated solutions. We have implemented the proposed methodology in the context of a commercial extensible processor-based platform (Xtensa from Tensilica). Our design flow uses a commercial behavioral-synthesis tool and an existing automatic custom-instruction generation tool. Our experiments with several applications show that simultaneous custom-instruction and coprocessor synthesis can achieve significantly better area/performance tradeoffs than using only one of them.
AB - Systems-on-chip often use hardware accelerators or coprocessors to provide efficient implementations of application-specific functions. The recent emergence of extensible processor cores with supporting design tools has given designers another viable alternative, namely, the use of application-specific custom instructions. Coprocessors and custom instructions can be viewed as two different forms of hardware acceleration that are applicable at different levels of granularity and offer differing tradeoffs. Classical hardware/software-partitioning techniques and application-specific instruction-set design tools address the individual problems of coprocessor generation and custom-instruction addition. However, given a complex application, it is not clear which design choice (coprocessors, custom instructions, or a combination) will result in better performance, area, or power consumption. We demonstrate that a combination of custom instructions and coprocessors is the best solution in many applications, making the case for a hybrid custom-instruction and coprocessor-synthesis methodology. We propose such a methodology that builds upon the basic observations that coprocessors are usually good for coarse-grained tasks and require minimal intervention or support from the processor, while custom instructions are usually suited to fine-grained operations that are best integrated into a processor pipeline. Our methodology uses a hierarchical task-graph representation in order to support both coarse- and fine-grained views of an application, which are necessary to make meaningful tradeoffs. We propose a hierarchical synthesis algorithm that incorporates multiobjective evolutionary optimization in order to handle different design dimensions, such as area and performance, and provide a wide range of nondominated solutions. We have implemented the proposed methodology in the context of a commercial extensible processor-based platform (Xtensa from Tensilica). Our design flow uses a commercial behavioral-synthesis tool and an existing automatic custom-instruction generation tool. Our experiments with several applications show that simultaneous custom-instruction and coprocessor synthesis can achieve significantly better area/performance tradeoffs than using only one of them.
KW - Application-specific instruction set processor
KW - Coprocessor
KW - Custom instruction
KW - Extensible processor
UR - http://www.scopus.com/inward/record.url?scp=64149104654&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=64149104654&partnerID=8YFLogxK
U2 - 10.1109/TCAD.2007.906457
DO - 10.1109/TCAD.2007.906457
M3 - Article
AN - SCOPUS:64149104654
SN - 0278-0070
VL - 26
SP - 2035
EP - 2045
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 11
M1 - 4352011
ER -