A scalable synthesis methodology for application-specific processors

Fei Sun, Srivaths Ravi, Anand Raghunathan, Niraj K. Jha

Research output: Contribution to journal › Article › peer-review

11 Scopus citations


Custom processors based on application-specific or domain-specific instruction sets are gaining popularity, and are often used to implement critical architectural blocks in complex systems-on-chip. While several advances have been made in the area of custom processor architectures, tools, and design methodologies, designers are still required to manually perform some critical tasks, such as selection of the custom instructions best suited to the given application and design constraints. We present a scalable methodology for the synthesis of a custom processor from an embedded software program. A key feature of the proposed methodology is its scalability, which is achieved by exploiting the structured, hierarchical nature of large software programs. We motivate the need for such a methodology, and describe the algorithms used for the critical steps, including hardware resource budgeting, local optimizations, and global exploration. Our methodology utilizes the concept of "soft" instruction templates, which can be adapted by adding operations to them or deleting operations from them at any time during the design space exploration process, allowing for global design decisions to be interleaved with fine-grained optimizations. To the best of our knowledge, this is the first work that uses the program hierarchy to derive soft instruction templates to synthesize application-specific processors for scalable applications. We have integrated our methodology in an open-source compiler, and verified it using a commercial extensible processor. Experiments with several benchmarks indicate that our methodology can effectively tackle large programs. It results in the synthesis of high-quality custom processors that demonstrate an average speedup of 2.82× and a maximum speedup of 6.07×. As a side effect, the processor energy is also reduced. The average and maximum reductions in the energy-delay product for the benchmarks are 7.64× and 18.85×, respectively. The CPU times required for custom processor synthesis are quite small, indicating that the proposed techniques can be applied to embedded software programs of significant complexity.

Original language: English (US)
Pages (from-to): 1175-1187
Number of pages: 13
Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Issue number: 11
State: Published - Nov 2006

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Electrical and Electronic Engineering

Keywords


  • Application-specific instruction set processors (ASIPs)
  • Custom processors
  • Extensible processors

