TY - GEN
T1 - Bundled execution of recurring traces for energy-efficient general purpose processing
AU - Gupta, Shantanu
AU - Feng, Shuguang
AU - Ansari, Amin
AU - Mahlke, Scott
AU - August, David I.
PY - 2011
Y1 - 2011
AB - Technology scaling has delivered on its promise of increasing device density on a single chip. However, the voltage scaling trend has failed to keep up, introducing tight power constraints on manufactured parts. In such a scenario, there is a need to incorporate energy-efficient processing resources that can enable more computation within the same power budget. Energy efficiency solutions in the past have typically relied on application-specific hardware and accelerators. Unfortunately, these approaches do not extend to general-purpose applications due to their irregular and diverse code base. To this end, we propose BERET, an energy-efficient co-processor that can be configured to benefit a wide range of applications. Our approach identifies recurring instruction sequences as phases of "temporal regularity" in a program's execution, and maps suitable ones to the BERET hardware, a three-stage pipeline with a bundled execution model. This judicious off-loading of program execution to reduced-complexity hardware demonstrates significant energy savings on instruction fetch, decode, and register file accesses. On average, BERET reduces energy consumption by a factor of 3-4X for the program regions selected across a range of general-purpose and media applications. The average energy savings for the entire application run were 35% over a single-issue in-order processor.
KW - co-processor
KW - efficiency
KW - energy saving
KW - microarchitecture
UR - http://www.scopus.com/inward/record.url?scp=84863374615&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863374615&partnerID=8YFLogxK
U2 - 10.1145/2155620.2155623
DO - 10.1145/2155620.2155623
M3 - Conference contribution
AN - SCOPUS:84863374615
SN - 9781450310536
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 12
EP - 23
BT - MICRO 44 - Proceedings of the 44th Annual IEEE/ACM Symposium on Microarchitecture
T2 - 44th Annual IEEE/ACM Symposium on Microarchitecture, MICRO 44
Y2 - 4 December 2011 through 7 December 2011
ER -