TY - JOUR
T1 - Decoupling data supply from computation for latency-tolerant communication in heterogeneous architectures
AU - Ham, Tae Jun
AU - Aragón, Juan L.
AU - Martonosi, Margaret Rose
N1 - Funding Information:
We thank the anonymous reviewers for their insightful comments and suggestions. This work was supported in part by C-FAR, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA. This work was supported in part by the NSF under the grant CCF-1117147. This work was also supported in part by the Spanish State Research Agency under grants TIN2015-66972-C5-3-R and TIN2016-75344-R (AEI/FEDER, EU).
Publisher Copyright:
© 2017 ACM.
PY - 2017/6
Y1 - 2017/6
N2 - In today's computers, heterogeneous processing is used to meet performance targets at manageable power. In adopting increased compute specialization, however, the relative amount of time spent on communication increases. System and software optimizations for communication often come at the cost of increased complexity and reduced portability. The Decoupled Supply-Compute (DeSC) approach offers a way to attack communication latency bottlenecks automatically, while maintaining good portability and low complexity. Our work expands prior Decoupled Access Execute techniques with hardware/software specialization. For a range of workloads, DeSC offers roughly 2× speedup, and additional specialized compression optimizations reduce traffic between decoupled units by 40%.
AB - In today's computers, heterogeneous processing is used to meet performance targets at manageable power. In adopting increased compute specialization, however, the relative amount of time spent on communication increases. System and software optimizations for communication often come at the cost of increased complexity and reduced portability. The Decoupled Supply-Compute (DeSC) approach offers a way to attack communication latency bottlenecks automatically, while maintaining good portability and low complexity. Our work expands prior Decoupled Access Execute techniques with hardware/software specialization. For a range of workloads, DeSC offers roughly 2× speedup, and additional specialized compression optimizations reduce traffic between decoupled units by 40%.
KW - Accelerators
KW - Communication management
KW - Decoupled architecture
UR - http://www.scopus.com/inward/record.url?scp=85027015996&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027015996&partnerID=8YFLogxK
U2 - 10.1145/3075620
DO - 10.1145/3075620
M3 - Article
AN - SCOPUS:85027015996
SN - 1544-3566
VL - 14
JO - ACM Transactions on Architecture and Code Optimization
JF - ACM Transactions on Architecture and Code Optimization
IS - 2
M1 - 16
ER -