TY - JOUR
T1 - Efficient data supply for parallel heterogeneous architectures
AU - Ham, Tae Jun
AU - Aragón, Juan L.
AU - Martonosi, Margaret Rose
N1 - Funding Information:
This work was supported in part by C-FAR, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA; by the NSF under grant SHF-1617732; and by the Spanish State Research Agency under grants TIN2015-66972-C5-3-R and TIN2016-75344-R (AEI/FEDER, EU).
This is a new article, not an extension of a conference paper. Authors’ addresses: T. J. Ham, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea; email: taejunham@snu.ac.kr; J. L. Aragón, Facultad de Informática, Campus de Espinardo, University of Murcia, 30100 - Murcia, Spain; email: jlaragon@um.es; M. Martonosi, Dept. of Computer Science, Princeton University, 35 Olden St., Princeton, NJ 08540; email: mrm@princeton.edu. 1544-3566/2019/04-ART9 https://doi.org/10.1145/3310332
Publisher Copyright:
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2019/4
Y1 - 2019/4
N2 - Decoupling techniques have been proposed to reduce the amount of memory latency exposed to high-performance accelerators as they fetch data. Although decoupled access-execute (DAE) and more recent decoupled data supply approaches offer promising single-threaded performance improvements, little work has considered how to extend them into parallel scenarios. This article explores the opportunities and challenges of designing parallel, high-performance, resource-efficient decoupled data supply systems. We propose Mercury, a parallel decoupled data supply system that utilizes thread-level parallelism for high-throughput data supply with good portability attributes. Additionally, we introduce some microarchitectural improvements for data supply units to efficiently handle long-latency indirect loads.
AB - Decoupling techniques have been proposed to reduce the amount of memory latency exposed to high-performance accelerators as they fetch data. Although decoupled access-execute (DAE) and more recent decoupled data supply approaches offer promising single-threaded performance improvements, little work has considered how to extend them into parallel scenarios. This article explores the opportunities and challenges of designing parallel, high-performance, resource-efficient decoupled data supply systems. We propose Mercury, a parallel decoupled data supply system that utilizes thread-level parallelism for high-throughput data supply with good portability attributes. Additionally, we introduce some microarchitectural improvements for data supply units to efficiently handle long-latency indirect loads.
KW - Data access optimization
KW - Decoupled architecture
KW - Heterogeneous architecture
UR - http://www.scopus.com/inward/record.url?scp=85065730130&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85065730130&partnerID=8YFLogxK
U2 - 10.1145/3310332
DO - 10.1145/3310332
M3 - Article
AN - SCOPUS:85065730130
SN - 1544-3566
VL - 16
JO - ACM Transactions on Architecture and Code Optimization
JF - ACM Transactions on Architecture and Code Optimization
IS - 2
M1 - 9
ER -