TY - GEN
T1 - Scaling datacenter accelerators with compute-reuse architectures
AU - Fuchs, Adi
AU - Wentzlaff, David
N1 - Funding Information:
Memoization: Memoization has been extensively explored in past work and is supported by modern programming environments. To the best of our knowledge, it has not yet been explored as a means to trade CMOS-based computations for scalable memory technologies, nor has it been applied to entire kernels in accelerated systems.
Funding Information:
We thank the anonymous reviewers for their valuable feedback. This work was partially supported by the NSF under Grants No. CCF-1453112 and CCF-1438980, AFOSR under Grant No. FA9550-14-1-0148, and DARPA under Grant No. N66001-14-1-4040. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of our sponsors.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/19
Y1 - 2018/7/19
AB - Hardware specialization is commonly used in datacenters to mitigate the approaching end of CMOS technology scaling. While offering superior performance and energy efficiency compared to general-purpose processors, specialized accelerators are bound by the same device technology constraints and are thus prone to similar limitations in the future. Once technology scaling plateaus, accelerator and application tuning will reach a near-optimum point, with no clear direction for further improvement. Emerging non-volatile memory (NVM) technologies follow different scaling trends due to their different physical properties and manufacturing techniques. NVMs have inspired recent innovation in computer systems, as they offer appealing qualities such as high capacity and low energy consumption. We present the COmpute-REuse Accelerators (COREx) architecture, which shifts computations from scalability-hindered transistor-based logic toward the continuing-to-scale storage domain. COREx leverages datacenter redundancy by integrating a storage layer with the accelerator processing layer. The added layer stores the outcomes of previous accelerated computations, which are reused when computations recur, thus eliminating the need to recompute them. We designed COREx as a combination of an accelerator and a specialized storage layer using emerging memory technologies, and evaluated it on a set of datacenter workloads. Our results show that, when integrated with a well-tuned accelerator, COREx achieves an average speedup of 6.4× and average savings of 50% in energy and 68% in energy-delay product. We expect these gains to grow further as memory technologies continue to improve steadily.
KW - Accelerators
KW - Emerging memories
KW - Memoization
UR - http://www.scopus.com/inward/record.url?scp=85055895895&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055895895&partnerID=8YFLogxK
U2 - 10.1109/ISCA.2018.00038
DO - 10.1109/ISCA.2018.00038
M3 - Conference contribution
AN - SCOPUS:85055895895
T3 - Proceedings - International Symposium on Computer Architecture
SP - 353
EP - 366
BT - Proceedings - 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture, ISCA 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 45th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2018
Y2 - 2 June 2018 through 6 June 2018
ER -