Abstract
Decoupling techniques have been proposed to reduce the memory latency that high-performance accelerators are exposed to when fetching data. Although decoupled access-execute (DAE) and more recent decoupled data supply approaches offer promising single-threaded performance improvements, little work has considered how to extend them to parallel scenarios. This article explores the opportunities and challenges of designing parallel, high-performance, resource-efficient decoupled data supply systems. We propose Mercury, a parallel decoupled data supply system that uses thread-level parallelism for high-throughput data supply while offering good portability. Additionally, we introduce microarchitectural improvements that let data supply units efficiently handle long-latency indirect loads.
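The abstract names decoupled access-execute without showing its shape. As an illustration only, and not Mercury's design, the following Python sketch models the general idea under stated assumptions: an access thread performs the loads (here indirect loads of the form `data[indices[i]]`) and streams the values through a bounded FIFO to an execute thread that only computes, and several such pairs split the iteration space to mimic thread-level parallelism. The function names, the sum-of-squares kernel, the interleaved partitioning, and the FIFO depth are all hypothetical choices. Because of CPython's global interpreter lock, this models the dataflow rather than demonstrating any speedup.

```python
import queue
import threading

# Hypothetical illustration of decoupled access-execute; not Mercury's implementation.
_DONE = object()  # end-of-stream marker pushed by the access side


def access_unit(idx_chunk, data, fifo):
    """Access side: generates addresses and performs the indirect loads data[idx]."""
    for idx in idx_chunk:
        fifo.put(data[idx])  # blocks when the FIFO is full, bounding run-ahead
    fifo.put(_DONE)


def execute_unit(fifo, results, slot):
    """Execute side: consumes loaded values in order and issues no memory loads itself."""
    acc = 0
    while True:
        value = fifo.get()
        if value is _DONE:
            break
        acc += value * value
    results[slot] = acc


def decoupled_sum_of_squares(indices, data, num_pairs=2, fifo_depth=64):
    """Computes sum(data[i] ** 2 for i in indices) using num_pairs access/execute pairs."""
    results = [0] * num_pairs
    threads = []
    for slot in range(num_pairs):
        chunk = indices[slot::num_pairs]  # interleaved partition of the iteration space
        fifo = queue.Queue(maxsize=fifo_depth)  # hypothetical bounded decoupling queue
        threads.append(threading.Thread(target=access_unit, args=(chunk, data, fifo)))
        threads.append(threading.Thread(target=execute_unit, args=(fifo, results, slot)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


if __name__ == "__main__":
    data = list(range(10))
    indices = [3, 1, 4, 1, 5, 9, 2, 6]
    print(decoupled_sum_of_squares(indices, data))  # 173
```

The bounded queue is the key design choice in this sketch: it lets the access side run ahead of the execute side by at most `fifo_depth` values, which is how a decoupled design hides load latency without unbounded buffering.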
| Original language | English (US) |
|---|---|
| Article number | 9 |
| Journal | ACM Transactions on Architecture and Code Optimization |
| Volume | 16 |
| Issue number | 2 |
| DOIs | |
| State | Published - Apr 2019 |
All Science Journal Classification (ASJC) codes
- Software
- Information Systems
- Hardware and Architecture
Keywords
- Data access optimization
- Decoupled architecture
- Heterogeneous architecture
Title
- Efficient data supply for parallel heterogeneous architectures