TY - GEN
T1 - SHADOW
T2 - 58th IEEE/ACM International Symposium on Microarchitecture, MICRO 2025
AU - Chaturvedi, Ishita
AU - Godala, Bhargav Reddy
AU - Gangavaram, Abiram
AU - Flyer, Daniel
AU - Sorensen, Tyler
AU - Aamodt, Tor M.
AU - August, David I.
N1 - Publisher Copyright: © 2025 Copyright held by the owner/author(s).
PY - 2025/10/17
Y1 - 2025/10/17
N2 - Many important applications exhibit shifting demands between instruction-level parallelism (ILP) and thread-level parallelism (TLP) due to irregular sparsity and unpredictable memory access patterns. Conventional CPUs optimize for one but fail to balance both, leading to underutilized execution resources and performance bottlenecks. Addressing this challenge requires an architecture that can seamlessly and efficiently adapt to workload variations. This paper presents SHADOW, the first asymmetric SMT core that dynamically balances ILP and TLP by executing out-of-order (OoO) and in-order (InO) threads simultaneously on the same core. SHADOW maximizes CPU utilization by leveraging deep ILP in the OoO thread and high TLP in lightweight InO threads. It is runtime-configurable, allowing applications to optimize the mix of OoO and InO execution. Evaluated on nine diverse benchmarks, SHADOW achieves up to 3.16× speedup and a 1.33× average improvement over an OoO CPU, with just 1% area and power overhead. By dynamically adapting to workload characteristics, SHADOW outperforms conventional architectures, efficiently accelerating memory-bound workloads without compromising compute-bound performance.
AB - Many important applications exhibit shifting demands between instruction-level parallelism (ILP) and thread-level parallelism (TLP) due to irregular sparsity and unpredictable memory access patterns. Conventional CPUs optimize for one but fail to balance both, leading to underutilized execution resources and performance bottlenecks. Addressing this challenge requires an architecture that can seamlessly and efficiently adapt to workload variations. This paper presents SHADOW, the first asymmetric SMT core that dynamically balances ILP and TLP by executing out-of-order (OoO) and in-order (InO) threads simultaneously on the same core. SHADOW maximizes CPU utilization by leveraging deep ILP in the OoO thread and high TLP in lightweight InO threads. It is runtime-configurable, allowing applications to optimize the mix of OoO and InO execution. Evaluated on nine diverse benchmarks, SHADOW achieves up to 3.16× speedup and a 1.33× average improvement over an OoO CPU, with just 1% area and power overhead. By dynamically adapting to workload characteristics, SHADOW outperforms conventional architectures, efficiently accelerating memory-bound workloads without compromising compute-bound performance.
KW - Asymmetric CPU microarchitecture
KW - Dynamic ILP-TLP balancing
KW - Heterogeneous thread execution
KW - Instruction-level parallelism (ILP)
KW - Low-overhead microarchitectural design
KW - Memory-bound workload acceleration
KW - Simultaneous multi-threading (SMT)
KW - Software work stealing
KW - Sparse matrix multiplication (SpMM)
KW - Sparse workloads
KW - Thread-level parallelism (TLP)
UR - https://www.scopus.com/pages/publications/105021370562
UR - https://www.scopus.com/pages/publications/105021370562#tab=citedBy
U2 - 10.1145/3725843.3756070
DO - 10.1145/3725843.3756070
M3 - Conference contribution
AN - SCOPUS:105021370562
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 691
EP - 704
BT - MICRO 2025 - 58th IEEE/ACM International Symposium on Microarchitecture
PB - IEEE Computer Society
Y2 - 18 October 2025 through 22 October 2025
ER -