GhOST: a GPU Out-of-Order Scheduling Technique for Stall Reduction

  • Ishita Chaturvedi
  • Bhargav Reddy Godala
  • Yucan Wu
  • Ziyang Xu
  • Konstantinos Iliakis
  • Panagiotis Eleftherios Eleftherakis
  • Sotirios Xydis
  • Dimitrios Soudris
  • Tyler Sorensen
  • Simone Campanoni
  • Tor M. Aamodt
  • David I. August

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Graphics Processing Units (GPUs) use massive multi-threading coupled with static scheduling to hide instruction latencies. Despite this, memory instructions pose a challenge as their latencies vary throughout the application's execution, leading to stalls. Out-of-order (OoO) execution has been shown to effectively mitigate these types of stalls. However, prior OoO proposals involve costly techniques such as reordering loads and stores, register renaming, or two-phase execution, amplifying implementation overhead and consequently creating a substantial barrier to adoption in GPUs. This paper introduces GhOST, a minimal yet effective OoO technique for GPUs. Without expensive components, GhOST can manifest a substantial portion of the instruction reorderings found in an idealized OoO GPU. GhOST leverages the decode stage's existing pool of decoded instructions and the existing issue stage's information about instructions in the pipeline to select instructions for OoO execution with little additional hardware. A comprehensive evaluation of GhOST and the prior state-of-the-art OoO technique across a range of diverse GPU benchmarks yields two surprising insights: (1) Prior works utilized Nvidia's intermediate representation PTX for evaluation; however, the optimized static instruction scheduling of the final binary form negates many purported improvements from OoO execution; and (2) The prior state-of-the-art OoO technique results in an average slowdown across this set of benchmarks. In contrast, GhOST achieves a 36% maximum and 6.9% geometric mean speedup on GPU binaries with only a 0.007% area increase, surpassing previous techniques without slowing down any of the measured benchmarks.

Original language: English (US)
Title of host publication: Proceeding - 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture, ISCA 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-16
Number of pages: 16
ISBN (Electronic): 9798350326581
DOIs
State: Published - 2024
Event: 51st ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2024 - Buenos Aires, Argentina
Duration: Jun 29 2024 to Jul 3 2024

Publication series

Name: Proceedings - International Symposium on Computer Architecture
ISSN (Print): 1063-6897
ISSN (Electronic): 2575-713X

Conference

Conference: 51st ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2024
Country/Territory: Argentina
City: Buenos Aires
Period: 6/29/24 to 7/3/24

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

Keywords

  • GPU
  • GPU Microarchitecture
  • Parallelism
  • low overhead out-of-order execution
  • out-of-order execution